Title: | Estimate Percentiles from an Ordered Categorical Variable |
---|---|
Description: | An implementation of two functions that estimate values for percentiles from an ordered categorical variable as described by Reardon (2011, isbn:978-0-87154-372-1). One function estimates percentile differences from two percentiles while the other returns the values for every percentile from 1 to 100. |
Authors: | Jorge Cimentada [aut, cre] |
Maintainer: | Jorge Cimentada <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.5 |
Built: | 2024-11-10 04:15:12 UTC |
Source: | https://github.com/cimentadaj/perccalc |
Calculate percentile differences from an ordered categorical variable and a continuous variable.
    perc_diff(
      data_model,
      categorical_var,
      continuous_var,
      weights = NULL,
      percentiles = c(90, 10)
    )

    perc_diff_df(
      data_model,
      categorical_var,
      continuous_var,
      weights = NULL,
      percentiles = c(90, 10)
    )
data_model |
A data frame with at least the categorical and continuous variables from which to estimate the percentile differences |
categorical_var |
The bare unquoted name of the categorical variable. This variable must be an ordered factor; otherwise an error is raised. |
continuous_var |
The bare unquoted name of the continuous variable from which to estimate the percentiles |
weights |
The bare unquoted name of the optional weight variable. If not specified, then estimation is done without weights |
percentiles |
A numeric vector of two numbers specifying which percentiles to subtract |
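Because `categorical_var` has to be an ordered factor, it can help to check and convert the column before calling the function. The following is a minimal base R sketch; the `income` vector and its level names are hypothetical and stand in for whatever categories your data uses.

```r
# Income brackets stored as plain character strings (hypothetical data).
income <- c("low", "high", "mid", "low", "high")

# A character vector or an unordered factor is not an ordered factor.
is.ordered(income)            # FALSE
is.ordered(factor(income))    # FALSE

# State the levels from lowest to highest and set ordered = TRUE.
income_ord <- factor(income, levels = c("low", "mid", "high"), ordered = TRUE)

is.ordered(income_ord)        # TRUE
levels(income_ord)            # "low" "mid" "high"
income_ord[1] < income_ord[2] # TRUE: "low" ranks below "high"
```

Listing the levels explicitly avoids the default alphabetical ordering, which rarely matches the substantive order of the categories.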
perc_diff silently drops missing observations before calculating the linear combination of coefficients.
perc_diff returns a vector with the percentile difference and its associated standard error. perc_diff_df returns the same but as a data frame.
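Since the result pairs an estimate with its standard error, one common follow-up is an approximate confidence interval. The sketch below is an illustration only: `approx_ci` is a hypothetical helper that is not part of the package, the normal approximation is an assumption rather than something the package documents, and the input numbers are placeholders, not real output. The helper reads the two values by position, so it does not depend on how the returned vector is named.

```r
# Hypothetical helper, not part of perccalc: turns c(estimate, standard error)
# into an approximate confidence interval using a normal approximation.
approx_ci <- function(est_se, level = 0.95) {
  stopifnot(is.numeric(est_se), length(est_se) == 2)
  z <- qnorm(1 - (1 - level) / 2)  # two-sided critical value
  est <- est_se[[1]]               # first element: the estimate
  se <- est_se[[2]]                # second element: its standard error
  c(lower = est - z * se, estimate = est, upper = est + z * se)
}

# Placeholder numbers for illustration only, not real perc_diff() output.
placeholder <- c(1.5, 0.5)
approx_ci(placeholder)
```

With real data, the vector returned by `perc_diff()` could be passed in place of `placeholder`, assuming its first element is the difference and its second the standard error, as described above.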
    set.seed(23131)
    N <- 1000
    K <- 20

    toy_data <- data.frame(id = 1:N,
                           score = rnorm(N, sd = 2),
                           type = rep(paste0("inc", 1:20), each = N/K),
                           wt = 1)

    # perc_diff(toy_data, type, score)
    # type is not an ordered factor!

    toy_data$type <- factor(toy_data$type, levels = unique(toy_data$type), ordered = TRUE)

    perc_diff(toy_data, type, score, percentiles = c(90, 10))
    perc_diff(toy_data, type, score, percentiles = c(50, 10))

    perc_diff(toy_data, type, score, weights = wt, percentiles = c(30, 10))

    # Results as data frame
    perc_diff_df(toy_data, type, score, weights = wt, percentiles = c(30, 10))
Calculate a distribution of percentiles from an ordered categorical variable and a continuous variable.
    perc_dist(data_model, categorical_var, continuous_var, weights = NULL)
data_model |
A data frame with at least the categorical and continuous variables from which to estimate the percentiles |
categorical_var |
The bare unquoted name of the categorical variable. This variable must be an ordered factor; otherwise an error is raised. |
continuous_var |
The bare unquoted name of the continuous variable from which to estimate the percentiles |
weights |
The bare unquoted name of the optional weight variable. If not specified, then equal weights are assumed. |
perc_dist silently drops missing observations before calculating the linear combination of coefficients.
A data frame with the scores and standard errors for each percentile.
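As a rough sketch of working with the returned data frame, the snippet below builds the same kind of toy data used in the package examples and inspects the result. It assumes the perccalc package is installed and skips the call otherwise. The column names of the result are not listed in this section, so the sketch inspects them with `str()` rather than assuming any.

```r
# Sketch: requires the perccalc package; skipped when it is not installed.
if (requireNamespace("perccalc", quietly = TRUE)) {
  set.seed(23131)
  N <- 1000
  K <- 20

  toy_data <- data.frame(
    score = rnorm(N, sd = 2),
    type = rep(paste0("inc", 1:K), each = N / K)
  )

  # perc_dist() needs an ordered factor for the categorical variable.
  toy_data$type <- factor(toy_data$type,
                          levels = unique(toy_data$type),
                          ordered = TRUE)

  pct_dist <- perccalc::perc_dist(toy_data, type, score)

  str(pct_dist)   # check the actual column names before relying on them
  nrow(pct_dist)  # number of percentiles returned
}
```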
    set.seed(23131)
    N <- 1000
    K <- 20

    toy_data <- data.frame(id = 1:N,
                           score = rnorm(N, sd = 2),
                           type = rep(paste0("inc", 1:20), each = N/K),
                           wt = 1)

    # perc_diff(toy_data, type, score)
    # type is not an ordered factor!

    toy_data$type <- factor(toy_data$type, levels = unique(toy_data$type), ordered = TRUE)

    perc_dist(toy_data, type, score)
A dataset containing the test scores and other household information of students from Spain, Germany and Estonia from the PISA 2006 test.
    pisa_2006
A data frame with 25884 rows and 10 variables:
Year of the survey
Long country names
Unique student id
The father's highest achieved degree in the ISCED scale
The household's total income in categories
The average math test score out of the 5 plausible values in Mathematics
A subset extracted from the PISA2006lite R package, https://github.com/pbiecek/PISA2012lite
A dataset containing the test scores and other household information of students from Spain, Germany and Estonia from the PISA 2012 test.
    pisa_2012
A data frame with 35093 rows and 10 variables:
Year of the survey
Long country names
Unique student id
The father's highest achieved degree in the ISCED scale
The household's total income in categories
The average math test score out of the 5 plausible values in Mathematics
A subset extracted from the PISA2012lite R package, https://github.com/pbiecek/PISA2012lite