Package 'perccalc'

Title: Estimate Percentiles from an Ordered Categorical Variable
Description: An implementation of two functions that estimate values for percentiles from an ordered categorical variable as described by Reardon (2011, isbn:978-0-87154-372-1). One function estimates percentile differences from two percentiles while the other returns the values for every percentile from 1 to 100.
Authors: Jorge Cimentada [aut, cre]
Maintainer: Jorge Cimentada <[email protected]>
License: MIT + file LICENSE
Version: 1.0.5
Built: 2024-11-10 04:15:12 UTC
Source: https://github.com/cimentadaj/perccalc

Help Index


Calculate percentile differences from an ordered categorical variable and a continuous variable.

Description

Calculate percentile differences from an ordered categorical variable and a continuous variable.

Usage

perc_diff(
  data_model,
  categorical_var,
  continuous_var,
  weights = NULL,
  percentiles = c(90, 10)
)

perc_diff_df(
  data_model,
  categorical_var,
  continuous_var,
  weights = NULL,
  percentiles = c(90, 10)
)

Arguments

data_model

A data frame with at least the categorical and continuous variables from which to estimate the percentile differences

categorical_var

The bare unquoted name of the categorical variable. This variable must be an ordered factor; otherwise an error is raised.

continuous_var

The bare unquoted name of the continuous variable from which to estimate the percentiles

weights

The bare unquoted name of the optional weight variable. If not specified, estimation is unweighted.

percentiles

A numeric vector of length two giving the pair of percentiles whose difference is estimated, e.g. c(90, 10) for the 90/10 gap
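
Because categorical_var has to be an ordered factor, it can help to state the level order explicitly instead of relying on the order in which values happen to appear in the data. A minimal base R sketch; the data frame df and its income labels are hypothetical and not part of the package:

```r
# Hypothetical income brackets, deliberately stored out of rank order
df <- data.frame(
  income = c("high", "low", "mid", "low", "high"),
  score  = c(3.1, 1.2, 2.0, 0.9, 2.7)
)

# State the ranking explicitly, from lowest to highest category
df$income <- factor(df$income, levels = c("low", "mid", "high"), ordered = TRUE)

is.ordered(df$income)  # check before passing the column as categorical_var
levels(df$income)      # the rank order that will be used
```

Using levels = unique(x), as in the package examples, gives the intended ranking only when the rows are already sorted by category.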

Details

perc_diff silently drops missing observations before calculating the linear combination of coefficients.
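
Since no message is emitted when rows are dropped, one may want to count incomplete rows beforehand. The following base R sketch assumes that the dropped rows are those with a missing value in the categorical or the continuous variable; the data frame df is hypothetical:

```r
# Hypothetical data with one missing category and one missing score
df <- data.frame(
  income = factor(c("low", "mid", NA, "high", "mid"),
                  levels = c("low", "mid", "high"), ordered = TRUE),
  score  = c(1.2, NA, 2.0, 2.7, 1.9)
)

# TRUE for rows where both the category and the score are present
keep <- complete.cases(df[, c("income", "score")])

n_incomplete <- sum(!keep)   # rows with at least one missing value
df_complete  <- df[keep, ]   # rows that would remain under the assumption above
```

If a weight variable is used, its column could be added to the complete.cases() check in the same way.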

Value

perc_diff returns a vector with the percentile difference and its associated standard error. perc_diff_df returns the same but as a data frame.
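
As an illustration of how the two returned elements might be used together, the sketch below turns an estimate and its standard error into a normal-approximation confidence interval. The helper wald_ci is hypothetical (it is not part of perccalc), and treating the estimate as approximately normal is an assumption made here, not something the package documents. The call to perc_diff only runs when perccalc is installed.

```r
# Hypothetical helper: normal-approximation interval from c(estimate, se)
wald_ci <- function(est_se, level = 0.95) {
  stopifnot(is.numeric(est_se), length(est_se) == 2)
  z   <- qnorm(1 - (1 - level) / 2)  # two-sided critical value
  est <- est_se[[1]]                 # first element: percentile difference
  se  <- est_se[[2]]                 # second element: standard error
  c(lower = est - z * se, upper = est + z * se)
}

# Only runs when perccalc is installed
if (requireNamespace("perccalc", quietly = TRUE)) {
  set.seed(23131)
  toy <- data.frame(
    score = rnorm(1000, sd = 2),
    type  = factor(rep(paste0("inc", 1:20), each = 50),
                   levels = paste0("inc", 1:20), ordered = TRUE)
  )
  res <- perccalc::perc_diff(toy, type, score, percentiles = c(90, 10))
  print(wald_ci(res))
}
```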

Examples

set.seed(23131)
N <- 1000
K <- 20

toy_data <- data.frame(id = 1:N,
                       score = rnorm(N, sd = 2),
                       type = rep(paste0("inc", 1:K), each = N/K),
                       wt = 1)


# The following call would raise an error because type is not yet an ordered factor:
# perc_diff(toy_data, type, score)

toy_data$type <- factor(toy_data$type, levels = unique(toy_data$type), ordered = TRUE)

perc_diff(toy_data, type, score, percentiles = c(90, 10))
perc_diff(toy_data, type, score, percentiles = c(50, 10))

perc_diff(toy_data, type, score, weights = wt, percentiles = c(30, 10))
# Results as data frame
perc_diff_df(toy_data, type, score, weights = wt, percentiles = c(30, 10))

Calculate a distribution of percentiles from an ordered categorical variable and a continuous variable.

Description

Calculate a distribution of percentiles from an ordered categorical variable and a continuous variable.

Usage

perc_dist(data_model, categorical_var, continuous_var, weights = NULL)

Arguments

data_model

A data frame with at least the categorical and continuous variables from which to estimate the percentiles

categorical_var

The bare unquoted name of the categorical variable. This variable must be an ordered factor; otherwise an error is raised.

continuous_var

The bare unquoted name of the continuous variable from which to estimate the percentiles

weights

The bare unquoted name of the optional weight variable. If not specified, then equal weights are assumed.

Details

perc_dist silently drops missing observations before calculating the linear combination of coefficients.

Value

A data frame with the scores and standard errors for each percentile

Examples

set.seed(23131)
N <- 1000
K <- 20

toy_data <- data.frame(id = 1:N,
                       score = rnorm(N, sd = 2),
                       type = rep(paste0("inc", 1:K), each = N/K),
                       wt = 1)


# The following call would raise an error because type is not yet an ordered factor:
# perc_dist(toy_data, type, score)

toy_data$type <- factor(toy_data$type, levels = unique(toy_data$type), ordered = TRUE)

perc_dist(toy_data, type, score)

Mathematics test scores of Spain, Germany and Estonia in the PISA 2006 test

Description

A dataset containing the test scores and other household information of students from Spain, Germany and Estonia from the PISA 2006 test.

Usage

pisa_2006

Format

A data frame with 25884 rows and 10 variables:

year

Year of the survey

CNT

Long country names

STIDSTD

Unique student id

father_edu

The father's highest achieved degree in the ISCED scale

household_income

The household's total income in categories

avg_math

The math test score, computed as the average of the 5 plausible values in Mathematics
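
To make the averaging concrete, here is a small base R sketch of a row-wise mean over five plausible values. The column names pv1 to pv5 and the numbers are made up for illustration and do not come from the PISA files or from this dataset:

```r
# Hypothetical plausible values for two students (illustrative numbers only)
pv <- data.frame(
  pv1 = c(500, 450),
  pv2 = c(510, 455),
  pv3 = c(490, 445),
  pv4 = c(505, 460),
  pv5 = c(495, 440)
)

# One averaged score per student, analogous to avg_math
avg_math <- rowMeans(pv)
```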

Source

A subset extracted from the PISA2006lite R package, https://github.com/pbiecek/PISA2012lite


Mathematics test scores of Spain, Germany and Estonia in the PISA 2012 test

Description

A dataset containing the test scores and other household information of students from Spain, Germany and Estonia from the PISA 2012 test.

Usage

pisa_2012

Format

A data frame with 35093 rows and 10 variables:

year

Year of the survey

CNT

Long country names

STIDSTD

Unique student id

father_edu

The father's highest achieved degree in the ISCED scale

household_income

The household's total income in categories

avg_math

The math test score, computed as the average of the 5 plausible values in Mathematics

Source

A subset extracted from the PISA2012lite R package, https://github.com/pbiecek/PISA2012lite