Package 'perccalc' reference manual

Title:	Estimate Percentiles from an Ordered Categorical Variable
Description:	An implementation of two functions that estimate values for percentiles from an ordered categorical variable as described by Reardon (2011, isbn:978-0-87154-372-1). One function estimates percentile differences from two percentiles while the other returns the values for every percentile from 1 to 100.
Authors:	Jorge Cimentada [aut, cre]
Maintainer:	Jorge Cimentada <[email protected]>
License:	MIT + file LICENSE
Version:	1.0.5
Built:	2025-02-08 03:52:34 UTC
Source:	https://github.com/cimentadaj/perccalc

Calculate percentile differences from an ordered categorical variable and a continuous variable.

Description

Calculate percentile differences from an ordered categorical variable and a continuous variable.

Usage

perc_diff(
  data_model,
  categorical_var,
  continuous_var,
  weights = NULL,
  percentiles = c(90, 10)
)

perc_diff_df(
  data_model,
  categorical_var,
  continuous_var,
  weights = NULL,
  percentiles = c(90, 10)
)

Arguments

`data_model`	A data frame with at least the categorical and continuous variables from which to estimate the percentile differences
`categorical_var`	The bare unquoted name of the categorical variable. This variable must be an ordered factor; otherwise the function raises an error.
`continuous_var`	The bare unquoted name of the continuous variable from which to estimate the percentiles
`weights`	The bare unquoted name of the optional weight variable. If not specified, then estimation is done without weights
`percentiles`	A numeric vector of two numbers specifying which percentiles to subtract
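
Because `categorical_var` has to be an ordered factor, it can help to check and convert the column before calling the function. The following is a minimal base-R sketch; the data frame `income_data`, its columns and its level names are hypothetical and only illustrate the conversion.

```r
# Hypothetical data: income brackets stored as plain character strings
income_data <- data.frame(
  bracket = c("low", "high", "mid", "low", "high", "mid"),
  score   = c(480, 530, 505, 470, 540, 500)
)

# A character column is not an ordered factor
is.ordered(income_data$bracket)

# Convert it, spelling out the levels from lowest to highest
income_data$bracket <- factor(
  income_data$bracket,
  levels  = c("low", "mid", "high"),
  ordered = TRUE
)

is.ordered(income_data$bracket)
levels(income_data$bracket)
```

Listing `levels` explicitly avoids the default alphabetical ordering of `factor()`, which would place "high" before "low".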

Details

perc_diff silently drops observations with missing values when calculating the linear combination of coefficients.
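
The phrase "linear combination of coefficients" can be illustrated with a simplified base-R sketch. It assumes, loosely following the approach attributed to Reardon (2011), that the category means are regressed on a cubic polynomial of each category's midpoint in the cumulative distribution. This is an illustration under that assumption only; it is not the code used by perc_diff, and its numbers will generally differ from the package's estimates.

```r
set.seed(23131)
N <- 1000
K <- 20

toy_data <- data.frame(
  score = rnorm(N, sd = 2),
  type  = factor(rep(paste0("inc", 1:K), each = N / K),
                 levels = paste0("inc", 1:K), ordered = TRUE)
)

# Share of observations in each category and the percentile range it covers
share <- as.numeric(prop.table(table(toy_data$type)))
upper <- cumsum(share)
lower <- upper - share
mid   <- (lower + upper) / 2  # category midpoint on a 0-1 scale

# Mean score within each category, in the order of the factor levels
cat_mean <- as.numeric(tapply(toy_data$score, toy_data$type, mean))

# Assumption: an unweighted cubic polynomial in the midpoint
fit <- lm(cat_mean ~ mid + I(mid^2) + I(mid^3))

# The 90/10 difference written as a linear combination of the coefficients
p <- c(0.90, 0.10)
contrast <- c(0, p[1] - p[2], p[1]^2 - p[2]^2, p[1]^3 - p[2]^3)

diff_est <- sum(contrast * coef(fit))
diff_se  <- sqrt(drop(t(contrast) %*% vcov(fit) %*% contrast))

c(difference = diff_est, se = diff_se)
```

The intercept receives a weight of zero because it cancels when one fitted percentile is subtracted from the other.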

Value

perc_diff returns a vector with the percentile difference and its associated standard error. perc_diff_df returns the same but as a data frame.

Examples



set.seed(23131)
N <- 1000
K <- 20

toy_data <- data.frame(id = 1:N,
                       score = rnorm(N, sd = 2),
                       type = rep(paste0("inc", 1:20), each = N/K),
                       wt = 1)


# This call would raise an error because type is not yet an ordered factor:
# perc_diff(toy_data, type, score)

toy_data$type <- factor(toy_data$type, levels = unique(toy_data$type), ordered = TRUE)

perc_diff(toy_data, type, score, percentiles = c(90, 10))
perc_diff(toy_data, type, score, percentiles = c(50, 10))

perc_diff(toy_data, type, score, weights = wt, percentiles = c(30, 10))
# Results as data frame
perc_diff_df(toy_data, type, score, weights = wt, percentiles = c(30, 10))

Calculate a distribution of percentiles from an ordered categorical variable and a continuous variable.

Description

Calculate a distribution of percentiles from an ordered categorical variable and a continuous variable.

Usage

perc_dist(data_model, categorical_var, continuous_var, weights = NULL)

Arguments

`data_model`	A data frame with at least the categorical and continuous variables from which to estimate the percentiles
`categorical_var`	The bare unquoted name of the categorical variable. This variable must be an ordered factor; otherwise the function raises an error.
`continuous_var`	The bare unquoted name of the continuous variable from which to estimate the percentiles
`weights`	The bare unquoted name of the optional weight variable. If not specified, then equal weights are assumed.

Details

perc_dist silently drops observations with missing values when calculating the linear combination of coefficients.
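
Since incomplete rows are dropped without a message, one option is to count them before calling the function. A minimal base-R sketch, assuming a hypothetical data frame `scores_data` with the same kind of columns as the examples; which rows perc_dist actually drops depends on its internals, so this only counts rows that are incomplete on the chosen columns.

```r
# Hypothetical data with a few missing values
scores_data <- data.frame(
  type  = factor(c("inc1", "inc1", "inc2", "inc2", NA, "inc3"),
                 levels = c("inc1", "inc2", "inc3"), ordered = TRUE),
  score = c(1.2, NA, 0.4, -0.3, 0.8, 2.1),
  wt    = c(1, 1, 1, NA, 1, 1)
)

# Columns that would be passed to the function
used_cols <- c("type", "score", "wt")

# TRUE for rows with no missing value in any of those columns
is_complete <- complete.cases(scores_data[, used_cols])

n_dropped <- sum(!is_complete)
n_dropped

# Rows that remain for estimation
scores_complete <- scores_data[is_complete, ]
nrow(scores_complete)
```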

Value

A data frame with the scores and standard errors for each percentile

Examples


set.seed(23131)
N <- 1000
K <- 20

toy_data <- data.frame(id = 1:N,
                       score = rnorm(N, sd = 2),
                       type = rep(paste0("inc", 1:20), each = N/K),
                       wt = 1)


# This call would raise an error because type is not yet an ordered factor:
# perc_dist(toy_data, type, score)

toy_data$type <- factor(toy_data$type, levels = unique(toy_data$type), ordered = TRUE)

perc_dist(toy_data, type, score)

Mathematics test scores of Spain, Germany and Estonia in the PISA 2006 test

Description

A dataset containing the test scores and other household information of students from Spain, Germany and Estonia from the PISA 2006 test.

Usage

pisa_2006

Format

A data frame with 25884 rows and 10 variables:

year: Year of the survey
CNT: Long country names
STIDSTD: Unique student id
father_edu: The father's highest achieved degree in the ISCED scale
household_income: The household's total income in categories
avg_math: The average math test score out of the 5 plausible values in Mathematics

Source

A subset extracted from the PISA2006lite R package, https://github.com/pbiecek/PISA2012lite

Mathematics test scores of Spain, Germany and Estonia in the PISA 2012 test

Description

A dataset containing the test scores and other household information of students from Spain, Germany and Estonia from the PISA 2012 test.

Usage

pisa_2012

Format

A data frame with 35093 rows and 10 variables:

year: Year of the survey
CNT: Long country names
STIDSTD: Unique student id
father_edu: The father's highest achieved degree in the ISCED scale
household_income: The household's total income in categories
avg_math: The average math test score out of the 5 plausible values in Mathematics

Source

A subset extracted from the PISA2012lite R package, https://github.com/pbiecek/PISA2012lite
