Warning message with perccalc package

While the other vignette shows you how to use perccalc appropriately, there are instances where there are simply too few categories to estimate percentiles properly. Imagine estimating a distribution of percentiles from 1 to 100 with only three ordered categories: it sounds far-fetched.

Let’s load our packages.

library(perccalc)
library(dplyr)
library(ggplot2)

For example, take the survey data on smoking habits.

smoking_data <-
  MASS::survey %>% # you will need to install the MASS package
  as_tibble() %>%
  select(Sex, Smoke, Pulse) %>%
  rename(
    gender = Sex,
    smoke = Smoke,
    pulse_rate = Pulse
  )

The final result is this dataset:

## # A tibble: 237 × 3
##    gender smoke pulse_rate
##    <fct>  <fct>      <int>
##  1 Male   Never         35
##  2 Female Never         40
##  3 Female Never         48
##  4 Male   Never         48
##  5 Female Never         50
##  6 Female Regul         50
##  7 Male   Regul         54
##  8 Male   Never         55
##  9 Male   Never         56
## 10 Male   Never         59
## # ℹ 227 more rows
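
Before estimating anything, it can help to tabulate the categorical variable to see how many categories are available and how sparsely populated they are. The snippet below is a minimal sketch using base R only; it reads the `Smoke` column straight from `MASS::survey` (so it assumes the MASS package is installed) rather than from the renamed tibble above.

```r
# Assumption: the MASS package is installed; it supplies the survey data.
smoke <- MASS::survey$Smoke

# Number of distinct categories available for the percentile estimation
n_categories <- nlevels(smoke)
n_categories

# Observations per category, keeping missing values visible
category_counts <- table(smoke, useNA = "ifany")
category_counts
```

A small number of levels here is an early hint that the percentile functions may not be able to report standard errors.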

Note that there are only four categories in the smoke variable. Let’s try to estimate the percentile difference.

smoking_data <-
  smoking_data %>%
  mutate(smoke = factor(smoke,
                        levels = c("Never", "Occas", "Regul", "Heavy"),
                        ordered = TRUE))

perc_diff(smoking_data, smoke, pulse_rate)

## Warning in perc_diff_(data_model = data_model, categorical_var =
## categorical_var, : Too few categories in categorical variable to estimate the
## variance-covariance matrix and standard errors. Proceeding without estimated
## standard errors but perhaps you should increase the number of categories

## difference         se 
##   390.6092         NA

perc_diff returns the estimated difference but warns you that it could not estimate the standard error, which is why se is NA. The same happens with perc_dist.

perc_dist(smoking_data, smoke, pulse_rate) %>%
  head()

## Warning in perc_dist(smoking_data, smoke, pulse_rate): Too few categories in
## categorical variable to estimate the variance-covariance matrix and standard
## errors. Proceeding without estimated standard errors but perhaps you should
## increase the number of categories

## # A tibble: 6 × 2
##   percentile estimate
##        <int>    <dbl>
## 1          1     24.5
## 2          2     48.4
## 3          3     71.7
## 4          4     94.3
## 5          5    116. 
## 6          6    138.
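
Both functions signal this condition as an ordinary R warning, so a script can detect it programmatically instead of relying on someone reading the console. The following is a minimal sketch, not part of perccalc: `capture_warnings()` is a hypothetical helper built on base R's `withCallingHandlers()`, and `fake_perc_diff()` is a stand-in that only mimics the shape of the perc_diff output shown above so that the example runs without extra packages.

```r
# Hypothetical helper (not part of perccalc): evaluate an expression, collect
# any warning messages, and return them alongside the result.
capture_warnings <- function(expr) {
  messages <- character(0)
  result <- withCallingHandlers(
    expr,  # the promise is forced here, with the handler already in place
    warning = function(w) {
      messages <<- c(messages, conditionMessage(w))
      invokeRestart("muffleWarning")  # continue, but silence the warning
    }
  )
  list(result = result, warnings = messages)
}

# Stand-in for perc_diff(): emits a warning and returns a named vector with a
# missing standard error, as in the output above. The values are illustrative.
fake_perc_diff <- function() {
  warning("Too few categories in categorical variable")
  c(difference = 390.6092, se = NA)
}

out <- capture_warnings(fake_perc_diff())

# TRUE only when a standard error was actually estimated
has_se <- !is.na(out$result[["se"]])
has_se
out$warnings
```

With perccalc loaded, the same helper could presumably wrap the real call, for example `capture_warnings(perc_diff(smoking_data, smoke, pulse_rate))`, and downstream code could then skip any step that needs standard errors when `has_se` is `FALSE`.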