2017

An R + GitHub Journey

Join us for a GitHub journey, guided by Lucy D’Agostino McGowan! We’ll answer questions like:

What is so great about GitHub?
How can I make it work for me and my workflow?
How can I show the world some of the cool things I’m working on?

This will be a hands-on workshop that gives you all the tools to have a delightful time incorporating version control & R (and, if you are so inclined, blogdown: https://github.com/rstudio/blogdown). All levels are welcome!

Streamline Your Workflow: Integrating SAS, LaTeX, and R into a Single Reproducible Document

There is an industry-wide push toward making workflows seamless and reproducible. Incorporating reproducibility into the workflow has many benefits, among them increased transparency, time savings, and accuracy. We walk through how to seamlessly integrate SAS®, LaTeX, and R into a single reproducible document. We also discuss general principles and best practices, such as literate programming and version control.

Simplifying and Contextualizing Sensitivity to Unmeasured Confounding Tipping Point Analyses

The strength of evidence provided by epidemiological and observational studies is inherently limited by the potential for unmeasured confounding. Thus, we would expect every observational study to include a quantitative sensitivity analysis for unmeasured confounding. However, when we reviewed 90 recent studies with statistically significant findings published in top-tier journals, we found that 41 mentioned unmeasured confounding as a limitation, but only 4 included a quantitative sensitivity analysis. Moreover, the rule of thumb that treats effects of 2 or greater as robust can be misleading: it is too low for studies missing an important confounder and too high for studies that extensively control for confounding. We simplify the seminal work of Rosenbaum and Rubin (1983) and Lin, Psaty, and Kronmal (1998). We focus on three key quantities: the observed bound of the confidence interval closest to the null, a plausible residual effect size for an unmeasured binary confounder, and a realistic prevalence difference for this hypothetical confounder. We offer guidelines to researchers for anchoring the tipping point analysis in the context of the study and provide examples.
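To make the three quantities concrete, here is a minimal Python sketch of a tipping point calculation. It assumes the Lin, Psaty, and Kronmal (1998) adjustment for a binary unmeasured confounder on the risk-ratio scale and an observed effect above the null; `tipping_point_gamma` and `adjusted_bound` are hypothetical helper names, and this is an illustration rather than the exact procedure presented in the talk. Solving bound / [(1 + (gamma - 1) * p1) / (1 + (gamma - 1) * p0)] = 1 for gamma gives gamma = 1 + (bound - 1) / (p1 - bound * p0). Note that the formula needs the confounder's prevalence in each exposure group (`p1`, `p0`), not only their difference.

```python
def adjusted_bound(bound: float, gamma: float, p1: float, p0: float) -> float:
    """Observed bound after removing bias from a binary unmeasured confounder U.

    Assumes the Lin, Psaty, and Kronmal (1998) bias factor, where gamma is the
    U-outcome risk ratio and p1 / p0 are the prevalences of U among the
    exposed / unexposed.
    """
    return bound / ((1 + (gamma - 1) * p1) / (1 + (gamma - 1) * p0))


def tipping_point_gamma(bound: float, p1: float, p0: float) -> float:
    """Smallest U-outcome risk ratio (> 1) that moves `bound` to the null."""
    if bound <= 1:
        raise ValueError("sketch assumes the bound closest to the null is > 1")
    denom = p1 - bound * p0
    if denom <= 0:
        # No confounder-outcome risk ratio above 1 can tip the result
        # for this pair of prevalences.
        return float("inf")
    return 1 + (bound - 1) / denom


if __name__ == "__main__":
    g = tipping_point_gamma(1.5, p1=0.5, p0=0.0)
    print(g)                                  # 2.0
    print(adjusted_bound(1.5, g, 0.5, 0.0))   # 1.0
```

For example, with an observed lower bound of 1.5 and a hypothetical confounder present in half of the exposed and none of the unexposed, a confounder-outcome risk ratio of 2 would be enough to move the bound to the null.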

2016

Assessing the Association Between Accident Injury Severity and NCAP Car Safety Ratings

The U.S. New Car Assessment Program (NCAP) evaluates the safety of new cars through its 5-Star Safety Ratings program. In 2010, the program enhanced its protocol, making the ratings more stringent for cars in model years 2011 and onward. We are interested in assessing this rating system’s ability to predict accident injury severity. To evaluate this question, we use data reported in the National Highway Traffic Safety Administration’s (NHTSA) General Estimates System (GES) database for the years 2011 to 2014, matched to NCAP overall safety ratings for 291 unique make/model/model-year combinations. We fit a proportional odds regression model predicting injury severity for 23,641 individual passengers involved in car crashes, adjusting for accident-level covariates, such as the speed of the car and point of impact, and individual-level covariates, such as age and seating position.
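A proportional odds model treats injury severity as ordered categories and shares one linear predictor across all cumulative splits. The Python sketch below only shows how such a model turns cutpoints and a linear predictor into category probabilities; it assumes the parameterization logit P(Y <= j) = cutpoint_j - eta (sign conventions differ between software packages), and the function names and values are hypothetical, not fitted estimates from the study.

```python
import math


def logistic(z: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(cutpoints, eta):
    """Category probabilities under a proportional odds (cumulative logit) model.

    Assumes logit P(Y <= j) = cutpoints[j] - eta, where eta = x'beta is the
    same for every cutpoint (the proportional odds assumption). With k
    cutpoints there are k + 1 ordered categories.
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        probs.append(cum - previous)
        previous = cum
    return probs


# Hypothetical cutpoints for four severity levels and one passenger's eta.
print(category_probs([-1.0, 0.5, 2.0], eta=0.8))
```

Under this parameterization, a larger eta (for example, from a higher crash speed with a positive coefficient) shifts probability toward the more severe categories.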

Integrating SAS and R to Perform Optimal Propensity Score Matching

In studies where randomization is not possible, imbalance in baseline covariates (confounding by indication) is a fundamental concern. Propensity score matching (PSM) is a popular method to minimize this potential bias: it matches individuals who received treatment to those who did not, reducing the imbalance in pre-treatment covariate distributions. PSM methods continue to advance as computing resources expand. Optimal matching, which selects the set of matches that minimizes the average difference in propensity scores between mates, has been shown to outperform less computationally intensive methods. However, many find the implementation daunting. SAS/IML® software allows the integration of optimal matching routines that execute in R, e.g., the R optmatch package. This presentation walks through performing optimal PSM in SAS® by calling R functions and assesses whether covariate trimming is necessary prior to PSM. It covers the propensity score analysis in SAS, the matching procedure, and the post-matching assessment of covariate balance using SAS/STAT® 13.2 and SAS/IML procedures.
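The difference between greedy and optimal matching can be seen on a tiny example. This Python sketch is an illustration only: `greedy_match` and `optimal_match` are hypothetical helpers, not the optmatch API, and the optimal version brute-forces every 1:1 assignment, which is only feasible for a handful of units (production tools such as optmatch solve the same objective with network-flow algorithms).

```python
from itertools import permutations


def greedy_match(treated, controls):
    """Nearest-neighbor 1:1 matching without replacement, in the order treated units appear."""
    available = list(range(len(controls)))
    pairs = []
    for i, ps in enumerate(treated):
        j = min(available, key=lambda k: abs(ps - controls[k]))
        available.remove(j)
        pairs.append((i, j))
    return pairs


def optimal_match(treated, controls):
    """1:1 matching that minimizes the total absolute propensity score distance.

    Brute force over all assignments of controls to treated units.
    """
    if len(controls) < len(treated):
        raise ValueError("need at least as many controls as treated units")
    best = min(
        permutations(range(len(controls)), len(treated)),
        key=lambda perm: sum(abs(t - controls[j]) for t, j in zip(treated, perm)),
    )
    return list(enumerate(best))


def total_distance(pairs, treated, controls):
    return sum(abs(treated[i] - controls[j]) for i, j in pairs)


treated = [0.50, 0.58]    # hypothetical propensity scores
controls = [0.55, 0.30]
for method in (greedy_match, optimal_match):
    pairs = method(treated, controls)
    print(method.__name__, pairs, round(total_distance(pairs, treated, controls), 2))
```

Here the greedy pass gives the first treated unit its closest control (0.55) and leaves the second with a poor match, for a total distance of 0.33; the optimal assignment swaps the controls and lowers the total to 0.23.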

2015

Census Tract-Level Disparities: Examining Food Swamps and Food Deserts

Examining disparities in resources at the census-tract level is currently a public health priority. The Modified Retail Food Environment Index (mRFEI), provided by the CDC, incorporates two food environment metrics: ‘food deserts’, areas with no access to healthy foods, and ‘food swamps’, areas in which the quantity of unhealthy food options overwhelms healthy ones. We assess the association between census-tract racial make-up and food environment. Multiple logistic regression models are fit, controlling for census-tract-level covariates from 2008-2012 ACS estimates, as well as state. Percent black is significantly associated with food swamps, with an absolute increase of 14.4 percentage points in the percent black living in food swamps (p < 0.01). Percent Hispanic is associated with food swamps, with an absolute increase of 9.1 percentage points in the percent Hispanic living in food swamps (p < 0.01), but inversely related to food deserts (absolute difference -6.8, p < 0.01). After adjustment, all associations remain significant. The strong association between census-tract racial make-up and food swamps shown here will allow targeted interventions in census tracts where these disparities exist.
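As a rough illustration of how a tract-level food environment index can be computed, here is a Python sketch. It assumes the CDC definition of the mRFEI as the percentage of a tract's food retailers that are classified as healthy; the `classify_tract` helper and its `swamp_cutoff` parameter are hypothetical and are not the thresholds used in this study.

```python
from typing import Optional


def mrfei(healthy: int, less_healthy: int) -> Optional[float]:
    """Percentage of food retailers in a tract that are classified as healthy.

    Assumes the CDC definition: 100 * healthy / (healthy + less_healthy).
    Returns None when the tract has no food retailers (index undefined).
    """
    total = healthy + less_healthy
    if total == 0:
        return None
    return 100.0 * healthy / total


def classify_tract(healthy: int, less_healthy: int, swamp_cutoff: float) -> str:
    """Hypothetical classification; swamp_cutoff is illustrative only."""
    score = mrfei(healthy, less_healthy)
    if score is None or healthy == 0:
        return "food desert"   # no healthy retailers present
    if score < swamp_cutoff:
        return "food swamp"    # healthy retailers are a small share of the total
    return "neither"


print(mrfei(2, 8))                   # 20.0
print(classify_tract(1, 9, 20.0))    # food swamp
```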

Using PROC SURVEYREG and PROC SURVEYLOGISTIC to Assess Potential Bias

The Behavioral Risk Factor Surveillance System (BRFSS) collects data on health practices and risk behaviors via telephone survey. This study focuses on the question “On average, how many hours of sleep do you get in a 24-hour period?” Recall bias is a potential concern in interviews and questionnaires such as the BRFSS. The 2013 BRFSS data are used to illustrate the proper methods for implementing PROC SURVEYREG and PROC SURVEYLOGISTIC with the complex weighting scheme that BRFSS provides.
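To show why the weights matter for point estimates, here is a minimal Python sketch of a single-predictor weighted least-squares fit, the kind of weighting survey regression procedures apply to the coefficient estimates. It is a simplification under stated assumptions: `weighted_ols` is a hypothetical helper, and design-based standard errors, which also require the strata and cluster (PSU) information BRFSS provides, are not computed here.

```python
def weighted_ols(x, y, w):
    """Intercept and slope minimizing sum(w * (y - a - b * x) ** 2).

    With sampling weights these are weighted point estimates only; proper
    survey standard errors also need strata and PSU information.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical respondents: a predictor, reported hours of sleep, and survey weights.
x = [0, 1, 2]
hours = [7.0, 6.5, 6.0]
weights = [120.0, 80.0, 200.0]
print(weighted_ols(x, hours, weights))
```

With equal weights this reduces to ordinary least squares; unequal weights change the point estimates whenever the data do not fall exactly on a line.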

2014

Using SAS/STAT® Software to Validate a Health Literacy Prediction Model in a Primary Care Setting

Existing health literacy assessment tools developed for research purposes have constraints that limit their utility for clinical practice. The measurement of health literacy in clinical practice can be impractical due to the time requirements of existing assessment tools. Single Item Literacy Screener (SILS) items, which are brief, self-administered screening questions, have been developed to address this constraint. We developed a model to predict limited health literacy that consists of two SILS items and demographic information (for example, age, race, and education status) using a sample of patients in a St. Louis emergency department. In this paper, we validate this prediction model in a separate sample of patients visiting a primary care clinic in St. Louis. Starting from the prediction model developed in the previous study, we use SAS/STAT® software to validate this model based on three goodness-of-fit criteria: rescaled R-squared, AIC, and BIC. We compare models using two different measures of health literacy, the Newest Vital Sign (NVS) and the Rapid Estimate of Adult Literacy in Medicine-Revised (REALM-R). We evaluate the prediction model by examining the concordance, area under the ROC curve, sensitivity, specificity, kappa, and gamma statistics. Preliminary results show 69% concordance when comparing the model results to the REALM-R and 66% concordance when comparing to the NVS. Our conclusion is that a validated prediction model for inadequate health literacy would provide a feasible way to assess health literacy in fast-paced clinical settings. This would allow us to reach patients with limited health literacy with educational interventions and better meet their information needs.
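Several of the evaluation statistics listed above have short, standard definitions. The Python sketch below computes sensitivity, specificity, and Cohen's kappa from a 2x2 table, and the c statistic (area under the ROC curve) and Goodman-Kruskal gamma from predicted scores. The function names and counts are hypothetical; "positive" is assumed to mean limited health literacy according to the reference measure (NVS or REALM-R).

```python
def screening_stats(tp, fp, fn, tn):
    """Sensitivity, specificity, and Cohen's kappa from a 2x2 table."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Chance agreement from the predicted and reference margins.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


def concordance_stats(pos_scores, neg_scores):
    """c statistic (ties count 1/2) and gamma over all (positive, negative) pairs.

    Gamma is undefined (division by zero) if every pair is tied.
    """
    concordant = discordant = tied = 0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                concordant += 1
            elif p < q:
                discordant += 1
            else:
                tied += 1
    pairs = concordant + discordant + tied
    c = (concordant + 0.5 * tied) / pairs
    gamma = (concordant - discordant) / (concordant + discordant)
    return c, gamma


print(screening_stats(tp=40, fp=20, fn=10, tn=30))   # hypothetical counts
print(concordance_stats([0.9, 0.6], [0.4, 0.6]))     # (0.875, 1.0)
```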

2013

Developing County-Level Estimates of Racial Disparities in Obesity Using Multilevel Reweighted Regression

Background: The agenda to reduce racial health disparities has been set primarily at the national and state levels. These levels may be too far removed from the individual level where health outcomes are realized. This disconnect may be slowing the progress made in reducing these disparities. We use a small area analysis technique to fill the void in county-level disparities data. Methods: Behavioral Risk Factor Surveillance System data is used to estimate the prevalence of obesity by county among Non-Hispanic Whites and Non-Hispanic Blacks. A modified weighting system was developed based on demographics at the county level. A multilevel reweighted regression model is fit to obtain county-level prevalence estimates by race. To examine whether racial disparities exist at the county level, these rates are compared using the risk difference and rate ratio. Results: Gulf County, Florida, was ranked as having the largest disparity in absolute terms (risk difference). New York County, New York, was ranked as having the largest disparity in relative terms (risk ratio). Based on the average risk difference, the top five states with the largest average disparity were Oklahoma, Kentucky, Ohio, Washington D.C., and Kansas. The top five states with the largest average relative disparity were Washington D.C., Massachusetts, Colorado, Kentucky, and New York. Conclusions: Addressing disparities based on factors such as race/ethnicity, geographic location, and socioeconomic status is a current public health priority. This study takes a first step in developing the statistical infrastructure needed to target disparities interventions and resources to the local areas with greatest need.
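Once county-level prevalence estimates by race are available, the comparison step is simple arithmetic. This Python sketch assumes those estimates already exist (the multilevel reweighted regression itself is not shown), uses hypothetical function names and data, and ranks states by an unweighted mean of county risk differences, which may differ from the averaging used in the study.

```python
from collections import defaultdict


def disparity_measures(p_black, p_white):
    """Absolute (risk difference) and relative (ratio) disparity for one county."""
    return p_black - p_white, p_black / p_white


def rank_states_by_mean_rd(county_prevalence):
    """county_prevalence maps (state, county) -> (p_black, p_white).

    Returns states sorted by the unweighted mean county risk difference,
    largest first.
    """
    by_state = defaultdict(list)
    for (state, _county), (p_black, p_white) in county_prevalence.items():
        rd, _ratio = disparity_measures(p_black, p_white)
        by_state[state].append(rd)
    means = {state: sum(rds) / len(rds) for state, rds in by_state.items()}
    return sorted(means, key=means.get, reverse=True)


# Hypothetical county prevalence estimates (not study results).
estimates = {
    ("A", "a1"): (0.40, 0.30),
    ("A", "a2"): (0.30, 0.30),
    ("B", "b1"): (0.35, 0.20),
}
print(rank_states_by_mean_rd(estimates))   # ['B', 'A']
```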

Small Areal Estimation of Racial Disparities in Diabetes Using Multilevel Reweighted Regression

Introduction: The agenda to reduce racial health disparities has been set primarily at the national and state levels. These levels may be too far removed from the individual level where health outcomes are realized. This disconnect may be slowing the progress made in reducing these disparities. We use a small area analysis technique to fill the void in county-level disparities data. Methods: Behavioral Risk Factor Surveillance System data is used to estimate the prevalence of diabetes by county among Non-Hispanic Whites and Non-Hispanic Blacks. A modified weighting system was developed based on demographics at the county level. A multilevel reweighted regression model is fit to obtain county-level prevalence estimates by race. To examine whether racial disparities exist at the county level, these rates are compared using the risk difference and rate ratio. Results: The District of Columbia was ranked as having the largest average disparity in both absolute and relative terms (risk difference and risk ratio). Based on the average risk difference of counties within a state, the next five states with the largest average disparity were Massachusetts, Kansas, Ohio, North Carolina, and Kentucky. The next five states with the largest average relative disparity, calculated with the rate ratio, were Massachusetts, Colorado, Kansas, Illinois, and Ohio. Discussion: Addressing disparities based on factors such as race/ethnicity, geographic location, and socioeconomic status is a current public health priority. This study takes a first step in developing the statistical infrastructure needed to target disparities interventions and resources to the local areas with greatest need.
