Recent & Upcoming Talks

2020

Tools for Analyzing R code the Tidy Way

With the current emphasis on reproducibility and replicability, there is an increasing need to examine how data analyses are conducted. To analyze the between-researcher variability in data analysis choices, as well as the aspects of the data analysis pipeline that contribute to variability in results, we have created two R packages: matahari and tidycode. These packages build on methods created for natural language processing; rather than processing natural language, we focus on R code as the substrate of interest. The matahari package logs everything that is typed in the R console or in an R script in a tidy data frame. The tidycode package contains tools for analyzing R calls in a tidy manner. We demonstrate the utility of these packages and walk through two examples.

Using RStudio Cloud in the Classroom

This workshop covers setup, implementation, and tips and tricks for integrating RStudio Cloud into your classroom. RStudio Cloud is a great way to incorporate R in the classroom without the hassle of installation and complex setup.

January 29, 2020

4:30 PM – 5:30 PM

ASA K-12 Virtual Workshops 2020


By Lucy D'Agostino McGowan and Shannon Ellis in Invited Workshop

slides

2019

Challenges in Augmenting Randomized Trials with Observational Health Records

This talk addresses challenges with making health record data and clinical trial data compatible. Trial data is collected regularly and in an organized way, while data from health records is messier and more haphazard. A clinical trial has a clear start and end point, while health record data is collected continuously. Additionally, clinical trial participants may be healthier than the patients we see in health records. Covariates are defined in advance for a trial, but must be predicted or imputed from the health record. In this talk I will discuss some of the challenges we have encountered in trying to integrate trial data with observational health records to improve power and design new trials.

There and Back Again, a Data Scientist’s Tale

We are in an exciting new age with access to an overwhelming amount of data and information. This talk will focus on three areas that have become increasingly important as a result. First, we will discuss the importance of reproducibility during this age of information overload. As quantitatively minded people, we are being pushed to innovate and develop best practices for reproducibility. We will talk a bit about tools that make this possible and the next steps in this important area. We will then discuss new opportunities for developing innovative methods, particularly in the observational research space. This portion will include a brief introduction to causal inference for the data scientist. Finally, we will examine the importance of well-developed communication skills for quantitatively savvy people. These aspects will be discussed in the context of my winding path to data science, speckled with some advice and lessons learned.

April 17, 2019

4:00 PM – 5:00 PM

Macalester College 2019


By Lucy D'Agostino McGowan in Invited Keynote

slides

Data Visualizations with ggplot2

“If you’re navigating a dense information jungle, coming across a beautiful graphic or a lovely data visualization, it’s a relief. It’s like coming across a clearing in the jungle.” – David McCandless.

The ability to create polished, factual, and easily understood data visualizations is a crucial skill for the modern statistician. Visualizations aid with all steps of the data analysis pipeline, from exploratory data analysis to effectively communicating results to a broad audience. This tutorial will first cover best practices in data visualization. We will then dive into a hands-on experience building intuitive and elegant graphics using R with the ggplot2 package, a system for creating visualizations based on The Grammar of Graphics.

March 25, 2019

10:30 AM – 12:15 PM

ENAR 2019


By Lucy D'Agostino McGowan in Invited Workshop

pdf

2018

Exploring Finite-sample Bias in Propensity Score Weights

The principal limitation of all observational studies is the potential for unmeasured confounding. Various study designs may perform similarly in controlling for bias due to measured confounders while differing in their sensitivity to unmeasured confounding. Design sensitivity (Rosenbaum, 2004) quantifies the strength of an unmeasured confounder needed to nullify an observed finding. In this presentation, we explore how robust certain study designs are to various unmeasured confounding scenarios. We focus particularly on two exciting new study designs: ATM and ATO weights. We illustrate their performance in a large electronic health records-based study and provide recommendations for sensitivity to unmeasured confounding analyses in ATM and ATO weighted studies, focusing primarily on the potential reduction in finite-sample bias.
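For readers unfamiliar with the two weighting schemes named above, the following minimal Python sketch computes them from a propensity score e (the probability of treatment) and a treatment indicator z, using the standard published definitions of overlap (ATO) and matching (ATM) weights. The function names and the example propensity scores are made up for illustration; this is not code from the talk.

```python
def ato_weight(e, z):
    """Overlap (ATO) weight: 1 - e for treated units, e for controls."""
    return (1 - e) if z == 1 else e


def atm_weight(e, z):
    """Matching (ATM) weight: min(e, 1 - e) divided by the probability
    of the treatment the unit actually received."""
    prob_received = e if z == 1 else 1 - e
    return min(e, 1 - e) / prob_received


# Hypothetical (propensity score, treatment) pairs, for illustration only.
examples = [(0.2, 1), (0.2, 0), (0.9, 1), (0.9, 0)]
for e, z in examples:
    print(f"e={e}, z={z}: ATO={ato_weight(e, z):.3f}, ATM={atm_weight(e, z):.3f}")
```

Both weights stay between 0 and 1 for any propensity score, unlike inverse probability weights, which grow without bound as e approaches 0 or 1.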

Making Causal Claims as a Data Scientist: Tips and Tricks Using R

Making believable causal claims can be difficult, especially with the much-repeated adage “correlation is not causation”. This talk will walk through some tools often used to practice safe causation, such as propensity scores and sensitivity analyses. In addition, we will cover principles that suggest causation, such as understanding counterfactuals and applying Hill’s criteria in a data science setting. We will walk through specific examples and provide R code for all methods discussed.

2017

R-Ladies Panel: Improving Gender Diversity in a Male-dominated Community

R-Ladies is a worldwide organization whose mission is to promote gender diversity in the R community. We are interested in presenting a panel of regional leaders in the R-Ladies movement. We will discuss topics such as diversity data in the R community, best practices for starting a meetup in your own community, best practices for running a meetup and sustaining its success, and funding opportunities. We will also diagnose different obstacles and discuss how we attack them, for example increasing women’s competence versus confidence versus recognition in the R community. Finally, we will provide resources and details about how to get involved with local meetups.

October 21, 2017

10:00 AM – 11:30 AM

Women in Statistics and Data Science 2017


By Jenny Bryan, Mine Cetinkaya-Rundel, Lucy D'Agostino McGowan, Gabriela de Queiroz, Mine Dogucu, Katherine Scranton, Jennifer Thompson in Contributed Panel

details

Contextualizing Sensitivity Analysis in Observational Studies: Calculating Bias Factors for Known Covariates

The strength of evidence provided by epidemiological and observational studies is inherently limited by the potential for unmeasured confounding. While methods exist to quantify the potential effect of a specified unmeasured confounder, these methods should be anchored and contextualized within each study. We put forward a method for merging analyses of sensitivity to unmeasured confounding with the impacts of the observed covariates. We graphically display what we call the observed bias factors with the tipping point sensitivity analysis. We illustrate the method under various study designs and provide an application created to simplify the implementation of this methodology.

papr: Tinder for pre-prints, a Shiny Application for collecting gut-reactions to pre-prints from the scientific community

papr is an R Shiny web application and social network for evaluating bioRxiv pre-prints. The app serves multiple purposes, allowing the user to quickly swipe through pertinent abstracts as well as find a community of researchers with similar interests. It also serves as a portal for accessible “open science”, getting abstracts into the hands of users of all skill levels. Additionally, the data could help build a general understanding of what research the community finds exciting.

We allow the user to log in via Google to track multiple sessions and have implemented a recommender engine, allowing us to tailor which abstracts are shown based on each user’s previous abstract rankings. While using the app, users view an abstract pulled from bioRxiv and rate it as “exciting and correct”, “exciting and questionable”, “boring and correct”, or “boring and questionable” by swiping the abstract in a given direction. The app includes optional social network features, connecting users who provide their Twitter handle to users who enjoy similar papers.

This presentation will demonstrate how to incorporate tactile interfaces, such as swiping, into a Shiny application using shinysense, a package we created for this functionality; how to store real-time user data on Dropbox using rdrop2; how to add login capabilities using googleAuthR and googleID; how to implement a recommender engine using principal component analysis; and how we have handled issues of data safety and security through proactive planning and risk mitigation. Finally, we will report the app activity, summarizing both the user traffic and what research users are finding exciting.
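The abstract does not describe how the recommender engine is built, so the following is only a rough, self-contained Python sketch of one way principal component analysis can drive recommendations. It assumes ratings are simplified to 1 (exciting) and 0 (boring), fills unrated abstracts with the column mean, finds the first principal component by power iteration, and recommends the unrated abstract with the highest reconstructed score. All names and the toy ratings are hypothetical and are not taken from papr.

```python
def top_component(rows, iterations=100):
    """First principal component of mean-centered rows, via power iteration."""
    n_cols = len(rows[0])
    v = [1.0] + [0.0] * (n_cols - 1)  # arbitrary start vector; fine for this toy data
    for _ in range(iterations):
        # Multiply v by X^T X without forming the matrix explicitly.
        scores = [sum(x * w for x, w in zip(row, v)) for row in rows]
        v = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(n_cols)]
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    return v


def recommend(ratings, user):
    """Index of the unrated abstract with the highest rank-1 PCA score."""
    n_cols = len(ratings[0])
    # Column means over the ratings that were actually observed.
    means = []
    for j in range(n_cols):
        seen = [row[j] for row in ratings if row[j] is not None]
        means.append(sum(seen) / len(seen))
    # Fill missing ratings with the column mean, then center each column.
    centered = [
        [(means[j] if row[j] is None else row[j]) - means[j] for j in range(n_cols)]
        for row in ratings
    ]
    v = top_component(centered)
    score = sum(x * w for x, w in zip(centered[user], v))
    predicted = [means[j] + score * v[j] for j in range(n_cols)]
    unrated = [j for j in range(n_cols) if ratings[user][j] is None]
    return max(unrated, key=lambda j: predicted[j])


# Toy data: rows are users, columns are abstracts, None means not yet rated.
ratings = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, None, 0, None],
]
print(recommend(ratings, user=3))
```

In this toy example the last user agrees with the first two users on the abstracts they have all rated, so the sketch favors the abstract those users found exciting.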