Course 5: Applied Multiple Imputation
Dr. Ferdinand Geißler, Dr. Jan Paul Heisig
Location: Online via Zoom
[This is a 30-hour class.]
Missing data are a pervasive problem in the social sciences. Data for a given unit may be missing entirely, for example, because a sampled respondent refused to participate in a survey (survey nonresponse). Alternatively, information may be missing only for a subset of variables (item nonresponse), for example, because a respondent refused to answer some of the questions in a survey. The traditional way of dealing with item nonresponse, referred to as “complete case analysis” (CCA) or “listwise deletion”, excludes every observation with missing information from the analysis. While easy to implement, complete case analysis is wasteful and can lead to biased estimates. Multiple imputation (MI) seeks to address these issues and provides more efficient and unbiased estimates when certain conditions are met. Therefore, it is increasingly replacing CCA as the method of choice for dealing with item nonresponse in applied quantitative work in the social sciences.
The goals of the course are to introduce participants to the basic concepts and statistical foundations of missing data analysis and MI, and to enable them to use MI in their own work. The course puts heavy emphasis on the practical application of MI and on the complex decisions and challenges that researchers face when applying it. The focus is on MI using iterated chained equations (also known as “fully conditional specification”) and its implementation in the software package Stata. Participants should have a good working knowledge of Stata to follow the applied parts of the course and to complete the exercises successfully. Participants who are not familiar with Stata may still benefit from the course, but will likely find the exercises quite challenging.
A detailed syllabus for this course is available for download here.
Participants will find the course useful if:
By the end of the course participants will:
This is a five-day course with a total of 30 hours of virtual class time. Each day will begin with a three-hour lecture-style segment introducing the new material (9:30am-12:30pm). Exercises, most of them involving hands-on programming, will be distributed at the end of the lecture segment. Participants can start working on the exercises during the extended lunch break (12:30pm-2:30pm). The first afternoon segment (2:30pm-4:30pm) will focus on the exercises: participants will continue to work on them, now with assistance from the lecturers, and answers and solutions will then be discussed with the full group. The final “flextime” segment of each day (4:30pm-5:30pm) will be used to discuss further questions that have come up during the day and for lecturer-participant meetings on individual questions and problems. Participants interested in individual consultations concerning their ongoing projects are encouraged to contact the lecturers before the course and provide a short description of the issues they would like to discuss. The individual meetings can, however, also be used for questions that arise during the course.
Software and Hardware Requirements
The practical examples and hands-on exercises will be done in Stata. Participants should have a recent version installed on their local computer. Version 15 or later would be ideal, although most examples should work in versions 12 and later. Participants who do not own a copy of Stata will be provided with access to a full Stata license by GESIS for the duration of the course. Stata will be installed and activated prior to the course by GESIS staff through remote access on the participants' machines.