In summary: there are a few advantages, but many serious drawbacks. On top of that, the same advantages are also available with more advanced imputation methods (e.g. predictive mean matching or stochastic regression imputation). To make it short, there is basically no excuse for using mean imputation.

In the following step-by-step example in R, I'll show you how mean imputation affects your data in practice.

Before we can start with the example, we need some data with missing values.

    # Create some synthetic data with missings
    set.seed(...)
    N <- 1000                            # Sample size

    # Some random variables
    [...]
    x3 <- round(runif(N, -100, 20))

    # Insert missing values
    [...]
    x3[...] <- NA                        # 70% missingness

    # Indicator for missings (needed later)
    [...] is.na(x3)

    # Store variables in a data frame
    [...]

Our data consists of the three variables X1, X2, and X3, all of which have missing values. This is what the first 6 rows of our example data look like:

Table 1: First 6 Rows of Our Example Data for Mean Imputation

Let's move on to the part we are interested in: the mean imputation. If we want to impute only one column of our data frame, we can use the following R code:

    [...]

    # Density of x1 pre and post imputation
    plot(density(data$x1 [...]),         # Density of observed data
         main = "Density Pre and Post Mean Imputation",
         xlab = "X1")
    [...]                                # Density of observed & imputed data
    c("Before Imputation", "After Imputation"),
    [...]

Figure 1: Density of X1 Pre and Post Mean Imputation

Figure 1 displays the density of X1 before (in black) and after (in red) the imputation. Before imputation, X1 follows a normal distribution. After imputing the mean, however, the density has a weird peak at zero (in our example, the mean of X1).

So, how does that affect our data analysis? Let's do some univariate descriptive statistics:

    # Descriptive statistics for X1
    # Pre imputation
    [...]
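The R walkthrough in this section (synthetic data, mean imputation of a single column, density comparison, descriptive statistics) could be sketched roughly as follows. This is only a minimal illustration, not the original code: the seed, the standard normal distribution of `x1`, the 30% missingness rate, and the names `x1_imp`, `plot_densities` and `desc` are all assumptions made for the sake of a self-contained example.

```r
# Minimal sketch; seed, distribution and missingness rate are assumptions.
set.seed(1234)                               # hypothetical seed
N  <- 1000                                   # sample size, as in the text
x1 <- rnorm(N)                               # assumed: normal with mean 0
x1[rbinom(N, 1, 0.3) == 1] <- NA             # assumed: about 30% missing
data <- data.frame(x1 = x1)

# Mean imputation of one column: replace each NA by the observed mean
x1_mean     <- mean(data$x1, na.rm = TRUE)
data$x1_imp <- ifelse(is.na(data$x1), x1_mean, data$x1)

# Density before (black) and after (red) imputation
plot_densities <- function(d) {
  d_pre  <- density(d$x1, na.rm = TRUE)      # observed data only
  d_post <- density(d$x1_imp)                # observed and imputed data
  plot(d_pre,
       ylim = range(0, d_pre$y, d_post$y),   # leave room for the peak
       main = "Density Pre and Post Mean Imputation",
       xlab = "X1")
  lines(d_post, col = "red")
  legend("topright",
         legend = c("Before Imputation", "After Imputation"),
         col = c("black", "red"), lty = 1)
}
if (interactive()) plot_densities(data)

# Descriptive statistics for x1, pre and post imputation
desc <- rbind(
  pre  = c(mean = mean(data$x1, na.rm = TRUE), sd = sd(data$x1, na.rm = TRUE)),
  post = c(mean = mean(data$x1_imp),           sd = sd(data$x1_imp))
)
print(round(desc, 3))
```

In a sketch like this the mean is unchanged by construction, while the standard deviation after imputation is smaller, because every imputed value sits exactly at the mean. That is the same effect that shows up as the peak in the density plot.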
- Since mean imputation replaces all missing values, you can keep your whole database.
- Mean imputation is very simple to understand and to apply (more on that later in the R and SPSS examples). You can explain the imputation method easily to your audience, and everybody with basic knowledge in statistics will get what you've done.
- If the response mechanism is MCAR, the sample mean of your variable is not biased. Mean substitution might be a valid approach if the univariate average of your variables is the only metric you are interested in.

We learned some reasons why mean imputation is so popular among data users. However, let's move on to the more important part, the drawbacks of mean imputation:

- Mean substitution leads to bias in multivariate estimates such as correlation or regression coefficients. Values that are imputed by a variable's mean have, in general, a correlation of zero with other variables. Relationships between variables are therefore biased toward zero.
- Standard errors and variance of imputed variables are biased. For instance, let's assume that we would like to calculate the standard error of a mean estimate of an imputed variable. Since all imputed values are exactly the mean of our variable, we would be too sure about the correctness of our mean estimate. In other words, the confidence interval around the point estimate of our mean would be too narrow.
- If the response mechanism is MAR or MNAR, even the sample mean of your variable is biased (compare that with the MCAR point above). Assume that you want to estimate the mean of a population's income and people with high income are less likely to respond: your estimate of the mean income would be biased downwards.
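The drawbacks discussed in this section can be checked with a small simulation. The following is a hedged sketch, not part of the original example: the seed, the effect sizes, the missingness rates, the response model for income and all variable names are hypothetical choices made purely for illustration.

```r
# Minimal simulation of the three drawbacks; all settings are assumptions.
set.seed(42)                                  # hypothetical seed
n <- 1000

# 1) Correlation is biased toward zero
x <- rnorm(n)
y <- 0.8 * x + rnorm(n, sd = 0.6)             # hypothetical correlated pair
miss  <- rbinom(n, 1, 0.4) == 1               # MCAR, about 40% missing
y_obs <- y
y_obs[miss] <- NA
y_imp <- ifelse(miss, mean(y_obs, na.rm = TRUE), y_obs)

cor_complete <- cor(x, y_obs, use = "complete.obs")  # observed cases only
cor_imputed  <- cor(x, y_imp)                        # after mean imputation

# 2) Standard error of the mean is too small after imputation
se_mean <- function(v) sd(v, na.rm = TRUE) / sqrt(sum(!is.na(v)))
se_observed <- se_mean(y_obs)
se_imputed  <- se_mean(y_imp)

# 3) Under MNAR even the mean is biased: high earners respond less often
z       <- rnorm(n)
income  <- exp(10 + 0.5 * z)                  # hypothetical income variable
respond <- rbinom(n, 1, plogis(-1.5 * z)) == 1
mean_true     <- mean(income)
mean_observed <- mean(income[respond])        # also the value mean imputation fills in

print(round(c(cor_complete = cor_complete, cor_imputed = cor_imputed), 3))
print(round(c(se_observed = se_observed, se_imputed = se_imputed), 4))
print(round(c(mean_true = mean_true, mean_observed = mean_observed)))
```

The first two comparisons hold by construction: imputed cases add nothing to the covariance and nothing to the sum of squares, but they do increase the sample size used in the denominators. The third comparison depends on the assumed response model, in which the probability of responding falls as income rises.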
Mean imputation (or mean substitution) replaces missing values of a certain variable by the mean of non-missing cases of that variable. Sounds easy to apply, doesn't it? So why is it so evil to use mean substitution?

Advantages and Drawbacks of Mean Substitution

You probably already noticed that I'm not a big fan of mean imputation. However, I'll be fair and also show you the advantages of the method:

- Missing values in your data do not reduce your sample size, as would be the case with listwise deletion (the default of many statistical software packages).