UserGuide BioStat Prime Help

Missing Values

Character/Factor

Replaces missing values in the selected variables with a specified value. When using the dialog, you do not need to enclose the string in double quotes.
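
As a rough illustration of what this replacement amounts to in R, the sketch below fills missing values in one character column and one factor column. The data frame and its column names (region, grade) are hypothetical, and the code BioStat Prime actually generates may differ.

```r
# Hypothetical data: 'region' is a character column, 'grade' is a factor.
df <- data.frame(
  region = c("north", NA, "south"),
  grade  = factor(c("a", NA, "b")),
  stringsAsFactors = FALSE
)

# Character column: assign the replacement value directly.
df$region[is.na(df$region)] <- "unknown"

# Factor column: the replacement must exist as a level before it is assigned.
levels(df$grade) <- c(levels(df$grade), "unknown")
df$grade[is.na(df$grade)] <- "unknown"
```

The extra step for factors matters: assigning a value that is not an existing level produces NA again, with a warning.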

Fill Values Downward or Upward

This dialog fills in missing values in dataset columns using the previous entry (downward fill) or the next entry (upward fill) in each column. This is useful when a value is recorded only when it changes, rather than repeated on every row. It typically assumes the dataset is sorted in a meaningful order. The variables in which values are filled in will be overwritten.

The arguments used in executing the dialog are as follows.

Variables to Fill In Values

Specify the variables for which missing values will be filled in.

Variables to Group By

Specify variables that group rows together. Missing values will be filled in within groups defined by these variables. For example, grouping by a subject identifier would fill in values within subjects.

Direction

Specify the direction (downward or upward) in which the values will be filled in.
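
The behaviour described by these three arguments can be sketched in base R as follows. The helper names fill_down and fill_up and the example columns (subject, dose) are hypothetical, introduced only for illustration; this is not necessarily the code the dialog runs. The sketch writes to new columns so the two directions can be compared side by side, whereas the dialog overwrites the selected variables.

```r
# Carry the last observed value forward (downward fill) in a vector.
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))   # position of the most recent non-missing value
  idx[idx == 0] <- NA        # leading NAs have nothing above to copy from
  x[!is.na(x)][idx]
}

# An upward fill is a downward fill applied to the reversed vector.
fill_up <- function(x) rev(fill_down(rev(x)))

df <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  dose    = c(10, NA, NA, NA, 20, NA)
)

# ave() applies the function within each subject and keeps the row order,
# so values never leak from one subject into the next.
df$dose_down <- ave(df$dose, df$subject, FUN = fill_down)
df$dose_up   <- ave(df$dose, df$subject, FUN = fill_up)
```

Note that a missing value at the start of a group stays missing under a downward fill, and one at the end of a group stays missing under an upward fill, because there is no neighbouring value in that direction.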

Model Imputation

BioStat Prime first constructs a model in which the variable to be imputed is the dependent variable. It then uses the fitted model to predict values and replaces the missing values of the dependent variable with the predictions.
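
The two steps, fit and then predict, can be sketched in base R with a plain linear model. The data frame and the choice of lm() are assumptions made for illustration; the dialog may use a different model and different internals.

```r
# Hypothetical data: 'y' has missing values, 'x' is fully observed.
df <- data.frame(x = 1:6, y = c(3, 5, NA, 9, NA, 13))

# Step 1: fit the model. lm() drops rows with a missing response by default.
fit <- lm(y ~ x, data = df)

# Step 2: predict only for the rows where y is missing and fill them in.
miss <- is.na(df$y)
df$y[miss] <- predict(fit, newdata = df[miss, , drop = FALSE])
```

Observed values of y are left untouched; only the rows flagged in miss receive predictions.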

The simputation package offers a number of commonly used single imputation methods, each with a similar, simple interface. The following imputation methods are supported.

Imputation methodology
  1. linear regression

  2. robust linear regression

  3. ridge/elasticnet/lasso regression

  4. CART models (decision trees)

  5. Random forest

  6. Multivariate imputation

  7. Imputation based on the expectation-maximization algorithm

  8. missForest (iterative random forest imputation)

  9. Donor imputation (including various donor pool specifications)

  10. k-nearest neighbour (based on Gower’s distance)

  11. sequential hotdeck (LOCF, NOCB)

  12. random hotdeck

  13. Predictive mean matching

  14. Model based (optionally add [non]parametric random residual)

  15. Other: (groupwise) median imputation (optional random residual)
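
To give a feel for the shared interface, here is a minimal sketch that calls two of the package's functions directly on the built-in iris data. It assumes the simputation package is installed, and the choice of variables is arbitrary; it is not a transcript of what the dialog generates.

```r
if (requireNamespace("simputation", quietly = TRUE)) {
  dat <- iris
  dat$Sepal.Length[c(1, 51, 101)] <- NA   # introduce missing values for the demo

  # Linear regression: impute Sepal.Length from Sepal.Width and Species.
  imp_lm <- simputation::impute_lm(dat, Sepal.Length ~ Sepal.Width + Species)

  # Groupwise median: the right-hand side names the grouping variable.
  imp_med <- simputation::impute_median(dat, Sepal.Length ~ Species)
}
```

In both calls the left-hand side of the formula names the variable to impute, which is what lets one method be swapped for another with little change to the call.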

Numeric

Replaces missing values in the selected variables with the result of the selected operation (median, mean, min, or max).

Arguments
  1. var: Character string naming the numeric variable with missing values (NA), e.g. var = c('sales')

  2. Dataset: The dataset that contains the variable var
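
In base R the operation can be sketched as below, using the sales variable from the argument description. The data values are made up, and the dialog's own implementation is not shown in this guide.

```r
df <- data.frame(sales = c(10, NA, 30, NA, 50))

# Compute the statistic on the observed values only, then fill the gaps.
fill_value <- median(df$sales, na.rm = TRUE)   # or mean(), min(), max()
df$sales[is.na(df$sales)] <- fill_value
```

The na.rm = TRUE argument is essential: without it, the statistic of a vector containing NA is itself NA.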

Remove NAs

Removes missing values (NA) from a dataset/dataframe. The dialog creates a new dataset or overwrites the existing one, removing every row that has one or more missing values in the selected columns.

Arguments

object: an R object.
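
A minimal base R sketch of row removal follows, covering both the all-columns case and the selected-columns case. The data frame is hypothetical, and the use of na.omit() and complete.cases() is an assumption about how such a step is commonly written, not a statement of what the dialog runs.

```r
df <- data.frame(
  id     = 1:4,
  sales  = c(10, NA, 30, 40),
  region = c("n", "s", NA, "e")
)

# Drop rows that have a missing value in any column.
all_complete <- na.omit(df)

# Drop rows that have a missing value in the selected column(s) only.
sales_complete <- df[complete.cases(df[, "sales", drop = FALSE]), ]
```

Restricting the check to selected columns keeps rows whose only missing values lie in columns that are not part of the analysis.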

Impute Missing Values using a formula

Construct a formula to replace missing values. For example, you can build a regression model to estimate the missing values; once the equation is generated, enter it in the dialog, and only the missing values in the selected variable will be computed.

Arguments
  1. var: The name of the variable in the dataset whose missing values are to be replaced, e.g. var = c("sales"). The variable must be of class numeric.

  2. Dataset: The dataset/dataframe that contains the variable var.

  3. Expression: The expression used to replace the missing values, e.g. var2*4 + 1.32.
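
The effect of these arguments can be sketched in base R using the expression var2*4 + 1.32 and the variable sales. The data values are invented for illustration, and the dialog's generated code may look different.

```r
df <- data.frame(
  var2  = c(1, 2, 3, 4),
  sales = c(5.32, NA, 13.32, NA)
)

# Evaluate the expression for the missing rows only; observed values are kept.
miss <- is.na(df$sales)
df$sales[miss] <- with(df[miss, ], var2 * 4 + 1.32)
```

Rows where sales is already observed are never recomputed, even if the expression would give a different value for them.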

Last modified: 31 July 2025