UserGuide BioStat Prime Help

Subset by Position

This section of the Subset tab subsets a dataset according to row position.

Specify New Dataset Name: Name of the dataset in which the subsetted data will be stored.

Variables to Sort By First: Variables used to sort the rows before any subsetting is done. Sorting affects only the options that select rows by position, e.g. First/Last N Rows, First/Last Proportion of Rows, and Specify Row Numbers. The sort is always in ascending order.

Groups to Subset Within: Specifying no variables will subset according to the row position of the entire dataset. Specifying variables will subset according to the row position within groups defined by all combinations of values for the specified variables.
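The interaction between sorting, grouping, and a position-based subset can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not BioStat Prime's own implementation. The function name first_n_rows, the column names site and age, and the sample data are all hypothetical.

```python
from collections import defaultdict

def first_n_rows(rows, n, sort_by=(), group_by=()):
    """Keep the first n rows overall, or within each group if group_by is given."""
    # Sort ascending by the chosen variables before any subsetting.
    if sort_by:
        rows = sorted(rows, key=lambda r: tuple(r[v] for v in sort_by))
    # No grouping variables: position is counted over the entire dataset.
    if not group_by:
        return rows[:n]
    # Grouping variables: position is counted separately within each
    # combination of values of those variables.
    seen = defaultdict(int)
    kept = []
    for r in rows:
        key = tuple(r[v] for v in group_by)
        if seen[key] < n:
            kept.append(r)
            seen[key] += 1
    return kept

# Hypothetical sample data
data = [
    {"site": "A", "age": 52},
    {"site": "B", "age": 31},
    {"site": "A", "age": 40},
    {"site": "B", "age": 67},
    {"site": "A", "age": 29},
]

overall = first_n_rows(data, 2, sort_by=["age"])
per_site = first_n_rows(data, 2, sort_by=["age"], group_by=["site"])
```

Here overall holds the two youngest rows in the whole dataset, while per_site holds the two youngest rows from each site. The order of rows in the application's output dataset may differ from this sketch.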

Subset Type

First N Rows: Keeps the first N rows of the dataset overall or within groups

Last N Rows: Keeps the last N rows of the dataset overall or within groups

Rows with Lowest N Values for a Variable: Keeps the rows that have the lowest ordered values for a specified variable overall or within groups. For example, specifying 10 would keep the rows with the lowest 10 values for a variable.

Rows with Highest N Values for a Variable: Keeps the rows that have the highest ordered values for a specified variable overall or within groups. For example, specifying 10 would keep the rows with the highest 10 values for a variable.

First Proportion of Rows: Keeps the first proportion of rows in the dataset, overall or within groups. For example, specifying .10 would keep the first 10% of the rows, based on the total number of rows.

Last Proportion of Rows: Keeps the last proportion of rows in the dataset, overall or within groups. For example, specifying .10 would keep the last 10% of the rows, based on the total number of rows.

Rows within Lowest Percentile for a Variable: Keeps the rows whose values fall in the lowest percentile range for a specified variable, overall or within groups. For example, specifying .10 would keep the rows with values from the minimum to the 10th percentile of the variable.

Rows within Highest Percentile for a Variable: Keeps the rows whose values fall in the highest percentile range for a specified variable, overall or within groups. For example, specifying .10 would keep the rows with values from the 90th percentile to the maximum of the variable.

Specify Row Numbers: Keeps exactly the rows whose numbers are specified. For example, specifying 1,3,5 would keep the first, third, and fifth rows. Specifying 20:30 would keep rows 20 to 30. Specifying seq(2,10,by=2) would keep the even-numbered rows up to the 10th row.
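For readers less familiar with the row-number expressions above, the following Python sketch shows which rows each of the three examples selects. It is an illustration only: the helper keep_rows and the 40-row sample are hypothetical, and silently skipping row numbers beyond the end of the dataset is an assumption of this sketch, not documented behaviour. Note that row numbers start at 1 and that 20:30 includes both endpoints.

```python
# Hypothetical dataset of 40 rows, labelled by their 1-based row number
rows = [f"row{i}" for i in range(1, 41)]

def keep_rows(rows, numbers):
    """Keep rows by 1-based row number.

    Assumption of this sketch: numbers outside 1..len(rows) are skipped.
    """
    return [rows[i - 1] for i in numbers if 1 <= i <= len(rows)]

picked = keep_rows(rows, [1, 3, 5])        # equivalent of 1,3,5
block = keep_rows(rows, range(20, 31))     # equivalent of 20:30 (both ends included)
evens = keep_rows(rows, range(2, 11, 2))   # equivalent of seq(2,10,by=2)
```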

Include Tied Values: Specifies whether tied values should be included. For example, if you request the rows with the lowest 10 values of a variable and the 10th lowest value appears more than once, including tied values keeps every row whose value equals that tied value, so more than 10 rows may be returned.
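The effect of the tie option can be sketched in Python as follows. This is an illustration under stated assumptions rather than the application's actual code: the function lowest_n, the id and score columns, and the sample values are hypothetical, and which of the tied rows survives when ties are excluded (here, the one that appears first in the data) is an assumption of this sketch.

```python
def lowest_n(rows, var, n, include_ties=True):
    """Keep the rows with the lowest n values of var."""
    # sorted() is stable, so tied rows keep their original relative order.
    ordered = sorted(rows, key=lambda r: r[var])
    if n <= 0 or not ordered:
        return []
    if n >= len(ordered):
        return ordered
    if not include_ties:
        # Exactly n rows; tied rows beyond position n are dropped.
        return ordered[:n]
    # Keep every row whose value does not exceed the nth lowest value.
    cutoff = ordered[n - 1][var]
    return [r for r in ordered if r[var] <= cutoff]

# Hypothetical sample: the 3rd lowest score (5) appears twice
scores = [{"id": i, "score": s} for i, s in enumerate([5, 3, 8, 3, 5, 9], start=1)]

with_ties = lowest_n(scores, "score", 3, include_ties=True)
without_ties = lowest_n(scores, "score", 3, include_ties=False)
```

With ties included, four rows are kept (scores 3, 3, 5, 5) even though three were requested. With ties excluded, exactly three rows are kept.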

Last modified: 31 July 2025