Systematic Sampleing in R with Base R

code
rtip
Author

Steven P. Sanderson II, MPH

Published

August 5, 2024

Introduction

In this post, we will explore systematic sampling in R using base R functions. Systematic sampling is a technique where you select every (k^{th}) element from a list or dataset. This method is straightforward and useful when you want a representative sample without the complexity of more advanced sampling techniques.

Let’s dive into an example to understand how it works.

What is Systematic Sampling?

Systematic sampling involves selecting every (k^{th}) element from a dataset after a random start. The value of (k) is calculated as:

\[ k = \frac{N}{n} \]

where (N) is the population size and (n) is the sample size.

Example: Sampling a Dataset

Imagine we have a dataset of 1000 elements, and we want to select a sample of 100 elements using systematic sampling.

  1. Generate a Dataset

First, let’s create a dataset with 1000 elements.

set.seed(123)  # Setting seed for reproducibility, although with this 
               # example it doesn't matter
population <- 1:1000

Here, population is a sequence of numbers from 1 to 1000.

  1. Define Sample Size

Define the number of elements you want to sample.

sample_size <- 100
  1. Calculate Interval (k)

Calculate the interval (k) as the ratio of the population size to the sample size.

k <- length(population) / sample_size
  1. Random Start Point

Choose a random starting point between 1 and (k).

start <- sample(1:k, 1)
  1. Select Every (k^{th}) Element

Use a sequence to select every (k^{th}) element starting from the chosen start point.

systematic_sample <- population[seq(start, length(population), by = k)]
  1. Check the Sample

Print the first few elements of the sample to check.

head(systematic_sample)
[1]  3 13 23 33 43 53

Here is the complete code in one block:

# Step 1: Generate a Dataset
set.seed(123)  # Setting seed for reproducibility
population <- 1:1000

# Step 2: Define Sample Size
sample_size <- 100

# Step 3: Calculate Interval k
k <- length(population) / sample_size

# Step 4: Random Start Point
start <- sample(1:k, 1)

# Step 5: Select Every k-th Element
systematic_sample <- population[seq(start, length(population), by = k)]

# Step 6: Check the Sample
head(systematic_sample)

Try It Yourself!

Systematic sampling is a simple yet powerful technique. By following the steps above, you can apply it to your datasets. Experiment with different sample sizes and starting points to see how the samples vary. This method can be particularly useful when dealing with large datasets where random sampling might be cumbersome.

Give it a go and see how systematic sampling can be a handy tool in your data analysis toolkit!


Happy Coding!