
R | Creating a Data Partition

When building a predictive model, it is important to test the model on data that wasn't used to train it; thus, the available data is typically partitioned into "training" and "testing" datasets. In R, it's straightforward to do this by randomly assigning indices to one set or the other, but the caret package makes this even easier by automating the index assignment process:

# set.seed(123) # enable if you want to reproduce a specific partition in the next line
indices <- caret::createDataPartition(iris$Sepal.Length, p = 0.75, list = FALSE) # this is random
training_set <- iris[indices, ]
testing_set <- iris[-indices, ]
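
For comparison, the manual approach mentioned above (randomly assigning indices yourself) can be sketched in base R without caret. This is a minimal illustration, not a replacement for the caret call: the names train_idx, manual_training_set, and manual_testing_set are made up for this example, and the 75% proportion simply mirrors p = 0.75. Note that sample() draws rows uniformly at random, whereas createDataPartition() also tries to preserve the distribution of the outcome across the two sets, so the two approaches will not produce the same split.

```r
# Base R sketch of a random 75/25 split (hypothetical variable names).
set.seed(123)  # fix the seed so the same partition is drawn on every run

n <- nrow(iris)

# Draw 75% of the row numbers at random, without replacement.
train_idx <- sample(seq_len(n), size = floor(0.75 * n))

# Rows drawn above go to training; all remaining rows go to testing.
manual_training_set <- iris[train_idx, ]
manual_testing_set  <- iris[-train_idx, ]
```

Because the testing rows are selected with a negative index, every row of iris lands in exactly one of the two sets.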