R | Creating a Data Partition
When building a predictive model, it is important to be able to test the model on data that wasn't used to train it; thus, the available data is typically partitioned into "training" and "testing" datasets. In R, it's straightforward to do this by randomly assigning row indices to one set or the other, but the caret package makes this even easier by automating the index assignment process:
# set.seed(123) # uncomment to make the partition on the next line reproducible
indices <- caret::createDataPartition(iris$Sepal.Length, p = 0.75, list = FALSE) # random; for a numeric outcome, sampling is done within percentile-based groups of Sepal.Length
training_set <- iris[indices,]
testing_set <- iris[-indices,]
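For comparison, the manual approach of randomly assigning indices can be sketched in base R with sample(). This is a minimal sketch assuming a plain, unstratified 75/25 split is acceptable; the `_base` variable names are hypothetical. Unlike createDataPartition, this version does not balance the split across ranges of Sepal.Length.

```r
# Hypothetical base-R alternative to caret::createDataPartition (unstratified)
set.seed(123)                                  # fixed seed so the split is reproducible
n <- nrow(iris)                                # iris has 150 rows
train_idx <- sample(seq_len(n), size = floor(0.75 * n))  # 112 distinct random row numbers

training_set_base <- iris[train_idx, ]         # rows chosen for training
testing_set_base  <- iris[-train_idx, ]        # every remaining row

# Sanity checks: the two sets are disjoint and together cover every row
stopifnot(nrow(training_set_base) + nrow(testing_set_base) == n)
stopifnot(length(intersect(rownames(training_set_base),
                           rownames(testing_set_base))) == 0)
```

The same checks can be run on training_set and testing_set from the caret split; the exact training size there may differ slightly from 75% because caret samples within each group separately.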