1. Introduction
Data manipulation is a fundamental step in data analysis. A dataframe often contains redundant or unnecessary columns that we want to remove for clarity. In R, there are several ways to drop columns from a dataframe. This guide focuses on the select() function from the dplyr package and also shows a base R alternative.
2. Program Overview
1. Create a sample dataframe.
2. Drop columns using negative selection with dplyr's select().
3. Drop a column by name using base R indexing.
3. Code Program
# Load necessary library
library(dplyr)
# Create a sample dataframe
df <- data.frame(
  Name = c('John', 'Jane', 'Doe'),
  Age = c(25, 28, 22),
  Gender = c('Male', 'Female', 'Male'),
  Score = c(85, 90, 78)
)
# Display the original dataframe
print("Original Dataframe:")
print(df)
# Drop the 'Gender' and 'Score' columns using negative selection
df1 <- df %>% select(-c(Gender, Score))
# Display the dataframe after dropping columns
print("Dataframe after Dropping 'Gender' and 'Score' Columns:")
print(df1)
# Another method (base R, no dplyr needed): drop the 'Age' column by name using negative indexing
df2 <- df[, -which(names(df) %in% c("Age"))]
# Display the dataframe after dropping the 'Age' column
print("Dataframe after Dropping 'Age' Column:")
print(df2)
Output:
[1] "Original Dataframe:"
  Name Age Gender Score
1 John  25   Male    85
2 Jane  28 Female    90
3  Doe  22   Male    78
[1] "Dataframe after Dropping 'Gender' and 'Score' Columns:"
  Name Age
1 John  25
2 Jane  28
3  Doe  22
[1] "Dataframe after Dropping 'Age' Column:"
  Name Gender Score
1 John   Male    85
2 Jane Female    90
3  Doe   Male    78
4. Step By Step Explanation
- We start by creating a sample dataframe df with four columns: Name, Age, Gender, and Score.
- To drop columns with dplyr, we pass them to select() with a - in front. select(-c(Gender, Score)) tells R to keep every column except Gender and Score. The original df is left unchanged, and the result is stored in df1.
- To drop columns without dplyr, we can use base R's negative indexing: names(df) %in% c("Age") marks the matching columns, which() converts the matches to positions, and the leading - excludes those positions. Note that if no name matches, which() returns an empty vector and the expression drops every column instead of none.
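Both approaches above assume that every column we name actually exists in the dataframe. As a minimal sketch of how the same drops could be written more defensively, the example below uses dplyr's any_of() helper and a base R logical mask. The cols_to_drop vector and the 'Height' entry are hypothetical, added here only to show what happens when a name is missing; any_of() is assumed to be available, which requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Same sample dataframe as in the program above
df <- data.frame(
  Name = c('John', 'Jane', 'Doe'),
  Age = c(25, 28, 22),
  Gender = c('Male', 'Female', 'Male'),
  Score = c(85, 90, 78)
)

# Hypothetical vector of column names to drop; 'Height' is deliberately absent from df
cols_to_drop <- c("Gender", "Score", "Height")

# dplyr: any_of() silently ignores names that are not present in df
df_dplyr <- df %>% select(-any_of(cols_to_drop))

# Base R: a logical mask keeps every column when nothing matches,
# and drop = FALSE keeps the result a dataframe even if only one column remains
df_base <- df[, !(names(df) %in% cols_to_drop), drop = FALSE]

print(names(df_dplyr))
print(names(df_base))
```

Keeping the names in a character vector makes it easy to reuse the same list across both approaches, and neither line fails or empties the dataframe when a listed column is absent.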