dplyr filter(): Filter/Select Rows based on conditions - Python and R Tips (2023)

dplyr, R package that is at core of tidyverse suite of packages, provides a great set of tools to manipulate datasets in the tabular form. dplyr has a set of useful functions for “data munging”, including select(), mutate(), summarise(), and arrange() and filter().

And in this tidyverse tutorial, we will learn how to use dplyr’s filter() function to select or filter rows from a data frame with multiple examples. First, we will start with how to select rows of a dataframe based on a value of a single column or variable. And then we will learn how select rows of a dataframe using values from multiple variables or columns.

Let us get started by loading tidyverse, suite of R packges from RStudio.

library("tidyverse")

We will load Penguins data directly from cmdlinetips.com‘s github page.

path2data <- "https://raw.githubusercontent.com/cmdlinetips/data/master/palmer_penguins.csv"penguins<- readr::read_csv(path2data)

Penguins data look like this

head(penguins)## # A tibble: 6 x 7## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>## 1 Adelie Torge… 39.1 18.7 181 3750 male ## 2 Adelie Torge… 39.5 17.4 186 3800 fema…## 3 Adelie Torge… 40.3 18 195 3250 fema…## 4 Adelie Torge… NA NA NA NA <NA> ## 5 Adelie Torge… 36.7 19.3 193 3450 fema…## 6 Adelie Torge… 39.3 20.6 190 3650 male

Let us subset Penguins data by filtering rows based on one or more conditions.

How to filter rows based on values of a single column in R?

Let us learn how to filter data frame based on a value of a single column. In this example, we want to subset the data such that we select rows whose “sex” column value is “fename”.

penguins %>% filter(sex=="female")

This gives us a new dataframe , a tibble, containing rows with sex column value “female”column.

## # A tibble: 165 x 7## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g## <chr> <chr> <dbl> <dbl> <dbl> <dbl>## 1 Adelie Torge… 39.5 17.4 186 3800## 2 Adelie Torge… 40.3 18 195 3250## 3 Adelie Torge… 36.7 19.3 193 3450## 4 Adelie Torge… 38.9 17.8 181 3625## 5 Adelie Torge… 41.1 17.6 182 3200## 6 Adelie Torge… 36.6 17.8 185 3700## 7 Adelie Torge… 38.7 19 195 3450## 8 Adelie Torge… 34.4 18.4 184 3325## 9 Adelie Biscoe 37.8 18.3 174 3400## 10 Adelie Biscoe 35.9 19.2 189 3800## # … with 155 more rows, and 1 more variable: sex <chr>

In our first example using filter() function in dplyr, we used the pipe operator “%>%” while using filter() function to select rows. Like other dplyr functions, we can also use filter() function without the pipe operator as shown below.

filter(penguins, sex=="female")

And we will get the same results as shown above.

In the above example, we selected rows of a dataframe by checking equality of variable’s value. We can also use filter to select rows by checking for inequality, greater or less (equal) than a variable’s value.

Let us see an example of filtering rows when a column’s value is not equal to “something”. In the example below, we filter dataframe whose species column values are not “Adelie”.

penguins %>% filter(species != "Adelie")

We now get a filtered dataframe with species other than “Adelie”

## # A tibble: 192 x 7## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g## <chr> <chr> <dbl> <dbl> <dbl> <dbl>## 1 Gentoo Biscoe 46.1 13.2 211 4500## 2 Gentoo Biscoe 50 16.3 230 5700## 3 Gentoo Biscoe 48.7 14.1 210 4450## 4 Gentoo Biscoe 50 15.2 218 5700## 5 Gentoo Biscoe 47.6 14.5 215 5400## 6 Gentoo Biscoe 46.5 13.5 210 4550## 7 Gentoo Biscoe 45.4 14.6 211 4800## 8 Gentoo Biscoe 46.7 15.3 219 5200## 9 Gentoo Biscoe 43.3 13.4 209 4400## 10 Gentoo Biscoe 46.8 15.4 215 5150## # … with 182 more rows, and 1 more variable: sex <chr>

dplyr filter() with greater than condition

When the column of interest is a numerical, we can select rows by using greater than condition. Let us see an example of filtering rows when a column’s value is greater than some specific value.

In the example below, we filter dataframe such that we select rows with body mass is greater than 6000 to see the heaviest penguins.

# filter variable greater than a valuepenguins %>% filter(body_mass_g> 6000)

After filtering for body mass, we get just two rows that satisfy body mass condition we provided.

# # A tibble: 2 x 7## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>## 1 Gentoo Biscoe 49.2 15.2 221 6300 male ## 2 Gentoo Biscoe 59.6 17 230 6050 male

Similarly, we can select or filter rows when a column’s value is less than some specific value.

dplyr filter() with less than condition

Similarly, we can also filter rows of a dataframe with less than condition. In this example below, we select rows whose flipper length column is less than 175.

# filter variable less than a valuepenguins %>% filter(flipper_length_mm <175)

Here we get a new tibble with just rows satisfying our condition.

## # A tibble: 2 x 7## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>## 1 Adelie Biscoe 37.8 18.3 174 3400 fema…## 2 Adelie Biscoe 37.9 18.6 172 3150 fema…

How to Filter Rows of a dataframe using two conditions?

With dplyr’s filter() function, we can also specify more than one conditions. In the example below, we have two conditions inside filter() function, one specifies flipper length greater than 220 and second condition for sex column.

# 2.6.1 Boolean ANDpenguins %>% filter(flipper_length_mm >220 & sex=="female")
## # A tibble: 1 x 7## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>## 1 Gentoo Biscoe 46.9 14.6 222 4875 fema…

dplyr’s filter() function with Boolean OR

We can filter dataframe for rows satisfying one of the two conditions using Boolean OR. In this example, we select rows whose flipper length value is greater than 220 or bill depth is less than 10.

penguins %>% filter(flipper_length_mm >220 | bill_depth_mm < 10)
## # A tibble: 35 x 7## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g## <chr> <chr> <dbl> <dbl> <dbl> <dbl>## 1 Gentoo Biscoe 50 16.3 230 5700## 2 Gentoo Biscoe 49.2 15.2 221 6300## 3 Gentoo Biscoe 48.7 15.1 222 5350## 4 Gentoo Biscoe 47.3 15.3 222 5250## 5 Gentoo Biscoe 59.6 17 230 6050## 6 Gentoo Biscoe 49.6 16 225 5700## 7 Gentoo Biscoe 50.5 15.9 222 5550## 8 Gentoo Biscoe 50.5 15.9 225 5400## 9 Gentoo Biscoe 50.1 15 225 5000## 10 Gentoo Biscoe 50.4 15.3 224 5550## # … with 25 more rows, and 1 more variable: sex <chr>

Select rows with missing value in a column

Often one might want to filter for or filter out rows if one of the columns have missing values. With is.na() on the column of interest, we can select rows based on a specific column value is missing.

In this example, we select rows or filter rows with bill length column with missing values.

penguins %>% filter(is.na(bill_length_mm))

In this dataset, there are only two rows with missing values in bill length column.

## # A tibble: 2 x 8## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex ## <fct> <fct> <dbl> <dbl> <int> <int> <fct>## 1 Adelie Torge… NA NA NA NA <NA> ## 2 Gentoo Biscoe NA NA NA NA <NA> ## # … with 1 more variable: year <int>

We can also use negation symbol “!” to reverse the selection. In this example, we select rows with no missing values for sex column.

penguins %>% filter(!is.na(sex))

Note that this filtering will keep rows with other column values with missing values.

## # A tibble: 333 x 7## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g## <chr> <chr> <dbl> <dbl> <dbl> <dbl>## 1 Adelie Torge… 39.1 18.7 181 3750## 2 Adelie Torge… 39.5 17.4 186 3800## 3 Adelie Torge… 40.3 18 195 3250## 4 Adelie Torge… 36.7 19.3 193 3450## 5 Adelie Torge… 39.3 20.6 190 3650## 6 Adelie Torge… 38.9 17.8 181 3625## 7 Adelie Torge… 39.2 19.6 195 4675## 8 Adelie Torge… 41.1 17.6 182 3200## 9 Adelie Torge… 38.6 21.2 191 3800## 10 Adelie Torge… 34.6 21.1 198 4400## # … with 323 more rows, and 1 more variable: sex <chr>

Related posts:

dplyr arrange(): Sort/Reorder by One or More Variablesdplyr mutate(): Create New Variables with mutatedplyr groupby() and summarize(): Group By One or More VariablesHow to Filter Rows Based on Column Values with query function in Pandas?

FAQs

How to filter rows based on condition in R? ›

By using filter()

Finally, you can achieve selecting rows from the data frame by using the filter() function from the dplyr package. In order to use this package, first, you need to install it by using install. packages("dplyr") and load it using library("dplyr") .

How do I filter rows in R based on multiple conditions? ›

R Filter Rows by Multiple Conditions

The subset() is a R base function that is used to get the observations and variables from the data frame (DataFrame) by submitting with multiple conditions. Also used to get filter vectors and matrices. This subset() function takes a syntax subset(x, subset, select, drop = FALSE, …)

How do I filter specific rows from a DataFrame in R? ›

By using bracket notation on R DataFrame (data.name) we can select rows by column value, by index, by name, by condition e.t.c. You can also use the R base function subset() to get the same results. Besides these, R also provides another function dplyr::filter() to get the rows from the DataFrame.

How to filter rows based on column value in R? ›

You can do this by using filter() function from dplyr package. dplyr is a package that provides a grammar of data manipulation, and provides a most used set of verbs that helps data science analysts to solve the most common data manipulation. All dplyr verbs take input as data. frame and return data.

How do you filter data in Python based on condition? ›

Five Ways to do Conditional Filtering in Pandas
  1. Pandas filtering with selection brackets.
  2. Pandas series methods: isin(), between(), contains()
  3. Defining separate filters outside of selection bracket filtering.
  4. query()
  5. loc[]
  6. Bonus: Using the pandas filter() method.
Dec 8, 2022

How do you select certain rows in R? ›

We can select rows (observations) by Index in R by using a single square bracket operator df[rows,columns] , From the square bracket, we should be using rows position, and columns are used to select columns. In R rows are called observations and columns are called variables.

How do you filter rows based on conditions? ›

Filter Rows by Condition

You can use df[df["Courses"] == 'Spark'] to filter rows by a condition in pandas DataFrame. Not that this expression returns a new DataFrame with selected rows. You can also write the above statement with a variable.

How do I filter rows in conditional formatting? ›

How to sort or filter by conditional format results in Excel
  1. Open your Excel spreadsheet.
  2. Click on the Data tab.
  3. Click on the Sort & Filter button.
  4. Click on the Filter button.
  5. Click on the drop-down arrow next to the column you want to filter by conditional format results.
  6. Click on the Conditional Formatting option.

How do you filter with two conditions? ›

Use FILTER function with multiple conditions, at least one need to be fulfilled
  1. Enter FILTER function.
  2. Enter selected range as first argument (RANGE).
  3. Enter desired condition as the second argument (CONDITION).
  4. In the same argument, add the plus symbol (+) and your other condition.
  5. Repeat step 4 as needed. ‍

How do I filter rows in a DataFrame based on a list? ›

DataFrame. isin() method is used to filter/select rows from a list of values. You can have the list of values in variable and use it on isin() or use it directly.

How to extract specific rows and columns from DataFrame in R? ›

There are five common ways to extract rows from a data frame in R:
  1. Method 1: Extract One Row by Position #extract row 2 df[2, ]
  2. Method 2: Extract Multiple Rows by Position #extract rows 2, 4, and 5 df[c(2, 4, 5), ]
  3. Method 3: Extract Range of Rows #extract rows in range of 1 to 3 df[1:3, ]
May 17, 2022

How do you select specific rows in a DataFrame using their indexes? ›

You can select a single row from pandas DataFrame by integer index using df. iloc[n] . Replace n with a position you wanted to select.

How to select rows based on column value in R? ›

Select Rows by list of Column Values. By using the same notation you can also use an operator %in% to select the DataFrame rows based on a list of values. The following example returns all rows when state values are present in vector values c('CA','AZ','PH') . Yields below output.

How do you filter data in rows? ›

Filter a range of data
  1. Select any cell within the range.
  2. Select Data > Filter.
  3. Select the column header arrow .
  4. Select Text Filters or Number Filters, and then select a comparison, like Between.
  5. Enter the filter criteria and select OK.

What is the filter function in R rows? ›

The filter() method in R is used to subset a data frame based on a provided condition. If a row satisfies the condition, it must produce TRUE . Otherwise, non-satisfying rows will return NA values. Hence, the row will be dropped.

How do I filter a row in R? ›

In order to filter data frame rows by row number or positions in R, we have to use the slice() function. this function takes the data frame object as the first argument and the row number you wanted to filter.

Which clause is used to filter rows based on some criteria? ›

The WHERE Clause

In SQL, the SELECT statement is used to return specific columns of data from a table. Similarly, the WHERE clause is used to choose the rows of a table, and the query will return only the rows that meet the given criteria.

How do I filter specific rows? ›

Filter a range of data
  1. Select any cell within the range.
  2. Select Data > Filter.
  3. Select the column header arrow .
  4. Select Text Filters or Number Filters, and then select a comparison, like Between.
  5. Enter the filter criteria and select OK.

How do you filter an array based on condition? ›

One can use the filter() function in JavaScript to filter the object array based on attributes. The filter() function will return a new array containing all the array elements that pass the given condition. If no elements pass the condition it returns an empty array.

References

Top Articles
Latest Posts
Article information

Author: Mr. See Jast

Last Updated: 11/24/2023

Views: 5771

Rating: 4.4 / 5 (55 voted)

Reviews: 94% of readers found this page helpful

Author information

Name: Mr. See Jast

Birthday: 1999-07-30

Address: 8409 Megan Mountain, New Mathew, MT 44997-8193

Phone: +5023589614038

Job: Chief Executive

Hobby: Leather crafting, Flag Football, Candle making, Flying, Poi, Gunsmithing, Swimming

Introduction: My name is Mr. See Jast, I am a open, jolly, gorgeous, courageous, inexpensive, friendly, homely person who loves writing and wants to share my knowledge and understanding with you.