Pandas DataFrame duplicated() Method
Example
Check which rows are duplicates and which are not:
import pandas as pd
data = {
"name": ["Sally", "Mary",
"John", "Mary"],
"age": [50, 40, 30, 40]
}
df = pd.DataFrame(data)
s = df.duplicated()
print(s)
Definition and Usage
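The duplicated() method returns a Series with True and False values that
describe which rows in the DataFrame are duplicates and which are not.
By default, the first occurrence of a row is marked False and every later
repeat of it is marked True.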
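A minimal sketch of the default behavior, rebuilding the example data from above. The second "Mary", 40 row is flagged, while its first occurrence is not. The tolist() call is used only to print the flags on one line.

```python
import pandas as pd

# Same data as the example: the row "Mary", 40 appears twice
df = pd.DataFrame({
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 40],
})

# Default call: all columns compared, first occurrence kept unflagged
s = df.duplicated()
print(s.tolist())  # [False, False, False, True]
```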
Use the subset
parameter to specify which columns to consider when looking for
duplicates. By default, all columns are used.
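As an illustration, here is a sketch using a hypothetical variation of the example data in which the two "Mary" rows have different ages. Comparing whole rows finds no duplicates, but restricting the comparison to the "name" column flags the second "Mary".

```python
import pandas as pd

# Hypothetical data: the two "Mary" rows differ in age
df = pd.DataFrame({
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 35],
})

all_columns = df.duplicated()                     # compares every column
by_name = df.duplicated(subset="name")            # compares only "name"
by_both = df.duplicated(subset=["name", "age"])   # same as all columns here

print(all_columns.tolist())  # [False, False, False, False]
print(by_name.tolist())      # [False, False, False, True]
```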
Syntax
dataframe.duplicated(subset, keep)
Parameters
The parameters are keyword arguments.
Parameter | Value | Description |
---|---|---|
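subset | column label(s) | Optional. A string, or a list, containing the columns to consider when looking for duplicates. Default is all columns |
keep | 'first', 'last', False | Optional, default 'first'. Specifies which rows are marked as duplicates: 'first' marks every duplicate except the first occurrence, 'last' marks every duplicate except the last occurrence, and False marks ALL duplicates |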
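A short sketch comparing the three keep options on the example data, where the rows at index 1 and 3 are both "Mary", 40:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 40],
})

flag_first = df.duplicated(keep="first")  # later repeats flagged, first occurrence not
flag_last = df.duplicated(keep="last")    # earlier repeats flagged, last occurrence not
flag_all = df.duplicated(keep=False)      # every member of a duplicate group flagged

print(flag_first.tolist())  # [False, False, False, True]
print(flag_last.tolist())   # [False, True, False, False]
print(flag_all.tolist())    # [False, True, False, True]
```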
Return Value
A Series with a boolean value for each row in the DataFrame.
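Because the returned Series has the same index as the DataFrame, it can be used directly as a row mask. A minimal sketch, again with the example data: inverting the mask with ~ keeps only the first occurrence of each row (the same rows that drop_duplicates() returns with its defaults), and summing the mask counts the repeated rows.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 40],
})

mask = df.duplicated()
unique_rows = df[~mask]          # rows not flagged as duplicates
repeat_count = int(mask.sum())   # each True counts as 1

print(unique_rows)
print(repeat_count)  # 1
```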