Pandas DataFrame drop_duplicates() Method
Example
Remove duplicate rows from the DataFrame:
import pandas as pd

data = {
  "name": ["Sally", "Mary", "John", "Mary"],
  "age": [50, 40, 30, 40],
  "qualified": [True, False, False, False]
}

df = pd.DataFrame(data)

# The second "Mary" row repeats an earlier row in every column, so it is dropped.
newdf = df.drop_duplicates()
Definition and Usage
The drop_duplicates() method removes duplicate rows.
Use the subset parameter if only some columns should be considered when looking for duplicates.
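As a minimal sketch of how subset changes the result, the example below uses a variation of the data above in which the second "Mary" row has a different age (an assumption made only for illustration). With all columns considered, no row is a duplicate; with only "name" considered, the second "Mary" row is.

```python
import pandas as pd

# Illustrative data: the two "Mary" rows differ in "age".
df = pd.DataFrame({
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 41],
    "qualified": [True, False, False, False],
})

# All columns are compared, so no two rows are identical and nothing is dropped.
all_columns = df.drop_duplicates()

# Only "name" is compared, so the second "Mary" row counts as a duplicate.
by_name = df.drop_duplicates(subset=["name"])

print(len(all_columns))
print(by_name)
```

Passing a single label, such as subset="name", is also accepted.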
Syntax
dataframe.drop_duplicates(subset, keep, inplace, ignore_index)
Parameters
All parameters are optional and are best passed as keyword arguments.
Parameter | Value | Description |
---|---|---|
subset | column label(s) | Optional. A string, or a list of strings, naming the columns to use when looking for duplicates. If not specified, all columns are used. |
keep | 'first' 'last' False | Optional, default 'first'. Specifies which duplicate to keep: 'first' keeps the first occurrence, 'last' keeps the last occurrence, and False drops ALL duplicates. |
inplace | True False | Optional, default False. If True, the duplicates are removed from the current DataFrame. If False, a copy with the duplicates removed is returned. |
ignore_index | True False | Optional, default False. If True, the rows of the result are relabeled 0, 1, 2, and so on. If False, the original index labels are kept. |
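
The effect of keep and ignore_index is easiest to see in the index labels of the result. The sketch below reuses the data from the example at the top of the page; the variable names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 40],
    "qualified": [True, False, False, False],
})

# Rows 1 and 3 are identical. keep decides which of them survives.
keep_first = df.drop_duplicates(keep="first")  # keeps row 1, drops row 3
keep_last = df.drop_duplicates(keep="last")    # keeps row 3, drops row 1
keep_none = df.drop_duplicates(keep=False)     # drops both row 1 and row 3

# ignore_index=True relabels the remaining rows starting from 0.
renumbered = df.drop_duplicates(keep="last", ignore_index=True)

print(keep_first.index.tolist())  # [0, 1, 2]
print(keep_last.index.tolist())   # [0, 2, 3]
print(keep_none.index.tolist())   # [0, 2]
print(renumbered.index.tolist())  # [0, 1, 2]
```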
Return Value
A DataFrame with the result, or None if the inplace parameter is set to True.
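A short sketch of the inplace=True case, again using the data from the example at the top of the page: the call returns None and the original DataFrame is modified instead.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 40],
    "qualified": [True, False, False, False],
})

# With inplace=True the method modifies df and returns None.
result = df.drop_duplicates(inplace=True)

print(result)   # None
print(len(df))  # 3
```

Because the return value is None in this case, writing df = df.drop_duplicates(inplace=True) would replace the DataFrame with None.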