Pandas DataFrame drop_duplicates() Method

Example

Remove duplicate rows from the DataFrame:

import pandas as pd

data = {
  "name": ["Sally", "Mary", "John", "Mary"],
  "age": [50, 40, 30, 40],
  "qualified": [True, False, False, False]
}

df = pd.DataFrame(data)

# The second "Mary" row matches the first in every column, so it is removed
newdf = df.drop_duplicates()

print(newdf)

Definition and Usage

The drop_duplicates() method removes duplicate rows. By default, the first occurrence of each duplicated row is kept and the later ones are dropped.

Use the subset parameter if only some specified columns should be considered when looking for duplicates.
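
As a minimal sketch of the subset parameter, the example below uses a variation of the data above in which the two "Mary" rows differ in age and qualified. These values are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Illustrative data: the two "Mary" rows differ in "age" and "qualified"
df = pd.DataFrame({
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 45],
    "qualified": [True, False, False, True],
})

# No two rows match in every column, so nothing is removed
all_columns = df.drop_duplicates()

# Only "name" is compared, so the second "Mary" row is dropped
by_name = df.drop_duplicates(subset=["name"])

print(len(all_columns))  # 4
print(by_name)
```

With subset set, the columns that are not listed are ignored when rows are compared, but they are still present in the result.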

Syntax

dataframe.drop_duplicates(subset, keep, inplace, ignore_index)

Parameters

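All parameters are optional. In recent pandas versions, keep, inplace and ignore_index must be passed as keyword arguments; subset may also be passed positionally.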

Parameter	Value	Description
subset	column label(s)	Optional. A string, or a list of strings, naming the columns to compare when looking for duplicates. If not specified, all columns are used.
keep	`'first'`, `'last'`, `False`	Optional, default 'first'. Specifies which occurrence of a duplicated row to keep. If False, all duplicated rows are dropped.
inplace	`True`, `False`	Optional, default False. If True, the duplicates are removed from the current DataFrame and the method returns None. If False, a new DataFrame with the duplicates removed is returned.
ignore_index	`True`, `False`	Optional, default False. If True, the resulting rows are relabeled 0, 1, 2, etc. If False, the original index labels are kept.
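
The effect of keep, ignore_index and inplace can be seen by looking at which index labels survive. The following is a small sketch using the same data as the first example; the variable names are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 40],
    "qualified": [True, False, False, False],
})

# keep='first' (default): the first "Mary" row (index 1) is kept
first = df.drop_duplicates()

# keep='last': the last "Mary" row (index 3) is kept instead
last = df.drop_duplicates(keep="last")

# keep=False: every row that has a duplicate is dropped
none_kept = df.drop_duplicates(keep=False)

# ignore_index=True: the remaining rows are relabeled 0, 1, 2
relabeled = df.drop_duplicates(keep="last", ignore_index=True)

# inplace=True: df itself is modified and the call returns None
result = df.drop_duplicates(inplace=True)

print(list(first.index))      # [0, 1, 2]
print(list(last.index))       # [0, 2, 3]
print(list(none_kept.index))  # [0, 2]
print(list(relabeled.index))  # [0, 1, 2]
print(result, len(df))        # None 3
```

Because inplace=True returns None, assigning its result back to the DataFrame variable would discard the data; call it on its own line, or use the default and reassign the returned DataFrame.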