Clean up data with “Remove Duplicates”!

Happy New Year 2022 to all!

A while ago, I grabbed a large dataset of email addresses and related data and needed to filter it down to just the unique emails.

Rather than going through the emails one by one and removing duplicate rows by hand, I used “Remove Duplicates” to do it quickly and, most importantly, correctly.

Remove Duplicates Ribbon

This super useful feature takes a column (or several columns) of data and removes the duplicate rows. It's great for cleaning up your dataset, counting unique users, etc.
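To make “unique” concrete, here is a small Python sketch of the same idea. It is only an illustration, not how Excel implements the feature: the email list is made-up sample data, and the sketch assumes exact, case-sensitive string matching, which may differ from how Excel compares values. It keeps the first occurrence of each email and then counts what is left.

```python
# Made-up sample data: a column of emails with repeats
emails = [
    "alice@example.com",
    "bob@example.com",
    "alice@example.com",
    "carol@example.com",
    "bob@example.com",
]

# dict.fromkeys keeps the first occurrence of each value, in its original order
unique_emails = list(dict.fromkeys(emails))

print(unique_emails)       # ['alice@example.com', 'bob@example.com', 'carol@example.com']
print(len(unique_emails))  # 3
```

The length of the de-duplicated list is the “number of unique users” figure mentioned above.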

An example:

  • In the data below, we have duplicate emails. It's a short example, but if you had, e.g., 3,000 emails, it would be pretty difficult to figure out the number of unique emails by eye.
Remove duplicates initial data
  • To get the unique list, first select the data, then press Alt+A+M (or click Data > Remove Duplicates on the ribbon) to bring up the Remove Duplicates dialog box!
Remove duplicates column picker modal
  • Select the columns you want to match duplicate data on and click OK. The column selector controls which values are compared: e.g., if you select 3 columns, Remove Duplicates treats two rows as duplicates only when all 3 column values match. It keeps the first such row and deletes the later ones.
  • Once you run it, the duplicate rows are removed!
Remove duplicates result
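The multi-column matching rule can be sketched in Python as below. This is an illustration under stated assumptions, not Excel's code: `remove_duplicates`, `key_columns` and the sample rows are hypothetical names and data, and values are compared as exact strings. Like Excel, the sketch keeps the first matching row whole and drops the later ones.

```python
from typing import Sequence

Row = tuple[str, ...]


def remove_duplicates(rows: Sequence[Row], key_columns: Sequence[int]) -> list[Row]:
    """Drop every row whose values in key_columns match an earlier row.

    Hypothetical helper: the first matching row is kept whole, later matches are dropped.
    """
    seen: set[Row] = set()
    kept: list[Row] = []
    for row in rows:
        # Only the selected columns are compared; other columns are ignored
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Made-up sample rows: (email, name, city)
rows = [
    ("alice@example.com", "Alice", "London"),
    ("alice@example.com", "Alice", "Paris"),
    ("bob@example.com", "Bob", "London"),
    ("alice@example.com", "Alice", "London"),
]

# Match on email only: both later Alice rows count as duplicates of the first
by_email = remove_duplicates(rows, key_columns=[0])
# Match on all 3 columns: only the exact repeat (the last row) is removed
by_all = remove_duplicates(rows, key_columns=[0, 1, 2])

print(len(by_email))  # 2
print(len(by_all))    # 3
```

Note how the choice of columns changes the result: matching on email alone collapses Alice to a single row, while matching on all three columns keeps her Paris row.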

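In Google Sheets, you can do this with the UNIQUE function (e.g. =UNIQUE(A2:A)), which returns the unique rows in a new range and leaves the original data in place!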