SQL – Remove Rows without Duplicate Column Values

We as a developer, often come across situations where we have to work on database related stuff. Mostly it is done when the client sends you its data in form of excel sheets and you push that data to database tables after some excel manipulations. I have also done it many times.

A very common problem faced in this approach is that it might result in duplicate rows at times because data sent is mostly from departments like HR and finance where people are not well aware of data normalization techniques :-).

1. Prepare Test Data and Expectation

We will use the Employee table where column names are id, name, department and email. We have to delete all rows which have duplicate NAME column values and retain only one row.

For example, if there are two rows with NAME as LOKESH, then one row will be deleted and only one row remains after the query execution.

The given below is the SQL script for generating the test data.

Create schema TestDB;

CREATE TABLE EMPLOYEE
(
    ID INT,
    NAME Varchar(100),
    DEPARTMENT INT,
    EMAIL Varchar(100)
);

INSERT INTO EMPLOYEE VALUES (1,'Anish',101,'anish@howtodoinjava.com');
INSERT INTO EMPLOYEE VALUES (2,'Lokesh',102,'lokesh@howtodoinjava.com');
INSERT INTO EMPLOYEE VALUES (3,'Rakesh',103,'rakesh@howtodoinjava.com');
INSERT INTO EMPLOYEE VALUES (4,'Yogesh',104,'yogesh@howtodoinjava.com');

--These are the duplicate rows

INSERT INTO EMPLOYEE VALUES (5,'Anish',101,'anish@howtodoinjava.com');
INSERT INTO EMPLOYEE VALUES (6,'Lokesh',102,'lokesh@howtodoinjava.com');

2. SQL Query to RemveDuplicate Rows

DELETE e1 FROM EMPLOYEE e1, EMPLOYEE e2 WHERE e1.name = e2.name AND e1.id > e2.id;

Above SQL query will delete rows where the name field is duplicate and only those unique rows will be retained where the name is unique and the ID field is lowest.

For example rows with ID 5 and 6 will be deleted and rows with 1 and 2 will be retained.

delete-duplicate-rows-in-mysql-6630200

If you want to retain rows with the latest generated ID values, then reverse the condition in where clause to e1.id < e2.id like this:

DELETE e1 FROM EMPLOYEE e1, EMPLOYEE e2 WHERE e1.name = e2.name AND e1.id > e2.id;

If you want to compare multiple columns then add them in the WHERE clause.

Please execute the above (or modified) query first on test data always to make sure it is producing the expected output.

Happy Learning !!

Was this post helpful?

Join 8000+ Awesome Developers, Like YOU!

13 thoughts on “SQL – Remove Rows without Duplicate Column Values”

  1. Hi Lokesh,

    can you tell me how works in below scenario.

    INSERT INTO EMPLOYEE VALUES (1,’Anish’,101,’anish@howtodoinjava.com’);
    INSERT INTO EMPLOYEE VALUES (2,’Lokesh’,102,’lokesh@howtodoinjava.com’);
    INSERT INTO EMPLOYEE VALUES (3,’Rakesh’,103,’rakesh@howtodoinjava.com’);
    INSERT INTO EMPLOYEE VALUES (4,’Yogesh’,104,’yogesh@howtodoinjava.com’);
    –These are the duplicate rows
    INSERT INTO EMPLOYEE VALUES (1,’Anish’,101,’anish@howtodoinjava.com’);
    INSERT INTO EMPLOYEE VALUES (2,’Lokesh’,102,’lokesh@howtodoinjava.com’);

    in duplicate rows id for Anish and Lokesh same. how will delete records.

    Thanks.

    Reply
  2. Sir i need to get a pdf report for students with ireports ,from more than one table..if i pass new JRE EmptydataSourse() .the design is displaying,but with Connection Object only blank pdf is displaying ..i am using struts 1.x ,can u please give me any suggetions for this task.

    Reply
    • In that case, what makes two rows duplicates? By what logic you will identify two duplicate rows? And if you identify such logic, then convert it to query.. I hope I make sense.

      Reply
      • It makes sence, but not resolves a problem.
        I saggest another method: add unique index to the table with some fields to check the duplicates. For example,

        ALTER IGNORE TABLE `employee` ADD UNIQUE INDEX `name_email` (`name`,`email`);

        Reply
  3. How long should your example above take when executed on a table with over 300,000 rows? I had to execute the following to even get your example to work on my table:

    SET SESSION SQL_BIG_SELECTS=1;

    Thanks!

    Reply
    • Hey Bob, I have not executed above query on big tables yet so can’t comment on performance of it. I believe that if database table is large, then we should use temp table approach. And never forget to back up data first .. 🙂

      Reply

Leave a Comment

About HowToDoInJava

This blog provides tutorials and how-to guides on Java and related technologies.

It also shares the best practices, algorithms & solutions, and frequently asked interview questions.

Our Blogs

REST API Tutorial