How to remove duplicate rows in mysql without using temporary table

We as a developer, often come across situations where we have to work on database related stuffs. Mostly it is done when client send you its data in form of excel sheets and you push that data to database tables after some excel manipulations. I have also done it many times.

A very common problem faced in this approach is that it might result in duplicate rows at times, because data sent is mostly from departments like HR and finance where people are not well aware of data normalization techniques [:-)].

I will use Employee table where column names are id, name, department and email.

Below are the SQL scripts for generating the test data.

Create schema TestDB;

CREATE TABLE EMPLOYEE
(
    ID INT,
    NAME Varchar(100),
    DEPARTMENT INT,
    EMAIL Varchar(100)
);

INSERT INTO EMPLOYEE VALUES (1,'Anish',101,'anish@howtodoinjava.com');
INSERT INTO EMPLOYEE VALUES (2,'Lokesh',102,'lokesh@howtodoinjava.com');
INSERT INTO EMPLOYEE VALUES (3,'Rakesh',103,'rakesh@howtodoinjava.com');
INSERT INTO EMPLOYEE VALUES (4,'Yogesh',104,'yogesh@howtodoinjava.com');

--These are the duplicate rows

INSERT INTO EMPLOYEE VALUES (5,'Anish',101,'anish@howtodoinjava.com');
INSERT INTO EMPLOYEE VALUES (6,'Lokesh',102,'lokesh@howtodoinjava.com');

Solution:

DELETE e1 FROM EMPLOYEE e1, EMPLOYEE e2 WHERE e1.name = e2.name AND e1.id > e2.id;

Above sql query will delete rows where name field is duplicate and only those unique rows will be retained where name is unique and ID field is lowest. For example rows with ID 5 and 6 will be deleted and rows with 1 and 2 will be retained.

delete-duplicate-rows-in-mysql

If you want to retain rows with latest generated ID values, then reverse the condition in where clause to e1.id < e2.id like this:

DELETE e1 FROM EMPLOYEE e1, EMPLOYEE e2 WHERE e1.name = e2.name AND e1.id > e2.id;

If you want to compare multiple fields and add appropriate where clause.

Note: Please execute above (or modified) query first on test data always to make sure it is producing the expected output.

Happy Learning !!

Lokesh

I have 7 Years of rich experience in java technology. This has only increased my hunger to learn more. In this blog, i will be writing on different topics occasionally, and would love to engage in some meaningful serious discussions with you folks.

You may also like...

6 Responses

  1. Mad-hab says:

    How the query will change if there is not Id column in the table , rest all remain as mentioned?

    • Lokesh says:

      In that case, what makes two rows duplicates? By what logic you will identify two duplicate rows? And if you identify such logic, then convert it to query.. I hope I make sense.

  2. Bob says:

    How long should your example above take when executed on a table with over 300,000 rows? I had to execute the following to even get your example to work on my table:

    SET SESSION SQL_BIG_SELECTS=1;

    Thanks!

    • Lokesh says:

      Hey Bob, I have not executed above query on big tables yet so can’t comment on performance of it. I believe that if database table is large, then we should use temp table approach. And never forget to back up data first .. :-)

  3. If we have database and table already created in mysql
    then how to get database connection in spring 3.2

Note:- In comment box, please put your code inside [java] ... [/java] OR [xml] ... [/xml] tags otherwise it may not appear/post as intended.


Want to ask any question? Or suggest anything?