This is a very common interview question if you are going for the role at junior level. Most of the people learn it character by character, but few understand how this query works. How good or bad its performance is? In this post, we will learn these things. I am using MySQL database for demonstration.
Sections in this post: Create the schema and populate table Write the query and verify the result How the query works Performance analysis
Create the schema and populate table
Lets create a simple example of Employee table. We will populate this table with id, and and salary of employees. We will write the query to find nth highest salary on this table.
Create a new Schema.
Create schema TestSQL;
Create a new table Employee_Test.
CREATE TABLE Employee_Test ( Employee_ID INT, Employee_name Varchar(100), Employee_Salary Decimal (10,2) );
Populate the table with test data.
INSERT INTO Employee_Test VALUES (1,'Anish',1000); INSERT INTO Employee_Test VALUES (2,'Sushant',1200); INSERT INTO Employee_Test VALUES (3,'Rakesh',1100); INSERT INTO Employee_Test VALUES (4,'Manoj',1300); INSERT INTO Employee_Test VALUES (5,'Amit',1400); INSERT INTO Employee_Test VALUES (6,'Lokesh',1600); INSERT INTO Employee_Test VALUES (7,'Maneneder',1400); INSERT INTO Employee_Test VALUES (8,'Narendra',400); INSERT INTO Employee_Test VALUES (9,'Kaushal',1900); INSERT INTO Employee_Test VALUES (10,'Vikas',3400); INSERT INTO Employee_Test VALUES (11,'Sudhir',800);
Write the query and verify the result
So as most of us know, query is written like this for finding nth highest salary.
SELECT * FROM Employee_Test Emp1 WHERE ( n ) = ( SELECT COUNT( DISTINCT ( Emp2.Employee_Salary ) ) FROM Employee_Test Emp2 WHERE Emp2.Employee_Salary >= Emp1.Employee_Salary )
Here, replace the n with any number. For example, if you have to find 6th highest salary , then replace n with 6.
FROM Employee_Test Emp1
WHERE (6) = (
SELECT COUNT( DISTINCT ( Emp2.Employee_Salary ) )
FROM Employee_Test Emp2
WHERE Emp2.Employee_Salary >= Emp1.Employee_Salary
Above query will produce below result.
How the query works
As we can see this query involves use of an inner query. Inner queries can be of two types. Correlated and uncorrelated queries. Uncorrelated query is where inner query can run independently of outer query, and correlated query is where inner query runs in conjunction to outer query. Our nth highest salary is an example of correlated query.
Lest understand first that the inner query executes every time, a row from outer query is processed. Inner query essentially does not do any very secret job, it only return the count of distinct salaries which are higher than the currently processing row’s salary column. Anytime, it find that salary column’s value of current row from outer query, is equal to count of higher salaries from inner query, it returns the result.
As we learned above that inner query executes every time, one row of outer query is processed, this brings a lot of performance overhead, specially if the number of rows are too big.
To avoid this, one should use DB specific keywords to get the result faster. For example in SQL server, one can use key word TOP like this:
SELECT TOP 1 EMPLOYEE_SALARY FROM ( SELECT DISTINCT TOP N EMPLOYEE_SALARY FROM EMPLOYEE_TEST ORDER BY EMPLOYEE_SALARY DESC ) A ORDER BY EMPLOYEE_SALARY WHERE N > 1
Happy Learning !!
All of my websites are hosted on Bluehost. Why? Because they make it incredibly easy to get a new site up and running with their one-click automatic WordPress Blog installation.
You can use your Bluehost account to host multiple domains—no need to pay more for hosting when you’re ready to start another website.