Adsense

Thursday, March 28, 2013

Single query to remove duplicate records in db2


Are you using DB2 database . You may be searching solution to select or delete duplicate records available in any table using a single query. The solution is here . This tutorial covers how to display and delete duplicate records usning single query with examples . Let me explain with the following example.
Table : Sales_Master


In the above picture , Sales_Master table has duplicate records , but it should not have. The following table shows how howmany times each items are stored in the Sales_Master table. I have used the following query
select item_name, item_code, count(item_name) from sales_master group by item_name,item_code order by item_code
 . In the above output , 3rd column is Number of Times Item_code exists
Soap_Product1 (4 times) , Soap_Product2 (2 times ) , Soap_Product3 (2 times) , Soap_Product4 (4 times)  , Sea_Product1 (2 times) . Now what we have to do is , we have to keep Soap_Product1 , Soap_Product2 , Soap_Product3 , Soap_Product4 , Sea_Product1 one time only.
In the Sales_Master , there is no unique field , so we need to generate unique values . rownumber() or row_number() OLAP function will help you out. rownumber() will generate unique number for each row which should be used with over() OLAP function. . Sample query using rownumber() over() is as follows
SELECT item_code , item_name , ROW_NUMBER() OVER() FROM sales_master order by item_code . This query returns all rows with extra column with unique row number order by item_code. This query will not help you to display only duplicate records keeping one record (either with minimum row number or maximum row number) . For that you have to partition (group) the records based on the item_code and each group of records should have unique row numbers . The following query helps you to display records partiioned by item_code
select item_code,item_name, rownumber() over (partition by item_code order by item_code) from sales_master which outputs as follows

Now from the above query , let us develop another query which displays only the duplicate records ( leaving one record for each item_code) and only unique rows that means for duplicate recods we have to display the records only whose rownumber is greater than 1 , for unique records the row number should be equal to 1. The following queries help you out to display duplicate records only

select * from (select item_code,item_name, rownumber() over (partition by item_code order by item_code) from sales_master) as E(item_code,item_name,RN) where RN>1

with duprows as (select item_code , row_number() over (partition by item_code order by item_code) a from sales_master ) select * from duprows where a>1

The following query displays only unique records
select * from (select item_code,item_name, rownumber() over (partition by item_code order by item_code) from sales_master) as E(item_code,item_name,RN) where RN=1

The above query displays the following output

Now we have come to final stage to delete duplicate records using single query
Single query to delete duplicate records (leaving one record for each item_code ) from a table.
    
delete from (select  row_number() over (partition by item_code order by item_code) from sales_master) as E(RN) where RN>1 


Now the final out put of the table by the query after removing duplicate records select * from sales_master


Note : Other ways to remove duplicate records , you can visit Remove duplicate records

1 comment: