Adsense

Thursday, March 28, 2013

Single query to remove duplicate records in db2


Are you using DB2 database . You may be searching solution to select or delete duplicate records available in any table using a single query. The solution is here . This tutorial covers how to display and delete duplicate records usning single query with examples . Let me explain with the following example.
Table : Sales_Master


In the above picture , Sales_Master table has duplicate records , but it should not have. The following table shows how howmany times each items are stored in the Sales_Master table. I have used the following query
select item_name, item_code, count(item_name) from sales_master group by item_name,item_code order by item_code
 . In the above output , 3rd column is Number of Times Item_code exists
Soap_Product1 (4 times) , Soap_Product2 (2 times ) , Soap_Product3 (2 times) , Soap_Product4 (4 times)  , Sea_Product1 (2 times) . Now what we have to do is , we have to keep Soap_Product1 , Soap_Product2 , Soap_Product3 , Soap_Product4 , Sea_Product1 one time only.
In the Sales_Master , there is no unique field , so we need to generate unique values . rownumber() or row_number() OLAP function will help you out. rownumber() will generate unique number for each row which should be used with over() OLAP function. . Sample query using rownumber() over() is as follows
SELECT item_code , item_name , ROW_NUMBER() OVER() FROM sales_master order by item_code . This query returns all rows with extra column with unique row number order by item_code. This query will not help you to display only duplicate records keeping one record (either with minimum row number or maximum row number) . For that you have to partition (group) the records based on the item_code and each group of records should have unique row numbers . The following query helps you to display records partiioned by item_code
select item_code,item_name, rownumber() over (partition by item_code order by item_code) from sales_master which outputs as follows

Now from the above query , let us develop another query which displays only the duplicate records ( leaving one record for each item_code) and only unique rows that means for duplicate recods we have to display the records only whose rownumber is greater than 1 , for unique records the row number should be equal to 1. The following queries help you out to display duplicate records only

select * from (select item_code,item_name, rownumber() over (partition by item_code order by item_code) from sales_master) as E(item_code,item_name,RN) where RN>1

with duprows as (select item_code , row_number() over (partition by item_code order by item_code) a from sales_master ) select * from duprows where a>1

The following query displays only unique records
select * from (select item_code,item_name, rownumber() over (partition by item_code order by item_code) from sales_master) as E(item_code,item_name,RN) where RN=1

The above query displays the following output

Now we have come to final stage to delete duplicate records using single query
Single query to delete duplicate records (leaving one record for each item_code ) from a table.
    
delete from (select  row_number() over (partition by item_code order by item_code) from sales_master) as E(RN) where RN>1 


Now the final out put of the table by the query after removing duplicate records select * from sales_master


Note : Other ways to remove duplicate records , you can visit Remove duplicate records

Stored Procedure in db2 and benefits. Create Procedure command with example

A stored procedure is a subroutine that can be called by an application with an SQL CALL statement . A stored procedure is stored in the database itself. Stored Procedures are invoked by executing the CALL statement with a reference to a procedure. The storedprocedure can be called locally or remotely. 


Avoidance of network traffic :
 In client server architecture, applications access the remote database over the network. This results in poor performance because it returns a lot of unnecessary data to the client. But when using stored procedure, as stored procedure runs on the server, only the results the client application needs are returned. This reduces the network traffic, also Stored Procedures groups SQL statements together which can also save on network traffic. A typical application requires two trips across the network for each SQL statement. Grouping of SQL statements reduces the trips across the network which results in better performance for applications. 

Access to features that exist only on the server : Stored procedures can have access to commands that run only on the server, such as LIST DB DIRECTORY and LIST TABLES. It can take the advantages of increased memory and disk space on server machines and they can access any additional software installed on the server.

Define Business rules common to several application : You can use stored procedures to define business rules that are common to several applications similar to triggers and constraints. When an application calls the stored procedure, it will process data based on the rules defined in the stored procedure. If you need to change the rules, you only need to make the change once in the stored procedure, not in every application that calls the stored procedure.

Protection from SQL injection attacks: Stored procedures reduces the risk of sql injection attacks. Stored procedure parameters will be treated as data even if an attacker inserts SQL commands

Helps for workload distribution: It splits the application logic and encourages an even distribution of the computational workload.Stored procedures can be developed using DB2 development center (in db2 8.2) , a GUI based or you can create using CLP . Now let us see how to create and run a Stored Procedures using Command Line Processor (CLP ) . 

Example 1 : Stored Procedure to filter job seekers applications based on the conditions like Date of Birth of the candidate should be between the given two input dates and qualification should be equal to the given input string. The Stored Procedure will accept three input parameters MinDate , MaxDate and Qualification. Output will be the result sets. 

 
CREATE PROCEDURE Shortlist (mindate date , maxdate date, qual  varchar(20))     SPECIFIC sp5    DYNAMIC RESULT SETS 1 
P1: BEGIN
    DECLARE cursor1 CURSOR WITH RETURN FOR   select * from applications where dob>=mindate and dob<=maxdate and qualification=qual;
    OPEN cursor1;
END P1 
@

Store the above lines in a file named sp1.sql. If you run the above script file using the command 
                         db2 -tvf sp1.sql,
                          you may get the error like    DB21028E The cursor "CURSOR1" has not been declared. DB21007E End of file reached while reading the command. 

To solve the above error, select an alternate terminating character for the Command Line Processor (DB2 CLP), other than the default terminating character which is semicolon (;) , to use in the script. For example, the above create procedure statement, @ is used as a terminating character. Now run the DB2 CLP script containing the CREATE PROCEDURE statement from the command line, using the following CLP command:

db2 -td <terminating-character> -vf <CLP-script-name> 

       to run the above Stored Procedure, issue the command 
          db2 -td@ -vf sp1.sql 
                               where @ is a terminating character for the procedure. Now the Stored Procedure (sp1.sql) is created. To run the above stored procedure , issue the command 

db2 call Shortlist('1970-01-01', '1980-01-01', 'BE') ,

      running the command returns all the records whose dob<='1970-01-01' and dob<='1980-01-01' and qualification='BE'

Example 2 : To update employee's salary based on the grade.  Accepts input employee code and grade . 

 
CREATE PROCEDURE SALARY_UPDATE  (IN emp_code varchar(10), IN grade SMALLINT)    LANGUAGE SQL
    BEGIN
      DECLARE error CONDITION FOR SQLSTATE '02000';
      DECLARE EXIT HANDLER FOR error  SIGNAL SQLSTATE '20001' SET MESSAGE_TEXT = 'Employee code does not exist';
      IF (grade = 1)  THEN UPDATE emp3   SET salary = salary + (salary*15/100)   WHERE empcode= emp_code;
      ELSEIF (grade = 2)   THEN UPDATE emp3     SET salary = salary + (salary*10/100)   WHERE empcode= emp_code;
      ELSEIF (grade=3) THEN UPDATE emp3   SET salary = salary + (salary*5/100)    WHERE empcode= emp_code;
      END IF;
    END
@   

Store the above script in a file named sp2.sql

Run the above script by the command 
                           db2 -td@ -vf sp2.sql.            and run the procedure by 
                            db2 call SALARY_UPDATE('1132', 1)   
                                         -where empcode is 1132 and grade is 1.  So the salary will be updated by 15% increment. 

Suppose the employee code does not exist in the table , then the default error
                SQL0100W No row was found for FETCH, UPDATE or DELETE; or the result of a query is an empty table. SQLSTATE=02000      will be thrown. 

  SQLSTATE is 02000 

Now we can change the default error message with own error message like Employee code does not exist . . For the the following lines are used. error variable is declared, if the SQLSTATE=02000. For that error, we are setting a new message as "Employee code doe not exist" and SQLSTATE as 20001.

DECLARE error CONDITION FOR SQLSTATE '02000'; 

DECLARE EXIT HANDLER FOR error SIGNAL SQLSTATE '20001' SET MESSAGE_TEXT = 'Employee code doe not exist'; .  So the following error will be thrown, if no records found. 

SQL0438N Application raised error with diagnostic text: "Employee code does not exist". SQLSTATE=20001

Example 3 : The above two stored procedures return the result sets as output. Now let us see the a different example , to calculate sum of salaries of a particular employee's whose designation matches the given input string . (for eg. UDC , LDC, ...)

 
CREATE PROCEDURE totalsalary(IN Desig char(20)  , OUT totalsalary INTEGER)   LANGUAGE SQL
  BEGIN
    DECLARE SQLSTATE CHAR(5) DEFAULT '00000';
    DECLARE tot_sal INTEGER;
    DECLARE empsalary INTEGER;
    DECLARE cursor1 CURSOR FOR SELECT SALARY FROM EMP3 where Designation=Desig;
      SET tot_sal= 0;
     OPEN cursor1;
     FETCH FROM cursor1 INTO empsalary;
     WHILE(SQLSTATE = '00000') DO
        SET tot_sal = tot_sal+ empsalary;
        FETCH FROM cursor1 INTO empsalary; 
     END WHILE;
     CLOSE cursor1;
     SET totalsalary=tot_sal;
  END%

In the above script, termination charater is %, so run the script with command 
                                db2 -td% -vf sp3.sql
            It accepts two parameters. One is for input , another is for output. We have set SQLSTATE to '00000'. So the while loop continues until the SQLSTATE changes. The SQLSTATE changes when the cursor has no further rows. when there is no rows , it throws exception with SQLSTAT 02000. Now the while loop terminates. 


how to get / change log file size in db2.


This tutorial explains about one of the db2 database configuration parameter LOGFILSIZ and How to decide and change the log file size ? . LOGFILSIZ defines the size of transaction log file that means the size of each primary and secondary log file. The default size is 1000 pages where each page having 4 K size that means log file can hold upto 4 MB of transactional data . The size of these log files limits the number of log records that can be written to them . When they become full , a new log file is required. If the database has a large number of transactions like update, delete or insert (i.e. in a high transaction OLTP environment) , the size of the log file should be larger because small size log files will be filled very quickly and new log files would have to be created frequently and also has to wait for a usable log file. This can affect system performance . At the same time ,the value of the logfilsiz should be reduced if disk space is less, since primary logs are preallocated at the size mentioned . In log retention loging , the current active log file is closed and truncated when the application disconnects from a database. When the database is reconnected , the next log file is used. Therefore, if we use very large log file , there may be lot of waste space . Conclusion : Make understand the logging requirements (maximum size of uncommitted changes at any time) of your concurrent applications , and determine a log file size . Neither it should be too small nor it should be too large.

To get the exisiting LOGFILSIZ of a particular database , please issue the following command
db2 get db cfg for sales where sales is the database name.
The line related to LOGFILSIZ is as follows
   Log file size (4KB)                     (LOGFILSIZ) = 1000

To change the LOGFILSIZ parameter , please issue the following command
db2 update db cfg for sales using logfilsiz 2000 ,where "logfilsiz" should be in the range of "4" to "262144".
where 2000 is the no of pages . so the log file size will be 2000*4 KB = 8 MB

In older versions , you can configure the upper limit of 256 GB of active log space. Let us see , how it is calculated.
Maximum pages allowed are 262144 .
So the upper limit of the each log file size will be 262144 * 4 KB = 1048576 KB = 1024 MB = 1 GB , where 4 KB is the size of each page.
Maximum combined number of primary and secondary log files is 256
So upper limit of active log space will be 256 * 1 GB = 256 GB.
In DB2 version 9.0 , upper limit of active log space 512. here upper limit for each log file, which is 2 GB
Starting from Version 9.5 Fix Pack 3 to latest version 9.7 , there is now 1024 GB of active log space available.

Note : When a log file becomes full , the action taken on primary and secondary log files depends on the type of logging performed . In Circular logging , primary log file can be reused when the data recorded in it have been committed. If the log file size is small , when the uncommitted data / changes in the log file exceeds log file size, the next primary log file is used . If all primary log files become full, the database manager will allocate secondary log files to hold the new log records. When all the primary and secondary log files are full (uncommitted data size exceeds all log file size) , the error "the transaction log is full " occurs. To solve the error , please read my next post Solution to Transaction log full . In Log retention logging , When a primary log file is full, the log is archived and a new primary log file is allocated.