I had a project where the first step was to merge data from three files into one. I have used the word "merge" deliberately to differentiate it from "copy". With a "merge" I needed to update specific fields if a record is present in the output file with a matching key. If there is no matching key record I add a new record. I have done this many times in RPG, this time I wanted to try something different, like SQL.
After searching IBM's KnowledgeCenter I discovered the MERGE SQL statement. I do not know when it was released, but searching in the different versions of the KnowledgeCenter I can find it in the versions for IBM i 7.1, 7.2, and 7.3, but I cannot find it in the earlier releases. The MERGE does exactly what I want it to, it will update on a match and insert when there is no match.
Before I explain the MERGE I need to show the three files I will be using in this example. The files are named:
They all contain one common field, that I will be using to match with, and two unique fields.
Not all of the files have the records with the same matching key, see below.
The DDS and the content of for FILE1 looks like this:
01 A R FILE1R 02 A F1KEY1 3A 03 A F1F1 5A 04 A F1F2 10A F1KEY1 F1F1 F1F2 1 AAAAA BBBBBBBBBB 2 CCCCC DDDDDDDDDD 6 EEEEE FFFFFFFFFF
FILE2 looks like:
01 A R FILE2R 02 A F2KEY1 R REFFLD(F1KEY1 FILE1) 03 A F2F1 10A 04 A F2F2 5P 0 F1KEY1 F2F1 F2F2 1 GGGGGGGGGG 1 2 HHHHHHHHHH 2 3 IIIIIIIIII 3 4 JJJJJJJJJJ 4
The reason the field F2KEY1 shows F1KEY1 when looking at the data is because the REFFLD copies the attributes of F1KEY1 from FILE1. As F1KEY1 in FILE1 does not have a column heading, COLHDG, the field name is used.
01 A R FILE3R 02 A F3KEY1 R REFFLD(F1KEY1 FILE1) 03 A F3F1 7P 2 04 A F3F2 5A F1KEY1 F3F1 F3F2 1 5.00 KKKKK 3 6.00 LLLLL 5 7.00 MMMMM
I decided to put my MERGE into a RPG program, which starts like this:
01 exec sql SET OPTION COMMIT = *NONE ; 02 exec sql DROP TABLE QTEMP.TABLE1 ; 03 exec sql CREATE TABLE QTEMP.TABLE1 04 (KEY_COLUMN FOR "KEYFLD", 05 FIRST,SECOND,THIRD,FOURTH,FIFTH,SIXTH) 06 AS (SELECT A.F1KEY1,A.F1F1,A.F1F2, 07 B.F2F1,B.F2F2, 08 C.F3F1,C.F3F2 09 FROM FILE1 A CROSS JOIN 10 FILE2 B CROSS JOIN 11 FILE3 C 12 WHERE A.F1KEY1 = B.F2KEY1 13 AND A.F1KEY1 = C.F3KEY1) 14 DEFINITION ONLY 15 INCLUDING COLUMN DEFAULTS 16 RCDFMT TABLE1R ;
Line 1: I unless I need commitment control I always add this option into my SQLRPGLE programs.
Line 2: It may be unnecessary to delete a file in QTEMP, especially when the program is running in batch. This program was not, and I ran it repeatedly changing my program. I wanted to delete the table to ensure that the one that was being created would be the way I wanted it and it would be empty. If I use a CREATE OR REPLACE on line 3 the table would not be deleted and the data remain within it.
Lines 3 – 16: I am creating my output table in a similar way to using REFFLD in DDS, defining the attributes of the columns to be the same as the fields in the files used.
Line 3: When creating a table you use the CREATE TABLE statement followed by which library you want it in and its name. I have decided to use "SQL naming convention", therefore the library and file names are separated by a period ( . ).
Line 4: I have decided to give the first column a long name, KEY_COLUMN, as well as a short name, KEYFLD. I can use either name for the column in this program.
Line 5: These are the names of the other columns.
Lines 6 – 8: I am using a SELECT statement to give the "referenced" fields that will be used to define the columns on lines 4 and 5.
Lines 9 – 11: These are the files those fields are contained within. I have used a CROSS JOIN here I am just creating an empty table, therefore, I don't have to bother with trying to match records by using some other kind of join.
Lines 12 and 13: This defines how the three files will be joined.
Line 14: I only want to define the table, in other words create an empty table, so I use the DEFINTION ONLY.
Line 15: I have found that by not including the from field defaults did cause me errors when inserting into the table. By including INCLUDING COLUMN DEFAULTS means that if a column is not defined in an INSERT the default value, taken from the DDS field, is used.
Line 16: I have given the table a record format name. I may, at some time in the future, decide to read this file in a RPG program, and by having a record format name removes the need for me to use a RENAME in the file definition.
At the end of this code I have an empty table ready to be filled. Let me start by inserting data from FILE1.
17 exec sql INSERT INTO QTEMP.TABLE1 (KEY_COLUMN,FIRST,SECOND) SELECT * FROM FILE1 ;
When that has finished the output table contains:
KEY_COLUMN FIRST SECOND THIRD FOURTH FIFTH SIXTH 1 AAAAA BBBBBBBBBB 0 .00 2 CCCCC DDDDDDDDDD 0 .00 6 EEEEE FFFFFFFFFF 0 .00
Let me now merge the data from FILE2 into the output table:
18 exec sql MERGE INTO QTEMP.TABLE1 A USING FILE2 B 19 ON A.KEY_COLUMN = B.F2KEY1 20 WHEN MATCHED THEN 21 UPDATE SET A.THIRD = B.F2F1, 22 A.FOURTH = B.F2F2 23 WHEN NOT MATCHED THEN 24 INSERT (KEY_COLUMN,THIRD,FOURTH) 25 VALUES (B.F2KEY1,B.F2F1,B.F2F2) ;
Line 18: I want to merge data into QTEMP.TABLE1 from FILE2…
Line 19: Using these "key" fields for the match.
Line 20 – 22: If the "key" fields match update the existing columns in TABLE1.
Line 23 – 25: If the "key" fields do not match insert a new row into TABLE1 using the values from the fields in FILE2 record.
The MERGE updated the rows for "key" fields 1 and 2, and inserted new rows for the "key" values of 3 and 4.
KEY_COLUMN FIRST SECOND THIRD FOURTH FIFTH SIXTH 1 AAAAA BBBBBBBBBB GGGGGGGGGG 1 .00 2 CCCCC DDDDDDDDDD HHHHHHHHHH 2 .00 3 IIIIIIIIII 3 .00 4 JJJJJJJJJJ 4 .00 6 EEEEE FFFFFFFFFF 0 .00
If there is a problem with the matching, for example a duplicate row in either file caused by the given "key" matching, the MERGE quits. The error is diagnostic, therefore, it does not interrupt the user, and the merge is unfinished. The only way to find if this type of error has happened is to look at the joblog when the job has completed. To stop errors like this I can add an extra line to the MERGE statement. The merge of the data from FILE3 contains this extra line.
26 exec sql MERGE INTO QTEMP.TABLE1 A USING FILE3 B 27 ON A.KEY_COLUMN = B.F3KEY1 28 WHEN MATCHED THEN 29 UPDATE SET A.FIFTH = B.F3F1, 30 A.SIXTH = B.F3F2 31 WHEN NOT MATCHED THEN 32 INSERT (KEY_COLUMN,FIFTH,SIXTH) 33 VALUES (B.F3KEY1,B.F3F1,B.F3F2) 34 NOT ATOMIC CONTINUE ON SQLEXCEPTION ;
That "extra" line is line 34. This wonderfully named line means that if there is an error the merge must continue.
With all three files' data merged my output file looks like:
KEY_COLUMN FIRST SECOND THIRD FOURTH FIFTH SIXTH 1 AAAAA BBBBBBBBBB GGGGGGGGGG 1 5.00 KKKKK 2 CCCCC DDDDDDDDDD HHHHHHHHHH 2 .00 3 IIIIIIIIII 3 6.00 LLLLL 4 JJJJJJJJJJ 4 .00 5 0 7.00 MMMMM 6 EEEEE FFFFFFFFFF 0 .00
Having used the MERGE in a more complicated scenario I have to say I am impressed with its functionality and speed. No more merging data using RPG for me, I'll use SQL's MERGE instead.
You can learn more about SQL's MERGE from the IBM website here.
This article was written for IBM i 7.3, and should work for 7.2 and 7.1 too.