Wednesday, June 3, 2020

Using SQL to compare files for differences

sql table function compare_files to compare layout and data

Having looked at, and played, with a number of the new additions to Db2 for i in the latest new Technology Refresh this is my favorite and in my opinion the most useful: table function COMPARE_FILE.

COMPARE_FILE gives me the ability to identify differences between physical files, source files, and DDL (SQL) tables. I can compare two files layout and the data within them with this table function, and have a results row returned for each difference. Wow, both of those with the same table function!

Alas, it is only available in the Technology Refresh for 7.4 only, IBM i 7.4 TR2. It is not available in IBM i 7.3 TR8.

COMPARE_FILES has a number of parameters:

  1. LIBRARY1:  The library that contains the first file to be used in the comparison
  2. FILE1:  Name of the first file
  3. LIBRARY2:  Library that contains the second file
  4. FILE2:  Name of the second file
  5. RDB2:  Did not see the purpose of the parameter when comparing files in the same partition, I am just going to say omit this
  6. COMPARE_ATTRIBUTES:  Compare the layout of the two files
  7. COMPARE_DATA:  Compare the data in the files
  8. PARALLEL_DEGREE This is another parameter I am just going to say omit this

I am just going to give some simple examples in this post. In preparing the examples of code I did try all kinds of other scenarios of differences between files and was very impressed with what I found.

First I will need a simple file to which I can compare other to:

A          R TESTFILER          
A            F001          10A
A            F002          10A
A            F003          10P 0
A          K F001
A          K F002
A          K F003

In my first scenario I am going to compare two identical files. I created the copy of TESTFILE using the Create Duplicate Object command:

CRTDUPOBJ OBJ(TESTFILE) FROMLIB(MYLIB) OBJTYPE(*FILE) 
            NEWOBJ(TESTFILE1)

Now I can use the table function to compare the two files:

01  SELECT * 
02    FROM TABLE(QSYS2.COMPARE_FILE(
03               LIBRARY1 => 'MYLIB',
04               FILE1 => 'TESTFILE',
05               LIBRARY2 => 'MYLIB',
06               FILE2 => 'TESTFILE1',
07               COMPARE_ATTRIBUTES => 'YES',
08               COMPARE_DATA =>'NO'))

Line 1: There are three columns returned in the results from this table function:

  1. ATTRIBUTE_NAME:  Name of the difference between the two files
  2. FILE1:  Value from the first file
  3. FILE2:  Value from the second file

Lines 2 – 6: I think it is obvious what is going on with these parameters.

Line 7: The COMPARE_ATTRIBUTES can have three possible values:

  1. NO:  Don't compare the attributes
  2. YES:  Compare the attributes and return one row of results for each difference found
  3. QUICK  If you just want to check if the two files are different use this. It will return just one row to say that difference(s) have been found

Line 8: I am only interested in the layout of the two files in this example, therefore, the COMPARE_DATA is NO. Like the previous parameter this one also has three possible values:

  1. NO:  Don't compare the data
  2. YES:  Compare the data and return one row of results for each difference found
  3. QUICK  If you just want to check if the data in the two files is different use this. It will return just one row to say that difference(s) have been found

As the two files are identical there are no differences and no results are returned:

ATTRIBUTE_NAME  FILE1  FILE2  
--------------  -----  -----

In this next example I have created another file, TESTFILE2, with a slightly different layout:

A          R TESTFILE2R
A            F001          10A
A            F002           5A
A            F003          10S 0
A          K F001
A          K F002

The differences are:

  1. Record format names are different
  2. In TESTFILE the field F002 is 10 long, in TESTFILE2 it is 5 long
  3. In TESTFILE the field F003 is a packed numeric field, in TESTFILE2 it is signed numeric
  4. TESTFILE has three key fields, TESTFILE2 has two

Now I compare the two files with the following statement:

01  SELECT * 
02    FROM TABLE(QSYS2.COMPARE_FILE(
03               LIBRARY1 => 'MYLIB',
04               FILE1 => 'TESTFILE',
05               LIBRARY2 => 'MYLIB',
06               FILE2 => 'TESTFILE2',
07               COMPARE_ATTRIBUTES => 'YES',
08               COMPARE_DATA => 'NO'))

The data in the attribute column can be so long I have had to format my results to fit on this page. The first line is the result from the ATTRIBUTE_NAME, the second line is from the FILE1 and FILE2 columns.

ATTRIBUTE_NAME
     FILE1                  FILE2 
     ---------------------  ---------------------------------
Unmatched SOURCE_MEMBER in QSYS2.OBJECT_STATISTICS
     TESTFILE               TESTFILE2

Unmatched PHMXKL(MaximumKeyLength) in QSYS.QAFDPHY
     26                     15

Unmatched PHMXRL(MaximumRecordLength) in QSYS.QAFDPHY
     26                     25

Unmatched PHMXK2(MaximumKeyLength) in QSYS.QAFDPHY
     26                     15

Unmatched ROW_LENGTH in QSYS2.SYSTABLES OR QSYS2.SYSTABLESTAT
     26                     25

Unmatched attributes in QSYS2.SYSPARTITIONSTAT
     -                      Extra MEMBER TESTFILE2

Unmatched attributes in QSYS2.SYSPARTITIONSTAT
     Extra MEMBER TESTFILE  -

Unmatched DATA_TYPE(DataType) in QSYS2.SYSCOLUMNS2 with COLUMN_NAME 
= F003
     DECIMAL                NUMERIC

Unmatched LENGTH(Length) in QSYS2.SYSCOLUMNS2 with COLUMN_NAME 
= F002
     10                     5

Unmatched STORAGE(Storage) in QSYS2.SYSCOLUMNS2 with COLUMN_NAME 
= F002
     10                     5

Unmatched STORAGE(Storage) in QSYS2.SYSCOLUMNS2 with COLUMN_NAME 
= F003
     6                      10

Unmatched CHARACTER_MAXIMUM_LENGTH(CharacterMaximumLength) in 
QSYS2.SYSCOLUMNS2 with COLUMN_NAME = F002
     10                     5

Unmatched CHARACTER_OCTET_LENGTH(CharacterOctetLength) in 
QSYS2.SYSCOLUMNS2 with COLUMN_NAME = F002
     10                     5

Unmatched DDS_TYPE(DDSType) in QSYS2.SYSCOLUMNS2 with COLUMN_NAME 
= F003
     P                      S

Unmatched APNKYF(NumberOf KeyFields) in QSYS.QAFDACCP with APKEYF 
= F002
     3                      2

Unmatched APNKYF(NumberOf KeyFields) in QSYS.QAFDACCP with APKEYF 
= F001
     3                      2

Unmatched attributes in QSYS.QAFDACCP
     Extra APKEYF F003      -

If I just had wanted to see if there were differences between the two files I could have just used QUICK in the COMPARE_ATTRIBUTES parameter:

01  SELECT * 
02    FROM TABLE(QSYS2.COMPARE_FILE(
03               LIBRARY1 => 'MYLIB',
04               FILE1 => 'TESTFILE',
05               LIBRARY2 => 'MYLIB',
06               FILE2 => 'TESTFILE2',
07               COMPARE_ATTRIBUTES => 'QUICK',
08               COMPARE_DATA => 'NO'))

The results look like:

ATTRIBUTE_NAME
------------------------------------------------------------------
Unmatched ATTRIBUTES found, run compare with COMPARE_ATTRIBUTES => 
'YES' to get the full result

Now to compare data between two files. Before I do I need to show the data from both files I will be comparing. I am going to show the Relative Record Number, RRN, for each record as it will be easier to understand the results from COMPARE_FILE. I can easily show the RRN in my results by using the following statement for TESTFILE:

SELECT RRN(A) AS "RRN",A.* FROM MYLIB.TESTFILE A

The results from TESTFILE are:

RRN   F001     F002     F003
---   ------   ------   ----
  1   FIRST    ONE         1
  2   SECOND   TWO         2
  3   THIRD    THREE       3
  4   FOURTH   FOUR        4
  5   FIFTH    FIVE        5

I just need to change the name of the file in my SQL statement to get the results for TESTFILE1:

RRN   F001     F002     F003
---   ------   ------   ----
  1   FIRST    ONE         1
  2   THIRD    THREE       3
  3   FIFTH    FIVE        5
  4   SEVENTH  SEVEN       7
  5   NINTH    NINE        9

The RRN, Relative Record Number, is not a field in the file. I have just added it here to help explain the results of the comparison.

Now I use the COMPARE_DATA parameter as YES, line 8, and the COMPARE_ATTRIBUTES as NO, line7.

01  SELECT * 
02    FROM TABLE(QSYS2.COMPARE_FILE(
03               LIBRARY1 => 'MYLIB',
04               FILE1 => 'TESTFILE',
05               LIBRARY2 => 'MYLIB',
06               FILE2 => 'TESTFILE1',
07               COMPARE_ATTRIBUTES => 'NO',
08               COMPARE_DATA => 'YES'))

The results are:

ATTRIBUTE_NAME                     FILE1  FILE2
---------------------------------- -----  -----
Unmatched DATA for MEMBER TESTFILE RRN 2  RRN 2
Unmatched DATA for MEMBER TESTFILE RRN 5  RRN 5
Unmatched DATA for MEMBER TESTFILE RRN 4  RRN 4
Unmatched DATA for MEMBER TESTFILE RRN 3  RRN 3

Now you see why I wanted the RRN for each file.

If I use the QUICK option in the COMPARE_DATA, line 8:

01  SELECT * 
02    FROM TABLE(QSYS2.COMPARE_FILE(
03               LIBRARY1 => 'RPGPGM1',
04               FILE1 => 'TESTFILE1',
05               LIBRARY2 => 'RPGPGM1',
06               FILE2 => 'TESTFILE',
07               COMPARE_ATTRIBUTES => 'NO',
08               COMPARE_DATA => 'QUICK'))

I get the following result:

ATTRIBUTE_NAME
------------------------------------------------------
Unmatched DATA found, run compare with COMPARE_DATA => 
'YES' to get the full result

If I try and compare the data between TESTFILE with TESTFILE2:

01  SELECT * 
02    FROM TABLE(QSYS2.COMPARE_FILE(
03               LIBRARY1 => 'MYLIB',
04               FILE1 => 'TESTFILE',
05               LIBRARY2 => 'MYLIB',
06               FILE2 => 'TESTFILE2',
07               COMPARE_ATTRIBUTES => 'NO',
08               COMPARE_DATA => 'YES'))

As the two files have a different layout I cannot compare them.

ATTRIBUTE_NAME
-----------------------------------------------------------
COMPARE_DATA not run, reason: Column definition(s) mismatch

Another scenario I tested was to try to compare my DDS file TESTFILE with a SQL table. The easiest way to create the source for the SQL table was to use SQL's Generate SQL procedure:

CALL QSYS2,GENERATE_SQL('TESTFILE','MYLIB','TABLE',
                 'DEVSRC','MYLIB','TESTTABLE',
                 REPLACE_OPTION => '1')

The generated source looks like:

CREATE TABLE MYLIB.TESTTABLE(
--  SQL1506   30   Key or attribute for TESTFILE in MYLIB ignored.
 F001 CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,
 F002 CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,
 F003 DECIMAL(10, 0) NOT NULL DEFAULT 0 )

 RCDFMT TESTFILER  ;

I created the table using the Run SQL Statement command, RUNSQLSTM.

I compared the file to the table:

01  SELECT * 
02    FROM TABLE(QSYS2.COMPARE_FILE(
03               LIBRARY1 => 'MYLIB',
04               FILE1 => 'TESTFILE',
05               LIBRARY2 => 'MYLIB',
06               FILE2 => 'TESTTABLE',
07               COMPARE_ATTRIBUTES => 'YES',
08               COMPARE_DATA => 'NO'))

And received the following message, rather than results from the SQL statement:

SQL State: ERR01 
Vendor Code: -438 
Message: [SQL0438] FILE TYPE OF FILE1 AND FILE2 NOT THE SAME.

So this means I can compare DDS files to one another, SQL tables to one another, but not DDS files to SQL tables.

This post has only scratched the surface of what I am going to be using COMPARE_FILE for. I am sure that you are like me and see numerous scenarios in my work where this is going to be very useful.

 

You can learn more about the changes to SQL's COMPARE_FILE from the IBM website here.

 

This article was written for IBM i 7.4 TR2.

5 comments:

  1. Peterson Bernabe CruspeJune 3, 2020 at 8:18 AM

    I used a similar function created in-house about 14 years ago. Much better now that it's part of the OS. Nice!

    ReplyDelete
  2. Totally agree

    ReplyDelete
  3. Michael Mayer-OakesJune 5, 2020 at 6:52 AM

    That is very cool. If you aren't on 7.3 see the system table QADBIFLD.

    ReplyDelete
  4. Very useful

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.