Pages

Wednesday, August 5, 2020

SELECT with EXCEPT clause to find differences between records in two files

using except clause to join 2 select statements together to see differences

I saw this mentioned in a Facebook discussion where someone said he was using what I will describe below to determine the difference between the records in two files. I had not heard of this method so I wanted to try it out for myself and determine whether it is something I could use in the future. Two SQL Select statements joined with an EXCEPT clause. But before I show examples of that I need data.

I am going to:

  • Create a file
  • Add data to it
  • "Clone" the file to make an exact copying, including the data
  • Change the data in the "clone"
  • Compare the two files

In these examples I am just going to use a file with one field. The same principles work with one field as it would with a hundred fields. I decided to K.I.S.S. and just have one field in my file. The layout for the file is:

A          R FILE0R
A            FIRST_NAME    20A

I created the file:

CRTPF FILE(MYLIB/FILE0) SRCFILE(*LIBL/DEVSRC) SIZE(*NOMAX)

Inserted data into the file using an SQL Insert statement:

INSERT INTO FILE0 VALUES('ANDREA'),('BETTY'),
                        ('CAROLE'),('DOROTHY'),
                        ('EVELYN'),('FRANCESCA'),
                        ('GEORGINA')

I can then view the contents of my file, including the file's internal RRN, Relative Record Number, so I can tell which records I am doing things to.

01  SELECT RRN(A) AS RRN,
02         A.FIRST_NAME
03  FROM FILE0 A 


RRN  FIRST_NAME
---  ----------
  1  ANDREA
  2  BETTY
  3  CAROLE
  4  DOROTHY
  5  EVELYN
  6  FRANCESCA
  7  GEORGINA

Line 1: I have to qualify the column names from the file so that I can use the qualifying character, or name, with the RRN function as shown.

I am going to "clone" the file using the Create Duplicate Object command, CRTDUPOBJ:

CRTDUPOBJ OBJ(FILE0) FROMLIB(*LIBL) OBJTYPE(*FILE) NEWOBJ(FILE1)
            DATA(*YES) CST(*NO) TRG(*NO)

I am not going to show the contents of the new file, FILE1, as it contains the same data the original file, FILE0, does.

Now to make some changes to the data in the new file:

01  DELETE FROM FILE1 WHERE FIRST_NAME = 'DOROTHY'

02  INSERT INTO FILE1 VALUES('HARRIET')

03  UPDATE FILE1 SET FIRST_NAME = 'BARBARA' WHERE FIRST_NAME = 'BETTY'

Now I am going to show you the data in the new file as now it is different from the original file:

01  SELECT RRN(B) AS RRN,
02         B.FIRST_NAME
03  FROM FILE1 B

RRN  FIRST_NAME
---  ----------
  1  ANDREA
  2  BARBARA
  3  CAROLE
  5  EVELYN
  6  FRANCESCA
  7  GEORGINA 
  8  HARRIET

As I now have two files I can compare them.

As I said before all I need to do is to combine two SQL Select statements with the EXCEPT clause. In this example I want to know what is in FILE0 and not in FILE1. To do this the first SQL Select statement has to be for FILE0 and the second for FILE1:

01  SELECT RRN(A) AS RRN,A.*
02    FROM FILE0 A
03  EXCEPT
04  SELECT RRN(B),B.*
05    FROM FILE1 B

The results show that there are two differences:

RRN  FIRST_NAME
---  ----------
  4  DOROTHY
  2  BETTY

Dorothy's record is in the results as I deleted it from FILE1.

Betty's record is shown as I changed the name to "Barbara" in FILE1.

Now let me do the comparison the other way around:

01  SELECT RRN(B) AS RRN,B.* 
02    FROM FILE1 B
03  EXCEPT
04  SELECT RRN(A),A.*
05    FROM FILE0 A

These are the results:

RRN  FIRST_NAME
---  ----------
  8  HARRIET
  2  BARBARA

Harriet's record is shown as I added it to FILE1, therefore, it is not present in FILE0.

Barbara's record is displayed as I changed Betty's name "Barbara", therefore, it is displayed.

Having deleted Dorothy's record above, what would happen if I added a record for Dorothy into FILE1 again?

INSERT INTO FILE1 VALUES('DOROTHY')

As this recorded was inserted it will have a new relative record number, not the same one that the original Dorothy record had:

RRN  FIRST_NAME
---  ----------
  1  ANDREA
  2  BARBARA
  3  CAROLE
  5  EVELYN
  6  FRANCESCA
  7  GEORGINA 
  8  HARRIET
  9  DOROTHY

Let me run the comparison SQL statement again listing the differences in FILE1 when compared to FILE0.

01  SELECT RRN(B) AS RRN,B.* 
02    FROM FILE1 B
03  EXCEPT
04  SELECT RRN(A),A.*
05    FROM FILE0 A

RRN  FIRST_NAME
---  ----------
  8  HARRIET
  2  BARBARA
  9  DOROTHY

The following SQL Selects with a UNION clause show that the two Dorothy records have different relative record numbers and this makes them different:

01  SELECT 'FILE0' AS FILE,
02          RRN(A) AS RRN,
03          A.FIRST_NAME
04    FROM FILE0 A
05   WHERE A.FIRST_NAME = 'DOROTHY'
06   UNION
07  SELECT 'FILE1',RRN(B),B.FIRST_NAME
08    FROM FILE1 B 
09   WHERE B.FIRST_NAME = 'DOROTHY'

FILE   RRN  FIRST_NAME
-----  ---  ----------
FILE0    4  DOROTHY
FILE1    9  DOROTHY

This experiment has shown me that I can use this to determine if there are differences in the records in two files. For example: are the records in the file in "production" the same as the one in the "test" environment?

 

You can learn more about the EXCEPT clause from the IBM website here.

 

This article was written for IBM i 7.4, and should work for some earlier releases too.

5 comments:

  1. This is on 7.4 only, right?

    ReplyDelete
    Replies
    1. No. Runs on 7.3 for sure as I tested it at that release.
      I believe it will run on 7.2 too.

      Delete
  2. Hi Simon,
    Just to understand. If you have same data with different RRN in both files, then data will be identified as different. but actually it is same. Is that so.

    ReplyDelete
    Replies
    1. You are correct.
      If you use 7.4 and have the latest TRs this would be a better approach.

      Delete
  3. Nice example.
    As you said, very useful to compare Test and Production tables.

    Other Suggestion:
    I use very often EXCEPTION JOIN to find integrity errors. F.e. missing records in a Mandatory - Optional relation where records exist in the child table but are missing in the parent while they should be there.

    Motivation:
    Very often databases don't implement Referential integrity. Which is the case in many (old) IBM i environments.

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.