Tuesday, June 13, 2017

Using SQL to load data into a test instance

using 3 part sql path to update test files

Today's post is written by David Taylor.

From time to time, we need to load data from a production system to a development system. To make the process more complex, the data often resides in more than one file. Using SQL, we finally found a way to load the data into multiple files from the same cursor. We include in the SELECT clause any fields from any of the files in the data set to refine the cursor. Once we have the right data set, we can use the final version for the multiple file loads. Keep in mind, we need the data from all three files for the process test to work correctly. We need not just any set of records; we need the data that matches the relationships and limits defined in the cursor. In the samples each file ends with a letter. That same letter is the first character for the field names. For example, FILEA has fields like AKEYFLD1, AFIELD1, and so forth.

01  SELECT AKEYFLD1, BKEYFLD1, CKEYFLD1, AVALUE, CDATE
02   FROM PRODSERV.MYLIBRARY.FILEA
03        JOIN PRODSERV.MYLIBRARY.FILEB
04          ON AFKEYFLD1 = BFKEYFLD1
05        JOIN PRODSERV.MYLIBRARY.FILEC
06          ON BFKEYFLD1 = CFKEYFLD1
07  WHERE FILEA.AVALUE = 'VALUE'
08    AND DATE(FILEC.CDATE) = '2017-05-30'

Line 1: List all the fields you need to determine the desired data set.

Line 2 - 6: List the file names needed using the fields required to properly join the files together. The example uses the full data file name from the production system where PRODSERV is the name of the target system in the source system directory allowing you to pull directly to one system from another.

Line 7 - 8: List the filters for the data set. Use as many as needed and as few as possible.

Assuming the target library on the development system is MYTSTLIB and all data files are in the same library, make very few changes to use the SQL to load the cursor data into each file.

01  INSERT INTO MYTSTLIB/FILEA
02  SELECT FILEA.* 
03    FROM PRODSERV.MYLIBRARY.FILEA 
04         JOIN PRODSERV.MYLIBRARY.FILEB 
05           ON AKEYFLD1 = BKEYFLD1 
06         JOIN PRODSERV.MYLIBRARY.FILEC 
07           ON BKEYFLD1 = CKEYFLD1
08   WHERE FILEA.AVALUE = 'VALUE' 
09     AND DATE(FILEC.CDATE) = '2017-05-30' 
10   FETCH FIRST 100 ROWS ONLY

Line 1: Define the target file and library on the development system. Yes, you could trust the library list and leave off the library, but better safe than sorry.

Line 2: Pull all the fields from the source file for each current target. This is the only line from the original statement that changes for each of the three files.

Line 3 - 9: This block is the same as the original SQL and will remain the same for all loads.

Line 10: To keep production data from overloading the development system, you may limit to XX rows, in this case 100. The XX rows selected are the rows from the cursor, not the rows from the files. Pulling the first XX rows from each file would not accomplish the need to have the correct data relationships.

Simply repeat the process for the other two files by changing the file name in the INSERT clause and the SELECT clause.

Note well: this method works great for tables without auto-generated columns or other fields such as date fields. Auto-generated fields can include an identity column or a system-generated timestamp. You can still use the basic concept. You simply cannot use the splat (otherwise known as: asterisk or * ) option to select all fields. You would need to list all non-generated fields in the INSERT and SELECT clauses. This will, of course, result in the auto-generated fields having new values. The INSERT and SELECT clauses might look like this.

01  INSERT INTO MYTSTLIB/FILEA 
01.1       (AKEYFLD1, AFIELD2, AFIELD3, AFIELD4)  
02  SELECT AKEYFLD1, AFIELD2, AFIELD3, AFIELD4
03    FROM PRODSERV.MYLIBRARY.FILEA 
04         JOIN PRODSERV.MYLIBRARY.FILEB 
05           ON AKEYFLD1 = BKEYFLD1 
06         JOIN PRODSERV.MYLIBRARY.FILEC 
07           ON BKEYFLD1 = CKEYFLD1
08   WHERE FILEA.AVALUE = 'VALUE' 
09     AND DATE(FILEC.CDATE) = '2017-05-30' 
10   FETCH FIRST 100 ROWS ONLY

Line 1.1: From the sample above, add the list of fields to insert.

Line 2: From the sample above list the fields to select.

Once you have this idea down, you can save needing to use SNDNETF or other ways to move data from production to development. You can read more about identity columns here.

 

This article was written for IBM i 7.1 TR10, and should work for later releases too.

10 comments:

  1. So this is a simple ETL application? This must be a federated database to be able to work with 2 systems in the same SQL statement. What about the Transform of sensitive data, like SSN?

    ReplyDelete
    Replies
    1. This is straight SQL not ETL. STRSQL and away you go. If you have HIPAA, PCI, or PII, you would need to resolve that according to your company's policy. You could scramble the data on the fly.

      Delete
  2. For tables with Identity Columns, look into "OVERRIDING SYSTEM VALUE". This tells the system to allow the insert into the Column. (This is needed if you have other Tables that rely on this key value.)
    When insert is done. Do a Max on that column.
    Then do:
    Alter Table TblName Alter Column ColName Restart With Max+1

    ReplyDelete
  3. You should use "EXISTS" instead of JOIN if you don't want duplicate records

    ReplyDelete
  4. Good idea. Some might suggest creating SQL views rather than using inline SELECTs. If the statements need to be run on a regular basis, another idea is to place them in source files that may be run with RUNSQLSTM.

    ReplyDelete
  5. Executing a remote SQL Statements that point to another IBM box implied the use of what is known as three part sql where the production table is pointed out using ProductionBOXNAME.PRODUCTIONLIBRARY.PRODUCTIONFILENAME however in order for this to work user should be aware of this: On the local system, you need to add an entry for the remote IBM box in the relational database directory entry which can be done using WRKRDBDIRE or ADDDRBDIRE.

    ReplyDelete
  6. Please help me out here...

    How does this statement 'load the data into multiple files from the same cursor'?

    How does 'Insert into FILEA...' populate FILEB or FILEC?

    ReplyDelete
  7. Jim, for this example, you need to run the insert three times. For each iteration, you change the INSERT INTO FILEX to the next file and the SELECT FILEX.* to the next file.

    ReplyDelete
  8. Thank you David! That makes sense.

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.