Tuesday, June 27, 2023

Copying multiple IFS files into one output file

This us another post that came from a question I was asked by a reader of this blog. The question is:

I have multiple files in an IFS folder with similar names, each has a timestamp at the end of the file name to make them unique. How can I read all the files and write the contents into one file?

Fortunately there are two SQL Table Functions that can help me do this:

  • IFS_OBJECT_STATISTICS:  To make a list of the files
  • IFS_READ:  Read the contents of the files

The rest is pretty straightforward SQL and RPG.

Rather than files with timestamps in them I have three files that have identical files names, except the last character. I can list the files using IFS_OBJECT_STATISTICS:

01  SELECT PATH_NAME 
02   FROM TABLE(QSYS2.IFS_OBJECT_STATISTICS('/home/SIMON1'))
03  WHERE OBJECT_TYPE = '*STMF'                  
04    AND PATH_NAME LIKE '/home/SIMON1/testfile%'
05  ORDER BY PATH_NAME

Line 2: The path name parameter does not have a wild card character. Therefore, I request that all files in the folder are returned in the results.

Line 3: I am using the Where clause to filter only the files that are stream files, these include plain text files.

Line 4: The Where clause does allow a wildcard with the LIKE. This is how I can use it to include all files that start with "testfile".

The results are shown below:

PATH_NAME
---------------------------
/home/SIMON1/testfile_1.txt
/home/SIMON1/testfile_2.txt
/home/SIMON1/testfile_3.txt

I am going to show the contents of these files as I'll need to know what's in the files to ensure that they have all been copied into the output file:

testfile_1.txt
-----------------
File 1, Record 1
File 1, Record 2

testfile_2.txt
----------------
File 2, Record 1
File 2, Record 2

testfile_3.txt
----------------
File 3, Record 1
File 3, Record 2

I created a single RPG program that does all I needed to do. Let me start by showing the "main" part of the program:

01  **free
02  ctl-opt option(*nodebugio:*srcstmt:*nounref) dftactgrp(*no) ;

03  dcl-ds FileData qualified dim(*auto:9999) ;
04    FileName char(100) ;
05  end-ds ;

06  dcl-s Rows int(5) inz(%elem(FileData : *max)) ;

07  exec sql SET OPTION COMMIT = *NONE,CLOSQLCSR = *ENDMOD ;

08  GetListOfFiles() ;
09  GetData() ;

10  *inlr = *on ;

Line 1: In 2023 I hope you are all writing all your new programs in totally free RPG.

Line 2: My favorite control options, and I need the DFTACTGRP(*NO) as my programs contains subprocedures.

Lines 3 – 5: A data structure array I will be using to contains the results from IFS_OBJECT_STATISTICS. I have made this an auto-extending array, as I use it more elements are added until I reach the maximum.

Line 6: Rather than hard code the maximum number of elements in a SQL statement I will use this variable instead.

Line 7: I like to add SQL options to the RPG source, rather than have to remember to enter them when I compile the program.

Line 8 and 9: Within these two subprocedures I will get the list of files and then write their contents to a file.

Next up the subprocedure to get the list of eligible IFS files:

11  dcl-proc GetListOfFiles ;
12    exec sql DECLARE C0 CURSOR FOR
13              SELECT CAST(PATH_NAME AS CHAR(100))
14                FROM TABLE(QSYS2.IFS_OBJECT_STATISTICS('/home/SIMON1'))
15               WHERE OBJECT_TYPE = '*STMF'
16                 AND PATH_NAME LIKE '/home/SIMON1/testfile%'
17               ORDER BY PATH_NAME ;

18    exec sql OPEN C0 ;

19    exec sql FETCH C0 FOR :Rows ROWS INTO :FileData ;

20    exec sql CLOSE C0 ;
21  end-proc ;

Lines 12 – 17: This cursor declaration uses a href="/2019/11/using-sql-to-list-directories-and-files.html" target="_blank" title="Retrieve list of files in IFS">IFS_OBJECT_STATISTICS to get a list of all the files I will be reading. It is the same as the SQL statement I used earlier to list the files.

Line 18: Open the cursor.

Line 19: Fetch up to 9,999 rows from the results into the data structure array FileData.

Line 20: Close the cursor.

Now to the subprocedure that reads the IFS files and inserts into one common output file.

22  dcl-proc GetData ;
23    dcl-ds SingleData likeds(FileData) ;
24    dcl-s NewName char(100) ;
25    dcl-s Statement char (200) ;

26    exec sql DROP TABLE IF EXISTS QTEMP.OUTPUT ;

27    exec sql CREATE TABLE QTEMP.OUTPUT
28                 (RECORD) AS
29                 (SELECT CAST(LINE AS CHAR(200))
30                    FROM TABLE(QSYS2.IFS_READ(
31                           '/home/SIMON1/dummy.txt')))
32                 DEFINITION ONLY ;

33    for-each SingleData in FileData ;
34      Statement = 'INSERT INTO QTEMP.OUTPUT +
35                   (SELECT LINE +
36                      FROM TABLE(QSYS2.IFS_READ(+
37                      PATH_NAME => ''' +
38                        %trim(SingleData.FileName) + ''')))' ;

39      exec sql EXECUTE IMMEDIATE :Statement ;

40      if (SQLCOD = 0) ;
41        NewName = 'X_' + %subst(SingleData.FileName:14) ;

42        Statement = 'RNM OBJ(''' + %trimr(SingleData.FileName) + ''') +
43                          NEWOBJ(''' + %trimr(NewName) + ''')' ;

44        exec sql CALL QSYS2.QCMDEXC(:Statement) ;
45      endif ;
46    endfor ;

47    exec sql CLOSE C0 ;
48 end-proc ;

Line 23: I am defining a new data structure, that is not an array, based on the data structure array FileData.

Lines 24 and 25: Defining variables that are local to this subprocedure.

Line 26: If the output file already exists in QTEMP this statement will delete it. The IF EXISTS part of the statement ensures that if the file is not found the Drop statement does not error.

Lines 27 – 32: I am creating the output file "on the fly".

Line 28: I have to give the new Table's only column a name as due to what happens in the Select statement.

Line 29: I have the CAST column LINE to a size of 200 characters, even though the contents of the IFS files are less than that.

Line 31: I have a file in the IFS, dummy.txt, that contains just one line of data. Like IBM's SYSDUMMY1 file. I am using Create Table to make the file "on the fly", based on the output from the IFS_READ Table Function. As the statement ends with DEFINITION ONLY the Table is created, and no data from dummy.txt is inserted into it.

Lines 33 - 46: This For group uses the FOR-EACH operation code to read all the elements of the array FileData. As this is an auto-extending it only has the number of elements as results I fetched.

Line 33: Each time the FOR-EACH is executed it reads the next element of the data structure array FileData into the data structure SingleData.

Lines 34 - 38: I tried to have an Insert statement with a variable name for the path name, I just could not get the statement to work. The only way I could was not to hard code the path name. To emulate the hard coding I create the Insert statement in a variable.

Line 39: I then use the Execute Immediate statement to execute the SQL statement contained within the variable.

Lines 40 – 45: This was not part of the original question, but was something I would include. When a file has been processed I would want it renamed. This way the next time this was run I only download the contents of any new files.

line 40: If there are no errors or warning then the SQL code will be zero.

Line 41: I need to make a new file name. I am adding 'X_' to the start of the file's name. I only need the name of the file, not its path, which is why the substring statement starts at position 14. Positions 1 – 13 contain '/home/SIMON1/'.

Lines 42 and 43: I am making my RNM command statement. This will rename the file in the IFS. I need the full path name in the OBJ parameter, but only the file's new name in the NEWOBJ.

Line 44: Here I use the QCMDEXC procedure to execute the CL command I just built.

After compiling this program I called it, and the file OUTPUT was created in QTEMP. I could then use the following SQL statement to look at its contents:

01  SELECT * FROM QTEMP.OUTPUT

I see that all the records from the IFS files have been copied into the output file.

LINE
----------------
File 1, Record 1
File 1, Record 2
File 2, Record 1
File 2, Record 2
File 3, Record 1
File 3, Record 2

And by using this statement I can check if the files were renamed:

01  SELECT PATH_NAME 
02   FROM TABLE(QSYS2.IFS_OBJECT_STATISTICS('/home/SIMON1'))
03  WHERE OBJECT_TYPE = '*STMF'                  
04    AND PATH_NAME LIKE '/home/SIMON1/X_%'
05  ORDER BY PATH_NAME

The results show that they were:

PATH_NAME
-----------------------------
/home/SIMON1/X_testfile_1.txt
/home/SIMON1/X_testfile_2.txt
/home/SIMON1/X_testfile_3.txt

If you are going to do this for your situation you will need to check that the sizes I cast the columns in the SQL statements will contain your data. You'll also need to check that the size of the path and file name RPG variables are big enough.

 

This article was written for IBM i 7.5 TR1 and 7.4 TR7.

1 comment:

  1. Great example. Thank yo for this. I think better than renaming the processed file would be to move the file to an Archive folder.

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.