Wednesday, January 27, 2016

Read an IFS file using RPG

Update

Before using what is described below, consider using SQL to read IFS files.

 


 

I had a situation where I had a text file in an IFS folder and I needed to read its contents. I could have copied the file from the IFS into an IBM i library and then read it there, but I decided to investigate how I could read the file directly from the IFS using RPG. Fortunately there are three APIs which will allow me to do this.

First I need to give credit to a friend of mine, who wishes to remain anonymous, for providing a program that my example is based upon.

I created a text file using Windows' Notepad. At the end of each line I pressed Enter to create variable length records, see below, and copied the file to my folder in the IFS.

first record At the start
second record
third record
fourth record
fifth record
sixth record
seventh record
eighth record
ninth record
tenth and last record

So now to the interesting part… the RPG program. There are three APIs I am going to use in my program:

  • open - Opens the IFS file
  • read - Reads the file
  • close - Closes the file

These are UNIX-type APIs, so the parameters, and the way they are passed, need to be translated into RPG. Let me start with the procedure definitions:

01  ctl-opt dftactgrp(*no) ;

02  dcl-pr OpenFile int(10) extproc('open') ;
03    *n pointer value options(*string) ;
04    *n int(10) value ;
05    *n uns(10) value options(*nopass) ;
06    *n uns(10) value options(*nopass) ;
07    *n uns(10) value options(*nopass) ;
08  end-pr ;

09  dcl-pr ReadFile int(10) extproc('read') ;
10    *n int(10) value ;
11    *n pointer value ;
12    *n uns(10) value ;
13  end-pr ;

14  dcl-pr CloseFile int(10) extproc('close') ;
15    *n int(10) value ;
16  end-pr ;

Before I start describing what each line does I thought it would be useful to describe the functions of some of the keywords seen on more than one line:

  • EXTPROC( ) - As I have decided not to use the external procedures' names as my procedure names, this is the keyword that maps the name I have chosen for the procedure to its external name.
  • *N - I do not like giving names to the procedures' parameters. In free format definitions, if I do not give a parameter a name I need to use *N to denote that the parameter does not have a name.
  • VALUE - The parameter is passed by value, so the called procedure uses a copy of the data passed and the original version of the data will not be changed.
  • OPTIONS(*STRING) - When I pass a character value, RPG passes a pointer to a temporary, null terminated copy of it, which is the equivalent of a string in C.
  • OPTIONS(*NOPASS) - Denotes that this is an optional parameter.

And now to a more detailed explanation of the code.

Line 1: OK, this is not a procedure definition, but without this control option the program will not compile, as calls to bound procedures like these APIs are not allowed in the default activation group.

Lines 2 – 8: This is the definition for the API to open the IFS files, open. I have chosen to use the name OpenFile to avoid confusion with the OPEN operation code. This procedure has five parameters:

  1. The path where the file is to be found in the IFS (pointer).
  2. File status flags and access mode for the file when it is opened (integer).
  3. File permission bytes (unsigned integer).
  4. CCSID the data is converted to when it is read (unsigned integer).
  5. Not used in this example (unsigned integer).

Lines 9 – 13: This is the definition for the API to read the IFS file, read. As with the previous API definition I have decided to use the name ReadFile to avoid confusion with the READ operation code. This procedure has three parameters:

  1. File descriptor, the file to be read, which is the value returned from the open procedure (integer).
  2. Pointer to the variable that the bytes read are placed in (pointer).
  3. The maximum number of bytes to read, which is the size of that variable (unsigned integer).

The procedure returns the number of bytes retrieved. This can be less than the length of the variable when the last read of the file is performed.

Lines 14 – 16: This is the definition for the API to close the IFS file, close. I have decided to call it CloseFile. This procedure has one parameter:

  1. File descriptor, the file to be closed (integer).
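
For comparison, here is a sketch of the same three prototypes written with parameter names instead of *N. The names (FilePath, OpenFlags, and so on) are my own, chosen only for illustration; the APIs do not require them, and they act purely as documentation. The sketch also assumes a compiler level that accepts fully free-form source (**FREE); without that the code would need to be positioned in the usual columns.

```rpgle
**free
// Sketch only: the parameter names are hypothetical and serve as documentation.
ctl-opt dftactgrp(*no) ;

dcl-pr OpenFile int(10) extproc('open') ;
  FilePath pointer value options(*string) ;     // Path to the file in the IFS
  OpenFlags int(10) value ;                     // File status flags and access mode
  FileMode uns(10) value options(*nopass) ;     // File permission bits
  DataCcsid uns(10) value options(*nopass) ;    // CCSID for the data
  CreateCcsid uns(10) value options(*nopass) ;  // Not used in this example
end-pr ;

dcl-pr ReadFile int(10) extproc('read') ;
  FileDescriptor int(10) value ;                // Value returned by OpenFile
  Buffer pointer value ;                        // Address of the input buffer
  BytesToRead uns(10) value ;                   // Size of the input buffer
end-pr ;

dcl-pr CloseFile int(10) extproc('close') ;
  FileDescriptor int(10) value ;                // Value returned by OpenFile
end-pr ;

// Nothing is called here; the module only shows the definitions.
*inlr = *on ;
return ;
```

Calls to these procedures look exactly the same as calls to the *N versions, as the parameter names in a prototype are never used by the caller.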

These are the definitions for the constants and standalone variables:

17  dcl-c O_RDONLY 1 ;           //Read only
18  dcl-c O_TEXTDATA 16777216 ;  //Open in text mode
19  dcl-c O_CCSID 32 ;           //CCSID
20  dcl-c S_IRGRP 32 ;           //Group authority

21  dcl-s fd int(10) ;
22  dcl-s Length int(10) ;
23  dcl-s Data char(100) ;
24  dcl-s Array char(100) dim(20) ;
25  dcl-s Element packed(3) ;
26  dcl-s Start packed(3) ;
27  dcl-s End like(Start) ;

28  dcl-c Path '/SIMON/test_read.txt' ;

Lines 17 – 19: These constants are the flags used by the OpenFile procedure for the file status and access mode; in the documentation they are called the oflag. I will show how they come together when I call this procedure.

Line 20: This constant is used by OpenFile for the file permissions; it gives the group read authority to the file.

Line 21: This is the file descriptor.

Lines 22 – 23: Used when the file is read. Data is the input buffer/variable that the data from the file is read into. Length is the number of bytes retrieved by the read.

Lines 24 – 25: I am going to output the individual records from the file into elements of an array. Element is the index of the array element currently being filled.

Lines 26 – 27: Used to extract the individual records from the input buffer.

Line 28: The path name to the file.

Having defined everything now I can open the file:

29  fd = OpenFile(Path :
                  O_RDONLY + O_TEXTDATA + O_CCSID :
                  S_IRGRP :
                  37) ;

30  if (fd < 0) ;
31    dsply ('IFS file ' + path + ' could not be opened') ;
32    *inlr = *on ;
33    return ;
34  endif ;

Line 29: I don't normally show procedure calls broken out over more than one line, but I did this time so you could clearly see how the three file status and access mode flags are added together into one value. The third parameter is the file permission. The last parameter, 37, is the CCSID used by the IBM i partition I used to create this example program; with the O_TEXTDATA and O_CCSID flags the data is converted to this CCSID when it is read. The procedure returns the file descriptor, which will be used by the other two procedures.

Lines 30 – 34: If the file descriptor is less than zero an error was encountered. I am just displaying a message and quitting the program. If I were using this in a "production program" the error handling would be more complicated.
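
As a hedged illustration of what slightly fuller error handling might look like, the sketch below retrieves the C error number after a failed open and displays its message text. It is not part of the original program. It assumes that the C runtime procedures __errno and strerror can be bound using the QC2LE binding directory, that the compiler accepts **FREE source, and the names GetErrnoPtr, ErrorText, ErrnoPtr, ErrorNumber and Message are my own.

```rpgle
**free
// Sketch: report why open() failed.
// Assumption: __errno and strerror are available via binding directory QC2LE.
ctl-opt dftactgrp(*no) bnddir('QC2LE') ;

dcl-pr OpenFile int(10) extproc('open') ;
  *n pointer value options(*string) ;
  *n int(10) value ;
  *n uns(10) value options(*nopass) ;
  *n uns(10) value options(*nopass) ;
end-pr ;

dcl-pr CloseFile int(10) extproc('close') ;
  *n int(10) value ;
end-pr ;

// Returns a pointer to the job's C error number
dcl-pr GetErrnoPtr pointer extproc('__errno') ;
end-pr ;

// Returns a pointer to a null terminated message for an error number
dcl-pr ErrorText pointer extproc('strerror') ;
  *n int(10) value ;
end-pr ;

dcl-c O_RDONLY 1 ;
dcl-c O_TEXTDATA 16777216 ;
dcl-c O_CCSID 32 ;
dcl-c S_IRGRP 32 ;
dcl-c Path '/SIMON/test_read.txt' ;

dcl-s fd int(10) ;
dcl-s ErrnoPtr pointer ;
dcl-s ErrorNumber int(10) based(ErrnoPtr) ;
dcl-s Message char(52) ;              // DSPLY shows at most 52 characters

fd = OpenFile(Path : O_RDONLY + O_TEXTDATA + O_CCSID : S_IRGRP : 37) ;

if (fd < 0) ;
  ErrnoPtr = GetErrnoPtr() ;          // Point ErrorNumber at the error number
  Message = 'open failed: ' + %str(ErrorText(ErrorNumber)) ;
  dsply Message ;
else ;
  CloseFile(fd) ;
endif ;

*inlr = *on ;
return ;
```

The message text is truncated to fit the 52 characters DSPLY can show; a production program would more likely write the error number and text to the job log or an error file.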

Having opened the file now I need to read it:

35  dow (1 = 1) ;
36    Length  = ReadFile(fd:%addr(Data):%size(Data)) ;
37    if (Length = 0) ;
38      leave ;
39    elseif (Length < %size(Data)) ;
40      %subst(Data:(length + 1)) = ' ' ;
41    endif ;

42    Start = 0 ;

43    dow (2 = 2) ;
44      Element += 1 ;
45      End = %scan(x'25':Data:(Start + 1)) ;
46      if (End > 0) ;
47        if (Array(Element) = ' ') ;
48          Array(Element) = %subst(Data:(Start + 1):
                                   ((End - Start) - 2)) ;
49        else ;
50          Array(Element) = %trimr(Array(Element)) +
                             %subst(Data:(Start + 1):
                                   ((End - Start) - 2)) ;
51        endif ;
52        Start = End ;
53      else ;
54        Array(Element) = %subst(Data:(Start + 1)) ;
55        Element -= 1 ;
56        leave ;
57      endif ;
58    enddo ;
59  enddo ;

Line 36: The read uses the file descriptor to identify the file. The %addr(Data) passes a pointer containing the address of the variable Data, in other words the read places the bytes it retrieves into Data. The %size(Data) passes to the procedure the length of the input buffer. I could have entered the numeric value of the length of the variable Data, but by using the %SIZE built in function I do not have to change this line of code if I change the length of Data. The procedure returns the number of bytes it retrieved into the variable Length.

Lines 37 – 38: If nothing was returned from my read then the program quits this do-loop.

After the first read this is what the contents of Data looks like:

      ....5...10...15...20...25...30...35...40...45...50...55...60 
  1   'first record At the start  second record  third record  four'
 61   'th record  fifth record  sixth record  s'

I have managed to retrieve the whole of the first six records, and the first character of the seventh. Each record is separated by two hexadecimal values:

  1. x'0D' - Carriage return
  2. x'25' - Line feed

After the second read Data contains:

       ....5...10...15...20...25...30...35...40...45...50...55...60 
  1   'eventh record  eighth record  ninth record  tenth and last r'
 61   'ecordcord  fifth record  sixth record  s'

I have retrieved the remaining part of the seventh record at the start of Data, and the remaining records. But as these are shorter than 100 bytes, the size of Data, the remaining space is still filled with the data from the previous read.

Lines 39 – 40: The length of the retrieved input buffer, 65 bytes, is less than the size of Data, 100, on the second read. I use a %SUBST built in function to initialize the 66th byte onwards with blanks to "delete" the data from the prior read.

       ....5...10...15...20...25...30...35...40...45...50...55...60 
  1   'eventh record  eighth record  ninth record  tenth and last r'
 61   'ecord                                   '

Lines 43 – 58: As I now have the data from the file's records in Data, I use RPG to break them out into individual records and write them to the array, Array.

Line 45: I am scanning Data looking for the next occurrence of x'25' (line feed).

Lines 46 – 52: If I find the hexadecimal value in the string, line 46, I check that the current array element is blank. I will explain why later. If it is then I just substring Data for the record, and place it in the array, line 48. If there is already data in the current array element I trim what is in there already and add the next record to it.

Lines 53 – 56: When would I not find x'25'?

  1. At the end of Data after the first read. The first read retrieved part of the seventh record at its end.
  2. After the tenth record after the second read.

Line 54: Moves the remaining data after the last x'25' into the current array element.

Line 55: Subtract one from the array element counter. The counter is incremented again at the start of the second do-loop after the next read is performed, which gives the same array element I put the partial record in.

Line 56: Leave the second do-loop.

When everything has been read from the file the value returned into Length will be zero and the program leaves the first do-loop, lines 37 – 38.

Line 60: All that is left to do is to close the file using the CloseFile procedure with the file descriptor as the only parameter.
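
One way to avoid treating a record that spans two reads as a special case is to walk the input buffer one byte at a time and build each record in a varying length field. The sketch below is an alternative I have not taken from the original program; it assumes the same file, flags and CCSID as above, a compiler that accepts **FREE source, and the names CR, LF, Line, Byte, i and StoreLine are my own. Because only the bytes just read are examined, it does not matter whether the buffer ends in the middle of a record, exactly at a line feed, or between the carriage return and the line feed.

```rpgle
**free
// Sketch of an alternative record splitter: examine the buffer one byte
// at a time so that buffer boundaries need no special handling.
ctl-opt dftactgrp(*no) ;

dcl-pr OpenFile int(10) extproc('open') ;
  *n pointer value options(*string) ;
  *n int(10) value ;
  *n uns(10) value options(*nopass) ;
  *n uns(10) value options(*nopass) ;
end-pr ;

dcl-pr ReadFile int(10) extproc('read') ;
  *n int(10) value ;
  *n pointer value ;
  *n uns(10) value ;
end-pr ;

dcl-pr CloseFile int(10) extproc('close') ;
  *n int(10) value ;
end-pr ;

dcl-c O_RDONLY 1 ;
dcl-c O_TEXTDATA 16777216 ;
dcl-c O_CCSID 32 ;
dcl-c S_IRGRP 32 ;
dcl-c Path '/SIMON/test_read.txt' ;
dcl-c CR x'0D' ;                // Carriage return
dcl-c LF x'25' ;                // Line feed

dcl-s fd int(10) ;
dcl-s Length int(10) ;
dcl-s Data char(100) ;
dcl-s Array char(100) dim(20) ;
dcl-s Element int(10) ;
dcl-s Line varchar(100) ;       // Record being built, may span two reads
dcl-s Byte char(1) ;
dcl-s i int(10) ;

fd = OpenFile(Path : O_RDONLY + O_TEXTDATA + O_CCSID : S_IRGRP : 37) ;
if (fd < 0) ;
  dsply ('IFS file ' + Path + ' could not be opened') ;
  *inlr = *on ;
  return ;
endif ;

dow (1 = 1) ;
  Length = ReadFile(fd : %addr(Data) : %size(Data)) ;
  if (Length <= 0) ;            // Zero is end of file, negative is an error
    leave ;
  endif ;

  for i = 1 to Length ;         // Only look at the bytes just read
    Byte = %subst(Data : i : 1) ;
    if (Byte = LF) ;
      exsr StoreLine ;          // End of record found
    elseif (Byte <> CR and %len(Line) < 100) ;
      Line += Byte ;            // Characters beyond 100 are dropped
    endif ;
  endfor ;
enddo ;

if (%len(Line) > 0) ;           // Last record had no line feed after it
  exsr StoreLine ;
endif ;

CloseFile(fd) ;
*inlr = *on ;
return ;

// Copy the finished record to the next array element, if there is room
begsr StoreLine ;
  if (Element < %elem(Array)) ;
    Element += 1 ;
    Array(Element) = Line ;
  endif ;
  Line = '' ;
endsr ;
```

As only the first Length bytes of Data are used, there is no need to blank out what is left over from the previous read. The trade-off is that handling one byte at a time is likely to be slower than scanning for the line feed, which would only matter for large files.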

If I use debug and look at the contents of Array at the end of the program I can see how each element contains a record from the file:

EVAL array
ARRAY(1) =
          ....5...10...15...20...25...30...35...40...45...50...55...60 
     1   'first record At the start                                   '
    61   '                                        '
ARRAY(2) =
          ....5...10...15...20...25...30...35...40...45...50...55...60 
     1   'second record                                               '
    61   '                                        '
ARRAY(3) =
          ....5...10...15...20...25...30...35...40...45...50...55...60 
     1   'third record                                                '
    61   '                                        '
ARRAY(4) =
          ....5...10...15...20...25...30...35...40...45...50...55...60 
     1   'fourth record                                               '
    61   '                                        '
ARRAY(5) =
          ....5...10...15...20...25...30...35...40...45...50...55...60 
     1   'fifth record                                                '
    61   '                                        '
ARRAY(6) =
          ....5...10...15...20...25...30...35...40...45...50...55...60 
     1   'sixth record                                                '
    61   '                                        '
ARRAY(7) =
          ....5...10...15...20...25...30...35...40...45...50...55...60 
     1   'seventh record                                              '
    61   '                                        '
ARRAY(8) =
          ....5...10...15...20...25...30...35...40...45...50...55...60 
     1   'eighth record                                               '
    61   '                                        '
ARRAY(9) =
          ....5...10...15...20...25...30...35...40...45...50...55...60 
     1   'ninth record                                                '
    61   '                                        '
ARRAY(10) =
          ....5...10...15...20...25...30...35...40...45...50...55...60 
     1   'tenth and last record                                       '
    61   '                                        '
ARRAY(11) =
        ....5...10...15...20...25...30...35...40...45...50...55...60 
   1   '                                                            '
  61   '                                        '

If the records were all fixed length then I would change the length of Data to be the fixed record length, so that each read would give me the next record.

I could have made Data much larger, for example 1,000 bytes, so that all the records would have been included in the first read, but I wanted to show how to perform multiple reads and how to handle a record that would go from the first input buffer to the second.

The entire source code for this program looks like:

01  ctl-opt dftactgrp(*no) ;

02  dcl-pr OpenFile int(10) extproc('open') ;
03    *n pointer value options(*string) ;
04    *n int(10) value ;
05    *n uns(10) value options(*nopass) ;
06    *n uns(10) value options(*nopass) ;
07    *n uns(10) value options(*nopass) ;
08  end-pr ;

09  dcl-pr ReadFile int(10) extproc('read') ;
10    *n int(10) value ;
11    *n pointer value ;
12    *n uns(10) value ;
13  end-pr ;

14  dcl-pr CloseFile int(10) extproc('close') ;
15    *n int(10) value ;
16  end-pr ;

17  dcl-c O_RDONLY 1 ;           //Read only
18  dcl-c O_TEXTDATA 16777216 ;  //Open in text mode
19  dcl-c O_CCSID 32 ;           //CCSID
20  dcl-c S_IRGRP 32 ;           //Group authority

21  dcl-s fd int(10) ;
22  dcl-s Length int(10) ;
23  dcl-s Data char(100) ;
24  dcl-s Array char(100) dim(20) ;
25  dcl-s Element packed(3) ;
26  dcl-s Start packed(3) ;
27  dcl-s End like(Start) ;

28  dcl-c Path '/SIMON/test_read.txt' ;

29  fd = OpenFile(Path :
                  O_RDONLY + O_TEXTDATA + O_CCSID :
                  S_IRGRP :
                  37) ;

30  if (fd < 0) ;
31    dsply ('IFS file ' + path + ' could not be opened') ;
32    *inlr = *on ;
33    return ;
34  endif ;

35  dow (1 = 1) ;
36    Length  = ReadFile(fd:%addr(Data):%size(Data)) ;
37    if (Length = 0) ;
38      leave ;
39    elseif (Length < %size(Data)) ;
40      %subst(Data:(length + 1)) = ' ' ;
41    endif ;

42    Start = 0 ;

43    dow (2 = 2) ;
44      Element += 1 ;
45      End = %scan(x'25':Data:(Start + 1)) ;
46      if (End > 0) ;
47        if (Array(Element) = ' ') ;
48          Array(Element) = %subst(Data:(Start + 1):
                                   ((End - Start) - 2)) ;
49        else ;
50          Array(Element) = %trimr(Array(Element)) +
                             %subst(Data:(Start + 1):
                                   ((End - Start) - 2)) ;
51        endif ;
52        Start = End ;
53      else ;
54        Array(Element) = %subst(Data:(Start + 1)) ;
55        Element -= 1 ;
56        leave ;
57      endif ;
58    enddo ;
59  enddo ;

60  CloseFile(fd) ;
61  *inlr = *on ;

And for those of you still forced to use fixed format definitions they would look like:

01  H dftactgrp(*no)

02  D OpenFile        PR            10I 0 extproc('open')
03  D                                 *   value options(*string)
04  D                               10I 0 value
05  D                               10U 0 value options(*nopass)
06  D                               10U 0 value options(*nopass)
07  D                               10U 0 value options(*nopass)

08  D ReadFile        PR            10I 0 extproc('read')
09  D                               10I 0 value
10  D                                 *   value
11  D                               10U 0 value

12  D CloseFile       PR            10I 0 extproc('close')
13  D                               10I 0 value

14  D O_RDONLY        C                   1
15  D O_TEXTDATA      C                   16777216
16  D O_CCSID         C                   32
17  D S_IRGRP         C                   32
18  D Path            C                   '/SIMON/test_read.txt'

19  D fd              S             10I 0
20  D Length          S             10I 0
21  D Data            S            100
22  D Array           S            100    dim(20)
23  D Element         S              3P 0
24  D Start           S              3P 0
25  D End             S                   like(Start)
     /free

 

You can learn more about the open, read, and close APIs on the IBM website.

 

This article was written for IBM i 7.2, and should work for earlier releases too.

 

Update

After publishing this article I received an email detailing an alternative method of reading an IFS file in RPG, which I consider easier than this way. You can find this article here.

7 comments:

  1. Agreed that it is a great tip. Though I want to take a moment to encourage you to stop using *n in your prototypes. By naming the parameter you document what each parameter does. Or at the very least you can put a comment to the right of each parameter like Nick Litten recently did on his blog:
    http://www.nicklitten.com/blog/upgrade-rpg-call-statement-rpgle-procedure-call

    Programmers that would find this example helpful can't look at those prototypes and tell what parameter 3 of the open command should be for instance. Readability and self documenting code is important.

  2. I agree with the comment by marcsumus, the parameter names should be descriptive. In addition, I generally include the *trim option for the path name.

    If you are creating an IFS stream file and writing to it, the O_TEXT_CREAT option and the fifth parameter are useful, since that allows you to specify a CCSID for the new stream file, and specify a different CCSID for the data you are writing to it.

    Another handy technique when writing data to a stream file is to use varying length fields, and then use %addr(field:*DATA) and %len(field) for the parameters on the write() call.

  3. Good job Simon, as always ...
    I agree with Marcsumus ... I always use clear names for parameters in dcl-pr prototypes, I don't like *N way ....

    I use something similar to read IFS and often I use an SQL UDF function stolen from another good post on McPressOnline by Michael Sansoterra (http://www.mcpressonline.com/tips-techniques/sql/techtip-using-sql-with-ifs-text-files-part-2.html).

  4. Thanks Simon, having the ability to read an imported file in an IFS folder provides additional application design options instead of the IBM copy commands.

  5. Great example. For those who want to know where the constants came from, check this copybook: QSYSINC/QRPGLESRC member FCNTL

  6. Hi Simon,
    I just used this tip and it works great. I did, however, find a situation where the code crashes. It happens when the chunk of data read ends precisely at the CR character. I solved it by checking if the variable Start + 1 exceeds the size of the data read and, if it does, LEAVE the innermost loop when reading the text file so the code will read another chunk of data.

    Let me know if this makes sense. Have you come across this situation?

    Thank you and cheers,

    Antonio Mira

