Wednesday, March 28, 2018

Processing simple XML using XML-INTO


I had received notification from a supplier that they were changing the order file they send my employer. Previously they sent a Microsoft Excel spreadsheet; starting at the end of this month it will be an XML file. In the past, if I needed to convert an XML file into data in an IBM i file, I would use the EDI application to do the conversion. As this XML is so simple I decided to process its contents in my own RPG program.

The operation code XML-INTO takes information from elements in the XML document and, in my examples, places it into data structure subfields. I believe that this operation code has been around since V5R4; I am just a latecomer to using it.

The XML document I will be using in these (very) simple examples contains a (very short) list of names, with the city and state each person is in. The format the supplier proposes to send me is very simple too; there is no need for anything complicated.

For illustration and easy testing, the first two example programs I will show contain the XML within them as declared constants; the others use a file in the IFS.

My first example does everything in just 20 lines of RPG code.

01  **free
02  dcl-c XML1 '<People>+
03                <Person>+
04                  <Name>Simon Hutchinson</Name>+
05                  <City>Los Angeles</City>+
06                  <State>CA</State>+
07                </Person>+
08                <Person>+
09                  <Name>Donald Trump</Name>+
10                  <City>Washington</City>+
11                  <State>DC</State>+
12               </Person>+
13              </People>' ;

14  dcl-ds Person dim(10) qualified ;
15    Name varchar(50) ;
16    City varchar(30) ;
17    State char(2) ;
18  end-ds ;

19  xml-into Person %xml(XML1:'case=any') ;
20  *inlr = *on ; 

Lines 2 – 13: This is my XML document. As I code in totally free-format RPG I use Declare Constant, DCL-C, to define the XML document. To make it easier to see all the elements I have put each of them on its own line.

Lines 2 and 13: The outermost element of the XML document is People. It starts on line 2 and ends with the matching closing tag on line 13.

Lines 3 and 8: Each "record" is identified by the Person elements. The first "record" starts on line 3 and is closed on line 7. There is a second "record" starting on line 8 and ending on line 12.

Lines 4 – 6: Within each "record" there are three "fields", each with matching opening and closing tags.

  1. Lines 4 and 9: Person's name
  2. Lines 5 and 10: City
  3. Lines 6 and 11: State

Lines 14 – 18: This is the data structure array into which the XML data will be copied. It must have the same name as the "record", Person, and the data structure subfields must have the same names as the "fields", but they do not have to be in the same order.
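A minimal sketch of that last point, based on the first example but cut down to one "record": the subfields are declared in a different order from the elements in the XML. As XML-INTO matches subfields to elements by name, I would expect the data structure to be loaded the same way as before.

```rpg
**free
// Same XML layout as the first example, shortened to one "record"
dcl-c XML1 '<People>+
              <Person>+
                <Name>Simon Hutchinson</Name>+
                <City>Los Angeles</City>+
                <State>CA</State>+
              </Person>+
            </People>' ;

// Subfields in a different order from the XML elements;
// XML-INTO matches them by name, not by position
dcl-ds Person dim(10) qualified ;
  State char(2) ;
  City varchar(30) ;
  Name varchar(50) ;
end-ds ;

xml-into Person %xml(XML1 : 'case=any') ;
*inlr = *on ;
```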

Line 19: This is where the magic happens. In this example the XML-INTO operation code is formatted thus:

  xml-into data_structure_array %xml(constant_name : 
                                            'options') ;

The data structure is Person and the constant is XML1. I have used one option, case=any, which tells XML-INTO that the element names can be in any case: upper and lower case versions are treated as the same. If I had omitted this option, the default (case=lower) expects the XML element names to be all lower case. In my example the element names contain both upper and lower case letters.
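The first parameter of %XML does not have to be a constant, and the options do not have to be a literal. A minimal sketch, assuming the XML arrives in a variable (XmlText and Options are hypothetical names of my own), using a shortened version of the first example's XML:

```rpg
**free
// Hypothetical variables: the XML text and the options string
dcl-s XmlText varchar(1000) ;
dcl-s Options varchar(50) inz('case=any') ;

dcl-ds Person dim(10) qualified ;
  Name varchar(50) ;
  City varchar(30) ;
  State char(2) ;
end-ds ;

// In a real program the XML might come from a parameter or a file
XmlText = '<People>'
        + '<Person><Name>Simon Hutchinson</Name>'
        + '<City>Los Angeles</City><State>CA</State></Person>'
        + '</People>' ;

xml-into Person %xml(XmlText : Options) ;
*inlr = *on ;
```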

After compiling this program, I put a debug breakpoint at line 20. When I run it and reach that breakpoint I can see that the data structure array has been loaded with the contents of the XML "file".

EVAL Person
PERSON.NAME(1) = 'Simon Hutchinson'
PERSON.CITY(1) = 'Los Angeles'
PERSON.STATE(1) = 'CA'
PERSON.NAME(2) = 'Donald Trump'
PERSON.CITY(2) = 'Washington'
PERSON.STATE(2) = 'DC'

But the XML I have is a little more complicated: it has what are called subelements, elements within elements. In this example there are two elements within the Name element: First and Last.

01  dcl-c XML2 '<People>+
02               <Person>+
03                 <Name>+
04                   <First>Simon</First>+
05                   <Last>Hutchinson</Last>+
06                 </Name>+
07                 <City>Los Angeles</City>+
08                 <State>CA</State>+
09               </Person>+
10               <Person>+
11                 <Name>+
12                   <First>Donald</First>+
13                   <Last>Trump</Last>+
14                 </Name>+
15                 <City>Washington</City>+
16                 <State>DC</State>+
17               </Person>+
18              </People>' ;

19  dcl-ds Person dim(10) qualified ;
20    dcl-ds Name ;
21      First varchar(20) ;
22      Last varchar(30) ;
23    end-ds ;
24    City varchar(30) ;
25    State char(2) ;
26  end-ds ;

27  xml-into Person %xml(XML2:'case=any') ; 

You can see the subelements on lines 3 – 6 and 11 – 14. To accommodate subelements I need a data structure within my data structure array. As I have the current Technology Refresh for IBM i 7.3 I can use a nested data structure, lines 20 – 23.

If I were on an older release or Technology Refresh I would need to do this:

19  dcl-ds Person dim(10) qualified ;
20    Name likeds(NameTemplate) ;
21    City varchar(30) ;
22   State char(2) ;
23  end-ds ;

24  dcl-ds NameTemplate template ;
25    First varchar(20) ;
26    Last varchar(30) ;
27  end-ds ; 

Line 20: The LIKEDS(NAMETEMPLATE) inserts a data structure that is like NameTemplate.

Line 24: The TEMPLATE keyword means that this data structure can only be used for definitions; it cannot hold data. I use this on data structures so it is obvious to other programmers, or to myself in the future, that this data structure is not used in the calculations part of the program.

With a debug breakpoint at the end of the program I can see that the subelements have been filled correctly.

> EVAL Person
  PERSON.NAME.FIRST(1) = 'Simon'
  PERSON.NAME.LAST(1) = 'Hutchinson'
  PERSON.CITY(1) = 'Los Angeles'
  PERSON.STATE(1) = 'CA'
  PERSON.NAME.FIRST(2) = 'Donald'
  PERSON.NAME.LAST(2) = 'Trump'
  PERSON.CITY(2) = 'Washington'
  PERSON.STATE(2) = 'DC'

In the following example I will be using an XML file in the IFS, which contains the same XML as the constant in the previous example:

01  <People>
02    <Person>
03      <Name>
04        <First>Simon</First>
05        <Last>Hutchinson</Last>
06      </Name>
07      <City>Los Angeles</City>
08      <State>CA</State>
09    </Person>
10    <Person>
11      <Name>
12        <First>Donald</First>
13        <Last>Trump</Last>
14      </Name>
15      <City>Washington</City>
16      <State>DC</State>
17    </Person>
18  </People>

The RPG example program is a lot smaller without the constant.

01  dcl-ds Person dim(10) qualified ;
02    dcl-ds Name ;
03      First varchar(20) ;
04      Last varchar(30) ;
05    end-ds ;
06    City varchar(30) ;
07    State char(2) ;
08 end-ds ;

09  xml-into Person %xml('/SAMPLE/test.xml':
                         'case=any doc=file') ;

Lines 1 – 8: The data structure array is the same as the previous example.

Line 9: The XML-INTO is slightly different. In the %XML the first parameter is the location of the XML file in the IFS. There is a new option, doc=file, which indicates that the first parameter is a file name rather than the XML document itself.
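The file name does not have to be a literal either. A minimal sketch, assuming the folder and file name are kept separately and joined at run time; the variables Folder, FileName, and XmlPath are hypothetical, while the values are the ones used in this post:

```rpg
**free
// Hypothetical variables holding the parts of the IFS path
dcl-s Folder varchar(100) inz('/SAMPLE') ;
dcl-s FileName varchar(100) inz('test.xml') ;
dcl-s XmlPath varchar(256) ;

// Same nested data structure as the previous example
dcl-ds Person dim(10) qualified ;
  dcl-ds Name ;
    First varchar(20) ;
    Last varchar(30) ;
  end-ds ;
  City varchar(30) ;
  State char(2) ;
end-ds ;

XmlPath = Folder + '/' + FileName ;

// doc=file: the first parameter is a path, not the XML itself
xml-into Person %xml(XmlPath : 'case=any doc=file') ;
*inlr = *on ;
```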

Sometimes elements are prefixed with a namespace, a name given to them to ensure uniqueness. Name is such a generic word that it could apply to a person or a thing, and both could be in the same XML file. Here I have added a namespace, Test, to the elements of my XML file.

01  <Test:People>
02    <Test:Person>
03      <Test:Name>
04        <Test:First>Simon</Test:First>
05        <Test:Last>Hutchinson</Test:Last>
06      </Test:Name>
07      <Test:City>Los Angeles</Test:City>
08      <Test:State>CA</Test:State>
09    </Test:Person>
10    <Test:Person>
11      <Test:Name>
12        <Test:First>Donald</Test:First>
13        <Test:Last>Trump</Test:Last>
14      </Test:Name>
15      <Test:City>Washington</Test:City>
16      <Test:State>DC</Test:State>
17    </Test:Person>
18  </Test:People>

There are two ways I can handle the namespace.

I can use the option ns=remove to remove the namespace when I move the data from the XML file to the data structure. By doing this I do not have to change the data structure Person.

xml-into Person %xml('/SAMPLE/test.xml':
                     'case=any doc=file ns=remove') ;

If the same element name could occur in more than one namespace I might want to preserve the namespace. In that case the option ns=merge is used, and the subfield names need to change: each must be qualified by the namespace followed by an underscore character ( _ ). Now my data structure would need to be:

01  dcl-ds Test_Person dim(10) qualified ;
02    dcl-ds Test_Name ;
03      Test_First varchar(20) ;
04      Test_Last varchar(30) ;
05    end-ds ;
06    Test_City varchar(30) ;
07    Test_State char(2) ;
08  end-ds ;

09  xml-into Test_Person %xml('/SAMPLE/test.xml':
                              'case=any doc=file ns=merge') ;

If I want to know the number of XML "records" retrieved from the document, the program status data structure returns it after XML-INTO has loaded the data structure array. It is an 8-byte integer subfield, int(20), that starts at position 372.

dcl-ds PgmDs psds qualified ;
  Count int(20) pos(372) ;
end-ds ;
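One way to use this count, sketched here as an assumption rather than taken from my program, is as the upper limit of a loop so that only the array elements XML-INTO loaded are processed. It uses the nested data structure, so it needs the same Technology Refresh as the earlier nested example. The variable Text is hypothetical, and I have kept it to 52 characters as that is as much as DSPLY can show:

```rpg
**free
dcl-ds Person dim(10) qualified ;
  dcl-ds Name ;
    First varchar(20) ;
    Last varchar(30) ;
  end-ds ;
  City varchar(30) ;
  State char(2) ;
end-ds ;

// Number of elements loaded by XML-INTO
dcl-ds PgmDs psds qualified ;
  Count int(20) pos(372) ;
end-ds ;

dcl-s i int(10) ;
dcl-s Text varchar(52) ;

xml-into Person %xml('/SAMPLE/test.xml' : 'case=any doc=file') ;

// Loop only over the elements that were filled
for i = 1 to PgmDs.Count ;
  Text = Person(i).Name.First + ' ' + Person(i).Name.Last ;
  dsply Text ;
endfor ;

*inlr = *on ;
```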

There are times when elements are omitted, sometimes on purpose and other times by accident. The count in the program status data structure is a count of "records", not elements. Fortunately there is an option I can use that provides a "count", 1 or 0, showing whether each element is present.

In this XML file the third "record", lines 19 – 25, is missing a State element.

01  <People>
02    <Person>
03      <Name>
04        <First>Simon</First>
05        <Last>Hutchinson</Last>
06      </Name>
07      <City>Los Angeles</City>
08      <State>CA</State>
09    </Person>
10    <Person>
11      <Name>
12        <First>Donald</First>
13        <Last>Trump</Last>
14      </Name>
15      <City>Washington</City>
16      <State>DC</State>
17    </Person>
19    <Person>
20      <Name>
21        <First>Mickey</First>
22        <Last>Mouse</Last>
23      </Name>
24      <City>Hollywood</City>
25    </Person>
26  </People>

To use the count I must add subfields to contain it. I have added subfields at lines 6, 8, and 10 for the elements Name, City, and State. They must be numeric with no decimal places; I have defined them as 5-digit integers, int(5). I will explain later why they must all start with the word Count.

01  dcl-ds Person dim(10) qualified ;
02    dcl-ds Name ;
03      First varchar(20) ;
04      Last varchar(30) ;
05    end-ds ;
06    CountName int(5) ;
07    City varchar(30) ;
08    CountCity int(5) ;
09    State char(2) ;
10    CountState int(5) ;
11  end-ds ;

12  dcl-ds PgmDs psds qualified ;
13    Count int(20) pos(372) ;
14  end-ds ;

15  dcl-s i int(5) ;
16  dcl-s TotalName int(10) ;
17  dcl-s TotalCity int(10) ;
18  dcl-s TotalState int(10) ;

19  xml-into Person 
      %xml('/SAMPLE/test.xml':
           'case=any doc=file countprefix=count') ;

20  for i = 1 to PgmDs.Count ;
21    TotalName += Person(i).CountName ;
22    TotalCity += Person(i).CountCity ;
23    TotalState += Person(i).CountState ;
24  endfor ;

Lines 12 – 14: The program status data structure, containing the count subfield.

Lines 15 – 18: I will be using these variables for the total counts.

Line 19: There is an additional option in this statement, countprefix=count. This means that the name of each count subfield must be the word Count followed by the element name; see the data structure lines 6, 8, and 10.

Lines 20 – 24: I cannot use XFOOT to add up the count subfields within the data structure array, so I use this FOR loop to total them. If all the elements are present, each total will equal the count in the program status data structure.
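The count subfields can also be checked one "record" at a time. A minimal sketch, assuming I wanted to report which "records" arrived without a State element; the message text and the variable Text are my own, hypothetical additions:

```rpg
**free
dcl-ds Person dim(10) qualified ;
  dcl-ds Name ;
    First varchar(20) ;
    Last varchar(30) ;
  end-ds ;
  CountName int(5) ;
  City varchar(30) ;
  CountCity int(5) ;
  State char(2) ;
  CountState int(5) ;
end-ds ;

dcl-ds PgmDs psds qualified ;
  Count int(20) pos(372) ;
end-ds ;

dcl-s i int(10) ;
dcl-s Text varchar(52) ;

xml-into Person %xml('/SAMPLE/test.xml' :
                     'case=any doc=file countprefix=count') ;

for i = 1 to PgmDs.Count ;
  // CountState is 0 when the State element was missing
  if Person(i).CountState = 0 ;
    Text = 'No State for ' + Person(i).Name.Last ;
    dsply Text ;
  endif ;
endfor ;

*inlr = *on ;
```

With the example file above only the third "record", Mickey Mouse, should be reported.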

After compiling this program I put debug breakpoints at line 20, the first line after the XML-INTO, and at the end of the program, after line 24.

At the breakpoint on line 20 I can see the data in the Person data structure array. In the first element all the count subfields contain 1. In the third element, as the State element was missing from that "record", the CountState subfield contains 0.

EVAL Person
PERSON.NAME.FIRST(1) = 'Simon'
PERSON.NAME.LAST(1) = 'Hutchinson'
PERSON.COUNTNAME(1) = 1
PERSON.CITY(1) = 'Los Angeles'
PERSON.COUNTCITY(1) = 1
PERSON.STATE(1) = 'CA'
PERSON.COUNTSTATE(1) = 1


PERSON.NAME.FIRST(3) = 'Mickey'
PERSON.NAME.LAST(3) = 'Mouse'
PERSON.COUNTNAME(3) = 1
PERSON.CITY(3) = 'Hollywood'
PERSON.COUNTCITY(3) = 1
PERSON.STATE(3) = '  '
PERSON.COUNTSTATE(3) = 0

At the breakpoint at the end of the program I can see the following values.

PGMDS.COUNT = 3
TOTALNAME = 3
TOTALCITY = 3
TOTALSTATE = 2

This is as I expect as there were only two State elements in the XML file.

I know that this is getting to be a long post, but there is one more thing I want to write about.

Some XML files contain a header. If my file has one and I am not interested in the information it contains, I want a way to skip straight to the part of the XML file I want to get the information from. Here is part of an XML document with a Header element followed by the Person elements. In this limited example the Header contains only one element, Description, but it could contain many elements, none of which I care about.

01  <People>
02    <Header>
03      <Description>Header for this test XML file</Description>
04    </Header>
05    <Person>
06      <Name>
07        <First>Simon</First>
08        <Last>Hutchinson</Last>
09      </Name>

Fortunately there is an option for this.

xml-into Person %xml('/SAMPLE/test.xml':
         'case=any doc=file path=People/Person') ;

The path option tells XML-INTO where in the document to find the elements for the data structure array: the Person elements within the People element. The first of these starts on line 5 in the example file, so the Header element is skipped.
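If I did want the header information, the same option should be able to point at a single element. A sketch, assuming the Description is the only header data needed; the variable Description and its length are my own choices, kept short enough for DSPLY to show:

```rpg
**free
// Hypothetical variable to receive the header's Description
dcl-s Description varchar(50) ;

// path= points directly at the one element wanted
xml-into Description %xml('/SAMPLE/test.xml' :
           'case=any doc=file path=People/Header/Description') ;

dsply Description ;
*inlr = *on ;
```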

There is a whole lot more you can do with XML-INTO: there are many more options than I have mentioned, and I can use a handler procedure, %HANDLER, if needed. Fortunately the XML document was very simple, and therefore my program is simple too.

 

You can learn more about this from the IBM website:

 

This article was written for IBM i 7.3, and should work for earlier releases too.

9 comments:

  1. excellent examples, refer to this website on daily basis

    1. Thank you for the compliment. It is good to know people find this site helpful.

  2. This is a great article, as said this site is really helpful

  3. Nice Article

  4. Is there restriction of length of variable when declaring the path of xml??

    1. I could make a variable of over 32k long. I think the length of the folder, subfolder, file names in the IFS is more likely to be the limiting factor.

  5. Excellent and very helpful. !!!

  6. excellent article

