Wednesday, September 6, 2017

REGEXP for count

regexe expression count used for counting characters in a string

With the latest IBM i Technical Refreshes to 7.2, TR6, and 7.3, TR2, a number regular expressions functions were added to Db2 for i (SQL). In this post I am going to describe the REGEXP_COUNT function.

Regular readers of this blog will know that I have already written about the REGEXP_LIKE. Rather than repeat a lot of the things I wrote about in the post I am just going to refer you to it at various time in this one.

The format for this function is as follows:

  REGEXP_COUNT(source-string, search-pattern, start-position, 
               regexp-flags)

Only the first two parameters: source-string and search-pattern are mandatory, the other two are optional.

Let me start with some simple examples. In the example below I want to know how many times the search pattern 'rpg' appears in the search string.

SELECT 
  REGEXP_COUNT('RPGPGM.COM is the best website for rpg examples',
               'rpg')
  FROM SYSIBM.SYSDUMMY1

I have to use the dummy file SYSIBM.SYSDUMMY1 as this is a Select statement where the source string is not a row/field in a table/file.

My result is:

  REGEXP_COUNT
             1

It is only 1 as I searched for the lower case pattern 'rpg' that only occurs once in the source string. If I wanted my result to ignore case I need the 'i' in the regexp flags. For more example of ignoring case see the post about the REGEXP_LIKE.

SELECT
  REGEXP_COUNT('RPGPGM.COM is the best website for rpg examples',
               'rpg','i')
  FROM SYSIBM.SYSDUMMY1

Now I get the count of both occurrences of 'rpg'.

REGEXP_COUNT
           2

Notice that I used only the first, second, and fourth parameters, I did not have to give a value for the starting position parameter. Db2 is smart enough to know I had omitted it. If I was to include it the statement would look like.

SELECT
  REGEXP_COUNT('RPGPGM.COM is the best website for rpg examples',
               'rpg',1,'i')
  FROM SYSIBM.SYSDUMMY1

I can use the start position to start my search for the search pattern in the second position:

SELECT
  REGEXP_COUNT('RPGPGM.COM is the best website for rpg examples',
               'rpg',2,'i')
  FROM SYSIBM.SYSDUMMY1

I only return a count of 1 for my result, as the first "RPG" is ignored as it starts in the first position.

REGEXP_COUNT
           1

I can even take the table I used in the previous post and count the number of vowels in the players' names who have a "t" in their last names:

SELECT PID,FNAME || ' ' || LNAME AS NAME,
       REGEXP_COUNT(FNAME || LNAME),'[aeiou]','i') AS VOWELS
  FROM PERSON
 WHERE REGEXP_LIKE(LNAME,'t','i')

Which gives me:

Person   NAME                  VOWELS
id
     2   John Austin                4
     3   don bennett                3
     6   george curtis              5
    10   RON HEWITT                 3
    17   Frank Kletzenbauer         6
    18   arthur lightening          5
    22   KEN SATCHWELL              3
    23   Nelson Stiffle             4
    24   ray straw                  2

These examples are good to show you what the returned results are, but how would I code them in a program. Let me take the search string from the first examples and count the number of vowels in it.

01  **free

02  dcl-s SourceString char(50) 
          inz('RPGPGM.COM is the best website for rpg examples') ;
03  dcl-s Vowels int(5) ;

04  exec sql SET :Vowels = REGEXP_COUNT(:SourceString,
                                        '[aeiou]','i') ;

05  dsply ('Number of vowels = ' + %char(Vowels)) ;

06  *inlr = *on ;

Line 1: This program is written in totally free RPG.

Line 2: I have defined the search string as a variable, Searchstring.

Line 3: The variable Vowels will be used for the returned value of the number of vowels in the search string.

Line 4: Rather than use a Select statement, I decided to use a Set statement instead.

Line 5: I am using the Display operation code, DSPLY, to show the number in the variable Vowels.

When I run this program I get the following value displayed:

DSPLY  Number of vowels = 11

In the next example I only want the first row/record from the table/file that fits the selection criteria I used in the above examples with the table PERSON.

01  **free

02  dcl-s PlayerId packed(9) ;
03  dcl-s PlayerName varchar(30) ;
04  dcl-s Vowels int(5) ;

05  exec sql SELECT PID,
06                  FNAME || ' ' || LNAME,
07                  REGEXP_COUNT(FNAME || LNAME,'[aeiou]','i')
08             INTO :PlayerId,:PlayerName,:Vowels
09             FROM PERSON
10             WHERE REGEXP_LIKE(LNAME,'t','i')
11             FETCH FIRST ROW ONLY ;

12  *inlr = *on ;

Line 1: I am using totally free RPG.

Lines 2 – 4: I am declaring the variables I will be getting the results from the Select statement. I hope the names are descriptive enough for you to understand how the variables will be used.

Lines 5 – 11: My Select statement.
Lines 5 – 7: I broke the columns/fields I want returned onto three lines to make it easier to understand my result set.
Line 8: The results are moved into these variables.
Line 9: Name of the table.
Line 10: Is where I define that I only want player's whose last name contains the letter "t".
Line 11: I only want to return the first row of the results.

If I debug the program and break at line 12 I can see that the variables I defined contain the values I expected.

PLAYERID = 000000002.
PLAYERNAME = 'John Austin                   '
VOWELS = 4

 

Other Db2 for i regular expressions:

 

You can learn more about the REGEXP_COUNT Db2 for i function from the IBM website here.

 

This article was written for IBM i 7.3 TR2 and 7.2 TR6.

5 comments:

  1. the statement
    REGEXP(FNAME || LNAME),'[aeiou]','i') AS VOWELS
    generates REGEXP in *LIBL type *N was not found
    was REGEXP supposed to be an different expression?

    ReplyDelete
    Replies
    1. No that was a typo on my part [sad face]

      The correction has been made.

      It should be REGEXP_COUNT, not just REGEXP.

      Delete
  2. awesome....totally awesome!!!

    ReplyDelete
  3. This is another great example of coverage of Modern pattern matching using regular expressions. To keep up you need to read up on the new powerful tools you can use in modern RPG on the IBM i...it is a very POWERFUL system.

    ReplyDelete
  4. Another great article, please keep up the good work!

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.