Wednesday, September 6, 2017

REGEXP for count

regexe expression count used for counting characters in a string

With the latest IBM i Technical Refreshes to 7.2, TR6, and 7.3, TR2, a number regular expressions functions were added to Db2 for i (SQL). In this post I am going to describe the REGEXP_COUNT function.

Regular readers of this blog will know that I have already written about the REGEXP_LIKE. Rather than repeat a lot of the things I wrote about in the post I am just going to refer you to it at various time in this one.

The format for this function is as follows:

  REGEXP_COUNT(source-string, search-pattern, start-position, 
               regexp-flags)

Only the first two parameters: source-string and search-pattern are mandatory, the other two are optional.

Let me start with some simple examples. In the example below I want to know how many times the search pattern 'rpg' appears in the search string.

SELECT 
  REGEXP_COUNT('RPGPGM.COM is the best website for rpg examples',
               'rpg')
  FROM SYSIBM.SYSDUMMY1

I have to use the dummy file SYSIBM.SYSDUMMY1 as this is a Select statement where the source string is not a row/field in a table/file.

My result is:

  REGEXP_COUNT
             1

It is only 1 as I searched for the lower case pattern 'rpg' that only occurs once in the source string. If I wanted my result to ignore case I need the 'i' in the regexp flags. For more example of ignoring case see the post about the REGEXP_LIKE.

SELECT
  REGEXP_COUNT('RPGPGM.COM is the best website for rpg examples',
               'rpg','i')
  FROM SYSIBM.SYSDUMMY1

Now I get the count of both occurrences of 'rpg'.

REGEXP_COUNT
           2

Notice that I used only the first, second, and fourth parameters, I did not have to give a value for the starting position parameter. Db2 is smart enough to know I had omitted it. If I was to include it the statement would look like.

SELECT
  REGEXP_COUNT('RPGPGM.COM is the best website for rpg examples',
               'rpg',1,'i')
  FROM SYSIBM.SYSDUMMY1

I can use the start position to start my search for the search pattern in the second position:

SELECT
  REGEXP_COUNT('RPGPGM.COM is the best website for rpg examples',
               'rpg',2,'i')
  FROM SYSIBM.SYSDUMMY1

I only return a count of 1 for my result, as the first "RPG" is ignored as it starts in the first position.

REGEXP_COUNT
           1

I can even take the table I used in the previous post and count the number of vowels in the players' names who have a "t" in their last names:

SELECT PID,FNAME || ' ' || LNAME AS NAME,
       REGEXP_COUNT(FNAME || LNAME),'[aeiou]','i') AS VOWELS
  FROM PERSON
 WHERE REGEXP_LIKE(LNAME,'t','i')

Which gives me:

Person   NAME                  VOWELS
id
     2   John Austin                4
     3   don bennett                3
     6   george curtis              5
    10   RON HEWITT                 3
    17   Frank Kletzenbauer         6
    18   arthur lightening          5
    22   KEN SATCHWELL              3
    23   Nelson Stiffle             4
    24   ray straw                  2

These examples are good to show you what the returned results are, but how would I code them in a program. Let me take the search string from the first examples and count the number of vowels in it.

01  **free

02  dcl-s SourceString char(50) 
          inz('RPGPGM.COM is the best website for rpg examples') ;
03  dcl-s Vowels int(5) ;

04  exec sql SET :Vowels = REGEXP_COUNT(:SourceString,
                                        '[aeiou]','i') ;

05  dsply ('Number of vowels = ' + %char(Vowels)) ;

06  *inlr = *on ;

Line 1: This program is written in totally free RPG.

Line 2: I have defined the search string as a variable, Searchstring.

Line 3: The variable Vowels will be used for the returned value of the number of vowels in the search string.

Line 4: Rather than use a Select statement, I decided to use a Set statement instead.

Line 5: I am using the Display operation code, DSPLY, to show the number in the variable Vowels.

When I run this program I get the following value displayed:

DSPLY  Number of vowels = 11

In the next example I only want the first row/record from the table/file that fits the selection criteria I used in the above examples with the table PERSON.

01  **free

02  dcl-s PlayerId packed(9) ;
03  dcl-s PlayerName varchar(30) ;
04  dcl-s Vowels int(5) ;

05  exec sql SELECT PID,
06                  FNAME || ' ' || LNAME,
07                  REGEXP_COUNT(FNAME || LNAME,'[aeiou]','i')
08             INTO :PlayerId,:PlayerName,:Vowels
09             FROM PERSON
10             WHERE REGEXP_LIKE(LNAME,'t','i')
11             FETCH FIRST ROW ONLY ;

12  *inlr = *on ;

Line 1: I am using totally free RPG.

Lines 2 – 4: I am declaring the variables I will be getting the results from the Select statement. I hope the names are descriptive enough for you to understand how the variables will be used.

Lines 5 – 11: My Select statement.
Lines 5 – 7: I broke the columns/fields I want returned onto three lines to make it easier to understand my result set.
Line 8: The results are moved into these variables.
Line 9: Name of the table.
Line 10: Is where I define that I only want player's whose last name contains the letter "t".
Line 11: I only want to return the first row of the results.

If I debug the program and break at line 12 I can see that the variables I defined contain the values I expected.

PLAYERID = 000000002.
PLAYERNAME = 'John Austin                   '
VOWELS = 4

 

Other Db2 for i regular expressions:

 

You can learn more about the REGEXP_COUNT Db2 for i function from the IBM website here.

 

This article was written for IBM i 7.3 TR2 and 7.2 TR6.

5 comments: