Wednesday, April 26, 2017

Using SQL to aggregate columns into one returned result

aggregating multiple values into one result set using sql

There are times when I have want an easy way to put values from a file, or table, "horizontally" into one column rather than vertically with more than one row.

"Vertical" "Horizontal"
TYPE        THING
ANIMAL     ELK
ANIMAL     HAWK
ANIMAL     MEERKAT
ANIMAL     ORCA
TYPE       THINGS
ANIMAL     ELK, HAWK, MEERKAT, ORCA, SEA L
FRUIT      APPLE, FUJI, BLACKBERRY, GRAPEF
VEGETABLE  SQUASH

While going through all the additions and changes for IBM i 7.3 TR2 and 7.2 TR6 I came across a SQL built in function, BiF, that gives me what I need, without me having to do any complicated work. This BiF is LISTAGG, and it aggregates all non-null values into one string. I am going to give some examples of how to use it. I am going to use data from both DDS files and SQL DDL tables, as I know most of us still work with legacy systems based on a lot of DDS files, and it will be useful examples to show the SQL DDL equivalent with it.

The first file and table I will be using look like:

  CREATE TABLE MYLIB.TESTTAB1 (
    TYPE CHAR(10) NOT NULL DEFAULT '',
    THING CHAR(20) NOT NULL DEFAULT ''
  )
  A          R TESTFILE1R
  A            TYPE          10A
  A            THING         20A

I can see the contents of the table TESTAB1 in "vertical" form with this simple SQL statement:

SELECT * FROM TESTTAB1
 ORDER BY TYPE,THING

TYPE       THING
ANIMAL     ELK
ANIMAL     ELK
ANIMAL     HAWK
ANIMAL     MEERKAT
ANIMAL     ORCA
ANIMAL     SEA LION
FRUIT      APPLE, FUJI
FRUIT      BLACKBERRY
FRUIT      GRAPEFRUIT
FRUIT      GUAVA
FRUIT      RASPBERRY
VEGETABLE  SQUASH

Notice how there are two identical rows/records "ANIMAL   ELK".

If I create a SQL statement using the LISTAGG BiF in its simplest for it would look like:

01  SELECT TYPE,
02         LISTAGG(THING,', ') 
03           WITHIN GROUP(ORDER BY THING)
04    FROM TESTFILE1
05   GROUP BY TYPE

Line 1: The standard SQL Select so far…

Line 2 and 3: This is the part of the statement that does the aggregating.

Line 2: This is the start of the BiF, where I give the name of the column I want to aggregate and the separator string.

Line 3: Within the list of rows retrieved I want them sorted by the value in the field THING.

Line 4: The name of the table of file that contains the data.

Line 5: To use this BiF I have to GROUP my results.

My results look like:

TYPE       LISTAGG
ANIMAL     ELK                 , ELK                 , HAWK
FRUIT      APPLE, FUJI         , BLACKBERRY          , GRAPEFRU
VEGETABLE  SQUASH

My results in the aggregated string (row) are all sorted by their value (this is only part of the results as the rest would not fit on this page). To my disappointment the columns were just concatenate, not first trimmed. To achieve a better list I would need to use the right trim BiF, RTRIM. I am also going to change the separator character too.

01  SELECT TYPE,
02         LISTAGG(RTRIM(THING),':') 
03           WITHIN GROUP(ORDER BY THING)
04    FROM TESTFILE1
05   GROUP BY TYPE

In my opinion this is far more pleasing:

TYPE       LISTAGG
ANIMAL     ELK:ELK:HAWK:MEERKAT:ORCA:SEA LION
FRUIT      APPLE, FUJI:BLACKBERRY:GRAPEFRUIT:GUAVA:RASPBERRY
VEGETABLE  SQUASH

I still have two elks. If I only want to display one I would use DISTINCT on line 2.

01  SELECT TYPE,
02         LISTAGG(DISTINCT RTRIM(THING),':') 
03           WITHIN GROUP(ORDER BY THING)
04    FROM TESTTAB1
05   GROUP BY TYPE

Notice that there is only a space between the DISTINCT and the RTRIM, not a comma ( , ).

Now there is only one elk.

TYPE       LISTAGG
ANIMAL     ELK:HAWK:MEERKAT:ORCA:SEA LION
FRUIT      APPLE, FUJI:BLACKBERRY:GRAPEFRUIT:GUAVA:RASPBERRY
VEGETABLE  SQUASH

I use variable character columns a lot when defining DDL tables as it will make my tables smaller and, therefore, use less disk space.

  CREATE TABLE MYLIB.TESTTAB2 (
    TYPE CHAR(10) NOT NULL DEFAULT '',
    THING VARCHAR(20) NOT NULL DEFAULT ''
  )
  A          R TESTFILE2R
  A            TYPE          10A
  A            THING         20A         VARLEN 

As the THING column/field is defined as variable character when I use it with the LISTAGG I do not have to use RTRIM to get the same result as my last result set.

01  SELECT TYPE,
02         LISTAGG(DISTINCT THING,':') 
03           WITHIN GROUP(ORDER BY THING)
04    FROM TESTTAB2
05   GROUP BY TYPE

The standard result set for character data is placed in the equivalent of a VARCHAR(4000). If I use the simple statement I have given above, and my result is greater than 4,000 characters my results would look like:

TYPE       LISTAGG
ANIMAL      +++++++++++++++++++++++++++++++++++++
FRUIT       +++++++++++++++++++++++++++++++++++++
VEGETABLE   SQUASH

What are my alternatives to give a better result:

1. Increase the size of the result set by using CAST, for example:

01  SELECT TYPE,
02         LISTAGG(CAST(THING AS VARCHAR(32730)),':') 
03           WITHIN GROUP(ORDER BY THING)
04    FROM TESTFILE2
05   GROUP BY TYPE

Note:: Through a process of trial and error I have found that the maximum size that can be used with the cast statement is 32,730.

2. Within the LISTAGG I can request that is gives me an error:

01  SELECT TYPE,
02         LISTAGG(THING,':' ON OVERFLOW ERROR) 
03           WITHIN GROUP(ORDER BY THING)
04    FROM TESTTAB2
05   GROUP BY TYPE

This does return the error SQL0101 which has the description: "SQL statement too long or complex". But it did not cause the statement to throw an error when I was testing using STRSQL.

3. I prefer this method, to truncate my result string. I add the TRUNCATE to the LISTAGG, along with the characters to use. I find that most people understand what the ellipsis/three periods ( ... ) means.

01  SELECT TYPE,
02         LISTAGG(THING,':' ON OVERFLOW TRUNCATE '...' 
                   WITH COUNT) 
03           WITHIN GROUP(ORDER BY THING)
04    FROM TESTFILE2
05   GROUP BY TYPE

The results that are too long for the result string show the ellipsis/three periods followed by the count of number of values that were truncated.

TYPE        LISTAGG
ANIMAL     ...(5)
FRUIT      ...(5)
VEGETABLE  SQUASH

I can already see where I will be using this SQL built in function in my work, in my opinion this is a great addition to DB2 for i.

 

You can learn more about the LISTAGG SQL built in function from the IBM website here.

 

This article was written for IBM i 7.3, and should work with 7.2 too.

17 comments:

  1. Didn't know it, very useful! From wich release is it available?

    ReplyDelete
    Replies
    1. 7.3 TR2 and 7.2 TR6.

      If you are on either of those releases with a lower TR it is not a big deal to move up to the latest.

      Delete
  2. Many times I am looking for pivot table functionality. This looks like a great addition to making the data sets we need.

    ReplyDelete
  3. Replies
    1. Thank you! It is an exciting time to be an IBMi developer

      Delete
  4. Great post for a great new bif, thanks Simon

    ReplyDelete
  5. Thank you Simon..amazing article and beautiful feature.(Earlier I had written a lots of legacy code to achieve it..now not required :))

    ReplyDelete
  6. Mohammed M' BarkiApril 27, 2017 at 3:07 AM

    Thanks Simon. Great post !

    ReplyDelete
  7. very very good, as usual!

    ReplyDelete
  8. I really like that for using as a replacement for other pivot table techniques
    Rob

    ReplyDelete
  9. Great! :)

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.