Pages

Wednesday, December 15, 2021

Taking one string and breaking it apart into five consecutive rows

sql split and substring

The question was: How can I break apart a hundred long field into five twenty long fields and then have them returned, one after another, as separate result rows using SQL?

Obviously substring is going to be used, but how to return the resulting rows one after another?

Fortunately the easiest solution for this scenario came in earlier Technology Refreshes for the currently supported releases of IBM i, 7.4 TR4 and 7.3 TR10.

Before I show that I need to have a file with a field that is a hundred characters long:

01 A          R TESTFILER
02 A            FLD001       100A

I chose to use a file as the question asked for a file. But this will work just as well for a DDL Table:

01  CREATE TABLE MYLIB.TESTFILE
02    (FLD001 CHAR(100)) ;

I decided to write a "quick and dirty" RPG program to fill the file.

01  **free
02  dcl-f TESTFILE usage(*output) ;

03  dcl-ds FLD001 ;
04    SF1 char(20) ;
05    SF2 char(20) ;
06    SF3 char(20) ;
07    SF4 char(20) ;
08    SF5 char(20) ;
09  end-ds ;

10  SF1 = 'Light' ;
11  SF2 = 'Pencil' ;
12  SF3 = 'Headphones' ;
13  SF4 = 'Notepad' ;
14  SF5 = 'Phone ' ;
15  write TESTFILER ;

16  SF1 = 'Push pin' ;
17  SF2 = 'Scissors' ;
18  SF3 = 'Ceramic cat' ;
19  SF4 = 'Keyboard' ;
20  SF5 = 'Mouse' ;
21  write TESTFILER ;

22  *inlr = *on ;

Line 2: This is the file definition for the output file.

Lines 3 – 9: I have named this data structure to have the same name as the field in the output file. The five subfields make it easy for me to position data in the places I need.

Lines 10 – 21: I have created two records of things that I have on my desk.

I used SQL to check the data in the output file:

01  SELECT * FROM TESTFILE

The results are as I expected:


FLD001
-------------------------------------------------------------------------------------
Light               Pencil              Headphones          Notepad             Phone
Push pin            Scissors            Ceramic cat         Keyboard            Mouse

I am going to use the SQL SPLIT table function to break these records/rows into five parts, and return these parts one after another.

In my experience SPLIT works best if I use a "separator character" that delimits one sub field from another. I chose to use a comma ( , ), but I could have used any other character if commas are present in the string.

I can insert a comma at the end of each subfield using the SUBSTRING, or SUBSTR, statement:

01  SELECT SUBSTR(FLD001,1,20) || ',' || 
02         SUBSTR(FLD001,21,20) || ',' || 
03         SUBSTR(FLD001,41,20) || ',' || 
04         SUBSTR(FLD001,61,20) || ',' ||
05         SUBSTR(FLD001,81,20)
06    FROM TESTFILE

By all means the result is more than a hundred character, but that is unimportant as this is a temporary string.

I use the double pipes ( || ) to concatenate these parts into one string.


FLD001
-----------------------------------------------------------------------------------------
Light               ,Pencil              ,Headphones          ,Notepad             ,Phone
Push pin            ,Scissors            ,Ceramic cat         ,Keyboard            ,Mouse

If I wanted to I could remove all the trailing spaces when substring:

01  SELECT RTRIM(SUBSTR(FLD001,1,20)) || ',' || 
02         RTRIM(SUBSTR(FLD001,21,20)) || ',' || 
03         RTRIM(SUBSTR(FLD001,41,20)) || ',' || 
04         RTRIM(SUBSTR(FLD001,61,20)) || ',' ||
05         RTRIM(SUBSTR(FLD001,81,20))
06    FROM TESTFILE

Here I have used the RTRIM to perform a right sided trim. The results are more compacted than what I showed before:

00001
---------------------------------------------
Light,Pencil,Headphones,Notepad,Phone
Push pin,Scissors,Ceramic cat,Keyboard,Mouse

Now I can add the SPLIT in my statement. SPLIT table function requires two parameters:

  1. String to split
  2. Separator character

It returns two columns:

  1. Ordinal position – which element in the string this is
  2. Element – the extracted part of the string

My statement is:

01  SELECT ELEMENT,ORDINAL_POSITION 
02  FROM TESTFILE,
03       TABLE(SYSTOOLS.SPLIT(SUBSTR(FLD001,1,20) || ',' || 
04                             SUBSTR(FLD001,21,20) || ',' || 
05                             SUBSTR(FLD001,41,20) || ',' || 
06                             SUBSTR(FLD001,61,20) || ',' || 
07                             SUBSTR(FLD001,81,20),
08                            ','))

Line 1: I am really only interested in the ELEMENT column, but I have included the ORDINAL_POSITION so you can see that I have ten results, five from each record.

Lines 3 – 7: These substrings make my input string, as I have shown above. Notice that line 7 ends with a comma as that is the end of the first input parameter.

Line 8: This is the separator character, the comma.

The results are exactly what I wanted. The first five rows are from the first record, and the next five from the second.

ELEMENT         ORDINAL_POSITION
--------------  ----------------
Light                          1
Pencil                         2
Headphones                     3
Notepad                        4
Phone                          5
Push pin                       1
Scissors                       2
Ceramic cat                    3
Keyboard                       4
Mouse                          5

 

This article was written for IBM i 7.4 TR4 and 7.3 TR10.

4 comments:

  1. How about a lateral cross join with a values clause instead?
    It's faster and less error prone.

    SELECT
    TESTFILE_SPLIT.*
    FROM
    (VALUES('A B C D E ')) AS TESTFILE(FLD)

    CROSS JOIN LATERAL(
    VALUES
    (SUBSTR(TESTFILE.FLD, 1, 3)),
    (SUBSTR(TESTFILE.FLD, 4, 3)),
    (SUBSTR(TESTFILE.FLD, 7, 3)),
    (SUBSTR(TESTFILE.FLD, 10, 3)),
    (SUBSTR(TESTFILE.FLD, 13, 3))
    ) AS TESTFILE_SPLIT

    ReplyDelete
  2. Wow, didn't know about SYSTOOLS.SPLIT(). Very useful!

    Your above approach is probably the best, unless you don't know exactly what kind of characters are in your original string. If the data has commas anywhere, for example, then our SPLIT won't work as expected. We need to know 100% a delimiter that is safe to use, but might not always have that information.

    If you know that your data is always 100 long and want to split at 20, you could also do it as follows with UNION:

    WITH segment AS (
    SELECT ROW_NUMBER() OVER() AS rn#,
    SUBSTR(fld001, 1,20) fld1,
    SUBSTR(fld001,21,20) fld2,
    SUBSTR(fld001,41,20) fld3,
    SUBSTR(fld001,61,20) fld4,
    SUBSTR(fld001,81,20) fld5
    FROM TESTFILE
    )
    SELECT fld
    FROM (
    SELECT rn#, 1 piece, fld1 fld FROM segment UNION ALL
    SELECT rn#, 2 piece, fld2 fld FROM segment UNION ALL
    SELECT rn#, 3 piece, fld3 fld FROM segment UNION ALL
    SELECT rn#, 4 piece, fld4 fld FROM segment UNION ALL
    SELECT rn#, 5 piece, fld5 fld FROM segment
    )
    ORDER BY rn#, piece

    --

    The weakness of both the original SPLIT approach, and the above UNION approach is that they will both only work when you know the exact length of the original string and how many pieces you want to break it into. Because we know we will always have exactly 5 pieces, we know how many SUBSTRs to do, and how many UNIONs.
    In a more general scenario, if you wanted to break out a string of variable length into equal sized pieces, you could use recursive SQL. The below example is overbuilt for the specific question where we do know it's always 100 long and we want 20 length pieces, but can definitely come in useful for other similar problems. The below SQL lets you put in the size of the piece you want on the first line (current set to 20).

    WITH piecelngth(lngth) AS (VALUES (20)
    ), pieces(rn#, segment, fld, str) AS (
    SELECT rn#, 0 segment, '' fld, str
    FROM (SELECT ROW_NUMBER() OVER() rn#, fld001 AS str
    FROM TESTFILE
    )
    UNION ALL
    SELECT rn#,segment + 1 AS segment,
    CASE WHEN LENGTH(TRIM(str))0
    )
    SELECT fld
    FROM pieces
    WHERE segment>0
    ORDER BY rn#, segment

    ReplyDelete
  3. You can just do it in a less convoluted way; just think in a relational/functional way.
    Just do:
    SELECT SUBSTRING(FLD001, tmp.index, 20)
    FROM MYLIB.TESTFILE
    CROSS JOIN (VALUES 1,21,41,61,81 ) as tmp (index)

    ReplyDelete
  4. This correct the previous recursive definition because ordering preservation is a requirement.

    WITH BASE(field, index, len, part) AS (
    SELECT FLD001, 21, 20, substring(FLD001, 1, 20)
    FROM TRENTINIF.TESTFILE
    UNION ALL
    SELECT field, index+len, len, substring(field, index, len)
    FROM BASE WHERE substring(field, index, len) <> ''
    )
    SEARCH DEPTH FIRST BY field SET ordcol
    select part from base order by ordcol

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.