Wednesday, June 1, 2022

New SQL built in function allows for validating data types

sql try_cast cast

I am sure many of us have found that when we try and cast, in SQL, from one data type to another the “casting" will fail. This is especially true when moving character representations of numbers or dates to a decimal or date type. With the new release IBM i 7.5 and 7.4 TR6 there is a new SQL built in function that allows for the testing of a cast before it is performed.

Before I start explaining this new built in function let me define the table I will be using, TESTTABLE:

CREATE TABLE MYLIB.TESTTABLE
(COLUMN1 CHAR(10),
 COLUMN2 CHAR(10)) ;

It has two columns I will be using in this post. For now, I am only interested in the contents of the column COLUMN1, which I can see using the following SQL statement:

SELECT COLUMN1 FROM TESTTABLE ;

The results show me that this column contains a mixture of information:

COLUMN1
--------
ABC
123
456.78
9,999
20220101
2022-01-01

If I wanted to convert the contents of this column to be numeric I would use the following statement:

SELECT COLUMN1,CAST(COLUMN1 AS DECIMAL) 
FROM TESTTABLE ;

I am running this statement in ACS's Run SQL scripts, therefore, my results look like:

COLUMN1     00002
----------  -------------
ABC         ++++++++++++
123         123
456.78      456
9,999       ++++++++++++
20220101    ++++++++++++
2022-01-01  ++++++++++++

The values it could not cast to a decimal type are shown by the plus ( + ) signs. How do I test that this character value is a “number"?

This is where the new built in function, BiF, comes to the rescue.

TRY_CAST can be thought to do two things:

  1. Validates the value to be cast, and return null if it cannot be mapped to the desired data type
  2. If it can be mapped then it is cast to the new data type

This allows me to use the following statement:

01  SELECT COLUMN1,
02         TRY_CAST(COLUMN1 AS DECIMAL),
03         TRY_CAST(COLUMN1 AS DECIMAL(12,2)),
04         TRY_CAST(REPLACE(COLUMN1,',','') AS DEC(12,2))
05   FROM TESTTABLE ;

Line 2: Here I will try to cast the value to a decimal value, with no decimal places.

Line 3: This time I am casting the value to be a specific size, seven with two decimal places.

Line 4: If the “number" has been formatted to contain a thousand separator I will replace that with null, which removes it from the value before I try to cast the value.

The results are as follows:

COLUMN1      00002        00003        00004
-------   --------  -----------  -----------
ABC         <NULL>       <NULL>       <NULL>
123            123       123.00       123.00
456.78         456       456.78       456.78
9,999       <NULL>       <NULL>      9999.00
20220101    <NULL>  20220101.00  20220101.00
2022-01-01  <NULL>       <NULL>       <NULL>

This is exactly what I wanted. The invalid values rows return null. The exception is the fourth row. As it contains a thousand separator is fails the validation for the casting on line 2 and 3. But as I remove the comma using a REPLACE it appears as a valid number in the fourth column.

The fifth row fails the first cast as it is too long for the default length of a decimal type column. This is corrected in the following two columns as I defined the length of the column.

This time I am going to convert the values in COLUMN1 to integer:

01  SELECT COLUMN1,
02         TRY_CAST(COLUMN1 AS INT),
03         TRY_CAST(REPLACE(COLUMN1,',','')
04    FROM TESTTABLE ;

The results are pretty much the same as they were for the decimal, but these are integers instead.

COLUMN1   00002         00003
-------   ----------    ---------
ABC           <NULL>       <NULL>
123              123          123
456.78           456          456
9,999         <NULL>         9999
20220101    20220101     20220101
2022-01-01    <NULL>       <NULL>

As in my previous example the value in column 2 for the fourth row is null as the thousand separator cannot be translated into integer. While for the results in the third column I have replaced/removed the comma.

The other scenario I see that I commonly find bad data is with dates and, to a less degree, times.

It is not possible to directly cast number to dates. Therefore, I have a series of character representations of numbers in COLUMN2.

SELECT COLUMN2 FROM TESTTABLE ;

The results are:

COLUMN2
--------
20220101
20220230
20229999
20221231
20220615
20220506

Let me try to convert those to dates with the TRY_CAST.

01  SELECT COLUMN2,
02         TRY_CAST(COLUMN2 AS DATE) AS "Date?"
03  FROM TESTTABLE ;

The results:

COLUMN2    Date?
--------   ------
20220101   <NULL>
20220230   <NULL>
20229999   <NULL>
20221231   <NULL>
20220615   <NULL>
20220506   <NULL>

All of the TRY_CAST failed as they are not formatted to look like dates. For that I need to insert dashes ( - ) into them. Which is what the following statement does:

SELECT COLUMN2,
       TRY_CAST((SUBSTR(COLUMN2,1,4) || '-' ||
                 SUBSTR(COLUMN2,5,2) || '-' ||
                 SUBSTR(COLUMN2,7,2)) AS DATE) AS "Date?"
FROM TESTTABLE ;

The results look better:

COLUMN2    Date?
--------   ----------
20220101   2022-01-01
20220230   <NULL>
20229999   <NULL>
20221231   2022-12-31
20220615   2022-06-15
20220506   2022-05-06

The second and third rows fail as they are not valid dates even with the dashes in them.

I can even use the TRY_CAST in a statement to return zero rather than null:

SELECT CASE WHEN TRY_CAST(COLUMN1 AS INTEGER) IS NULL THEN 0 
       ELSE COLUMN1 
       END
  FROM TESTTABLE ;

The results now show zero in those row that could not be cast.

00001
---------
0
123
456
0
20220101
0

I can take that logic and use it in a RPG program to validate the column's value and give it a default value.

01  **free
02  ctl-opt option(*nodebugio:*srcstmt) ;

03  dcl-ds Data qualified dim(10) ;
04    Column1 char(10) ;
05    Number packed(12:2);
06    Integer int(10) ;
07  end-ds ;

08  dcl-s Rows uns(3) inz(%elem(Data)) ;

09  exec sql DECLARE C0 CURSOR FOR
10      SELECT COLUMN1,
11          CASE WHEN TRY_CAST(REPLACE(COLUMN1,',','') AS DEC(12,2))
12                        IS NULL THEN -1
13          ELSE CAST(REPLACE(COLUMN1,',','') AS DEC(12,2))
14          END,
15          CASE WHEN TRY_CAST(REPLACE(COLUMN1,',','') AS INT)
16                        IS NULL THEN -1
17          ELSE CAST(REPLACE(COLUMN1,',','') AS INT)
18          END
19       FROM TESTTABLE
20        FOR READ ONLY ;

21  exec sql OPEN C0 ;

22  exec sql FETCH C0 FOR :Rows ROWS INTO :Data ;

23  exec sql CLOSE C0 ;

24  *inlr = *on ;

Line 1: If it is not totally free RPG then I am doing something wrong!

Line 2: My favorite control options.

Lines 3 – 7: Data structure array I will be using to contain data I fetch from my table. It contains three subfields:

  1. The original value in the row
  2. The value converted to decimal
  3. Value converted to an integer

Line 8: This variable will contain the number of elements that is in the data structure array.

Lines 9 – 20: Definition of the cursor I will be using.

lines 11 – 14: I am using a Case function with the TRY_CAST, line 11, to determine if this value can be cast to a decimal value. If it cannot I give this column a value of -1. If this value can be cast, line 17, then I just use the Cast function to convert it to decimal.

Lines 15 – 18: Does the same as above but tries to cast to an integer type.

Line 19: From my test file.

Line 20: For read only tells the SQL precompiler that this cursor is input only.

Line 21: Cursor is opened.

Line 22: Multiple rows are fetched into the data structure array.

Line 23: Cursor is closed.

After compiling this program I start debug and add a breakpoint at line 24. When I call the program, and the debug breakpoint is reached, I can display the contents of the data structure array:

> EVAL data                      
  DATA.COLUMN1(1) = 'ABC       '
  DATA.NUMBER(1) = -0000000001.00
  DATA.INTEGER(1) = -1

  DATA.COLUMN1(2) = '123       '
  DATA.NUMBER(2) = 0000000123.00 
  DATA.INTEGER(2) = 123

  DATA.COLUMN1(3) = '456.78    '
  DATA.NUMBER(3) = 0000000456.78 
  DATA.INTEGER(3) = 456

  DATA.COLUMN1(4) = '9,999     ' 
  DATA.NUMBER(4) = 0000009999.00 
  DATA.INTEGER(4) = 9999

  DATA.COLUMN1(5) = '20220101  '
  DATA.NUMBER(5) = 0020220101.00 
  DATA.INTEGER(5) = 20220101

  DATA.COLUMN1(6) = '2022-01-01'
  DATA.NUMBER(6) = -0000000001.00
  DATA.INTEGER(6) = -1

I have added a space after each row's results to make it easier to understand the results.

As you can see all of the rows that could not be converted contain the number -1.

In the short time I have played with TRY_CAST I can see how useful it is going to be for a lot of the work I do, with things like numbers and dates that are passed to my RPG programs as character strings. Good job IBM!

 

You can learn more about the TRY_CAST built in function from the IBM website here.

 

This article was written for IBM i 7.5 and 7.4 TR6.

1 comment:

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.