Wednesday, November 7, 2018

Using new SQL built in function to convert character to number

using sql built in funtion to_number decfloat_format to convert alphanumeric number to decimal number

Included in the technical refreshes for IBM i 7.3 TR5 and 7.2 TR9 was a new Db2 for i built in function to convert character strings into numbers.

Why is this needed when I can just CAST the one type of a column to another?

01  SELECT COLUMN1,
02         CAST(COLUMN1 AS DECIMAL(10,2))
03    FROM QTEMP.TESTFILE

Numbers are problematic as I can make character representations of them in many different ways. I created a few examples of character representations of numbers, and then used the above SQL statement to display them.

COLUMN1         CAST function
12345               12,345.00
12345.67            12,345.67
12345.678           12,345.67
12,345.67       ++++++++++++++
-1234.56             1,234.56-
1234-           ++++++++++++++
+1234                1,234.00
$12,345.67      ++++++++++++++
0000001.0000             1.00

The lines of plus symbols ( + ) denote values that could not be translated by the CAST. All my results have two decimal places even when the original value was an integer.

The new built in function comes with two names, DECFLOAT_FORMAT and TO_NUMBER, and they both do the same things. If I add them to the previous SQL statement:

01  SELECT COLUMN1,
02         CAST(COLUMN1 AS DECIMAL(10,2))
03         DECFLOAT_FORMAT(COLUMN1),
04         TO_NUMBER(COLUMN1)
05    FROM QTEMP.TESTFILE

I get the following results:

COLUMN1         CAST function   DECFLOAT_FORMAT   TO_NUMBER
12345               12,345.00             12345       12345
12345.67            12,345.67          12345.67    12345.67
12345.678           12,345.67         12345.678   12345.678
12,345.67       ++++++++++++++  +++++++++++++++   +++++++++
-1234.56             1,234.56-         -1234.56    -1234.56
1234-           ++++++++++++++  +++++++++++++++   +++++++++
+1234                1,234.00              1234        1234
$12,345.67      ++++++++++++++  +++++++++++++++   +++++++++
0000001.0000             1.00            1.0000      1.0000

The new built in functions cannot translate same character values that the CAST could not, but that the numbers returned are more accurate representations of the original in the character column.

Before I go any further I am going to explain why I use TO_NUMBER, rather than DECFLOAT_FORMAT. In my opinion the TO_NUMBER is more descriptive of what this built in function does, converting a character string to a number. The name DECFLOAT_FORMAT is not as clear. Therefore, for the rest of this post I will be using TO_NUMBER. Just remember that as they are the same what works/does not work with TO_NUMBER is the same with DECFLOAT_FORMAT.

These built in functions have two parameters:

  1. The column or string to convert.
  2. The format string, the format of the character representation of the number.

In the above example I did not use a format string, therefore, the built in function's default was used. I am not going to list all of them here, if you are interested click on the link at the bottom of this post.

In the below example I am saying that all of the character representations of the numbers are thousand separated and have two decimal places. I can use either 0 (zeroes) or 9 to represent the numeric characters in the string. Personally I prefer using 9.

01  SELECT COLUMN1,
02         TO_NUMBER(COLUMN1,'999,999,999.99')
03    FROM QTEMP.TESTFILE

As you have seen above not all of the "numbers" in COLUMN1 are in that format. Thus, I receive the following error:

Query cannot be run.  See lower level messages.

When I go to the job log and look at the lower level messages I see:

Program or service program QSQSETDIT in library QSYS ended. Reason
  code 5.
User-defined function error on member TESTFILE.
Cancel reply received for message CPF503E.

From experience I have found that if not all the "numbers" are in the given format then reason code 5 is returned by the first message. I am not going to show what reason code 5 means, as in my opinion it is confusing. Just take my word that if you see reason code 5 you have at least one "number" not in the expected format.

For the first "number" that could not be translated contains a thousand separator, 12,345.67, therefore, I would use the following format string:

01  SELECT TO_NUMBER('12,345.67','999,999,999.99')
02    FROM SYSIBM.SYSDUMMY1

TO_NUMBER
 12345.67

The next number that I had a problem with had a minus sign at the end. Even though this number does not have thousand separators I have found the format string can. The MI indicates that the minus sign is at the end of the number:

01  SELECT TO_NUMBER('1234-','999,999,999MI')
02    FROM SYSIBM.SYSDUMMY1

TO_NUMBER
    -1234

For some reason a S is used to indicate that there is a leading minus sign, an example will be given later.

The last "number" had a currency symbol, $, and thousand separators:

01  SELECT TO_NUMBER('$12,345.67','$999,999,999.99')
02    FROM SYSIBM.SYSDUMMY1

TO_NUMBER
 12345.67

I can put commas ( , ) anywhere in the "number" string and providing the format string matches I can get a valid number, for example:

01  SELECT TO_NUMBER('123,45,6789,0','999,99,9999,9')
02    FROM SYSIBM.SYSDUMMY1

 TO_NUMBER
1234567890

In reality we do not get columns of data in different numeric formats, all of the values adhere to one number format. I receive a file twice a week that contains two columns of "numbers":

  1. Quantity
  2. Amount in US dollars

Using these built in functions I can convert the character "number" to a real number:

01  SELECT QTYCHAR,
02         TO_NUMBER(QTYCHAR,'S999,999,999.99'),
03         AMTCHAR,
04         TO_NUMBER(AMTCHAR,'S$999,999.99')
05    FROM QTEMP.TESTFILE

QTYCHAR     TO_NUMBER   AMTCHAR       TO_NUMBER
12,345.00    12345.00   $12,345.00     12345.00
12,345.67    12345.67   $12,345.67     12345.67
-1,234.56    -1234.56   -$1,234.56     -1234.56
123.00         123.00   $123.00          123.00
0.00             0.00   $0.00              0.00

The real proof is if I can take those generated numbers and insert them into decimal columns in another table. My output table just has two columns, both of them decimal type.

01  CREATE TABLE QTEMP.OUTFILE (
02    AMOUNT DECIMAL(10,2),
03    QUANTITY DECIMAL(10,2)
04  ) ;

The Insert statement inserts the converted amount and quantity columns in all of the row from TESTFILE into the OUTFILE. The amount column could have a leading minus sign, a currency symbol, and has thousand separators, The quantity column can have a leading minus sign, and has thousand separators.

01  INSERT INTO QTEMP.OUTFILE
02  (SELECT TO_NUMBER(AMTCHAR,'S$999,999,999.99'),
03          TO_NUMBER(QTYCHAR,'S999,999,999.99')
04     FROM QTEMP.TESTFILE)

The results are expected, the character values have been converted to numbers and inserted.

   AMOUNT        QUANTITY
12,345.00       12,345.00
12,345.67       12,345.67
 1,234.56-       1,234.56-
   123.00          123.00
      .00             .00

 

You can learn more about the TO_NUMBER and DECFLOAT_FORMAT SQL built in function from the IBM website here.

 

This article was written for IBM i 7.3 TR5 and 7.2 TR9.

5 comments:

  1. Alternative to Built-In functions in native operation. Excellent

    ReplyDelete
  2. This is fantastic information, always seems to be a translation issue with numbers and characters....not to mention character sets.

    ReplyDelete
  3. Thank you for including comparisons and explaining the differences.

    ReplyDelete
  4. Always informative and knowledgeable 🙌 Thanks again

    ReplyDelete
  5. Thanks for the useful info Simon. These functions are always helpful.

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.