Wednesday, May 30, 2018

Generating random numbers using SQL

generate random numbers using sql

In the past I have written about generating pseudo-random numbers using APIs, CEERAN0 and C's rand. But I always return to using SQL to generate the random numbers for me. I think it is because it is so simple to use.

In its simplest form if I want one pseudo-random number I can just use the following statement:

SELECT RAND() FROM SYSIBM.SYSDUMMY1

               RAND ( )
5.3215124973296302E-001

I get a floating point number returned.

One number is never enough to show anything provable. If I use a file with ten records in it, rather than SYSIBM.SYSDUMMY1, I can get ten pseudo-random numbers at a time:

SELECT RAND() FROM QTEMP.TESTFILE

RAND ( ) 8.8442640461439859E-001 1.1438337351603747E-001 8.3733634449293493E-001 1.8555253761406293E-001 7.1810052797021393E-002 7.7187414166692103E-001 5.3410443433942689E-001 6.6011535996581924E-002 5.0102237006744588E-001 6.3386944181646165E-001 RAND ( ) 7.5984984893337815E-001 5.4915005951109352E-001 3.2316049684133424E-001 1.8039490951261941E-001 1.6037476729636524E-001 9.2602313303018280E-001 2.3041474654377880E-001 8.6254463332010867E-001 6.4308603167821288E-001 8.5360271004364152E-001

I ran the statement twice just to prove that different numbers are generated every time.

In the above examples I am allowing the RAND() function to "self seed", rather than be seeded by a number I give. If I do "seed" the function I get the same results every time I run the statement.

SELECT RAND(100) FROM QTEMP.TESTFILE

RAND ( 100 ) 3.8642536698507646E-001 7.1388897366252635E-001 6.8852809228797263E-001 6.3228247932370985E-001 1.9269386883144626E-001 3.2746360667744986E-002 7.4614703817865535E-001 9.9832148197882020E-001 8.5183263649403362E-001 6.1983092745750298E-002 RAND ( 100 ) 3.8642536698507646E-001 7.1388897366252635E-001 6.8852809228797263E-001 6.3228247932370985E-001 1.9269386883144626E-001 3.2746360667744986E-002 7.4614703817865535E-001 9.9832148197882020E-001 8.5183263649403362E-001 6.1983092745750298E-002

Even the "self seeding" is seeded behind the scenes by the Db2 for i engine when the RAND() function is used. I am sure I was told it uses the current timestamp or something similar. The beauty of this is I do not have to care what Db2 for i will use for the seeding value.

The result is a floating point number, but if I want an integer number I can easily generate one using the CAST function.

SELECT CAST((CAST(RAND() AS DECIMAL(6,5))*1000) AS DECIMAL(4,0)) 
FROM QTEMP.TESTFILE

CAST function
      565
        4
      743
        2
      783
      843
      308
      591
      758
      482

While the above Select statement may look complicated in reality it is not. Let me break the statement into parts to show what it is doing.

CAST(RAND() AS DECIMAL(6,5))

In the inner most part I convert the floating point to a decimal value of 6,5 (9.99999).

(CAST(RAND() AS DECIMAL(6,5))*1000)

I want to generate a pseudo-random number that is between zero and one thousand, therefore I multiply the decimal value by 1,000.

CAST((CAST(RAND() AS DECIMAL(6,5))*1000) AS DECIMAL(4,0))

Having generated a number between zero and one thousand I need to present it as a decimal 4,0 value. And that gives the results shown.

So how could I use this in a program. The RPG example shows a simple program that inserts pseudo-random numbers into a file/table.

01  **free
02  ctl-opt option(*nodebugio:*srcstmt) ;
03  dcl-s Counter packed(2) ;

04  for Counter = 1 to 10 ;
05    exec sql INSERT INTO TESTFILE
         VALUES(CAST((CAST(RAND() AS DECIMAL(6,5))*1000) 
                AS DECIMAL(4,0)));
06  endfor ;

07  *inlr = *on ;

Line 1: This is written in totally free RPG, why would I use anything else?

Line 2: My favorite control options, which make debugging so much easier.

Line 3: I have defined this variable as a counter to be used in the program.

Line 4: I have defined this For group to execute ten times, using the counter variable I defined on line 3.

Line 5: This is SQL Insert statement uses the same logic I discussed above to generate a four long decimal pseudo-random number, which is inserted into a file/table.

After running the program if I look in the file/table I can see it contains ten pseudo-random numbers:

RANDOM_NBR
     885
     348
     571
     364
     698
     879
     847
      28
     997
     966

What if I only wanted to generate one pseudo-random number at a time. I could use a RPG program like this.

01  **free
02  ctl-opt option(*nodebugio:*srcstmt) ;
03  dcl-s Random1 float(8) ;
04  dcl-s Random2 packed(4) ;

05  exec sql SET :Random1 = RAND() ;
06  dsply Random1 ;

07  exec sql SET :Random2 =
            CAST((CAST(RAND() AS DECIMAL(6,5))*1000) 
                  AS DECIMAL(4,0)) ;
08  dsply Random2 ;

09  *inlr = *on ;

Line 3: This is how to define a floating point numeric variable.

Line 4: And this is a packed numeric variable.

Line 5: In this SQL statement I am retrieving the value generated by the RAND() function into the floating point numeric variable.

Line 6: I am using the Display operation code, DSPLY, to display the contents of the variable.

Line 7: In this SQL statement I am going to get a result that is decimal 4,0, and it will be placed in the packed variable.

Line 8: I am going to display the content of this variable too.

I did not use a For group, or Do loop, in this program as I wanted to show that the pseudo-random numbers generated each time the program was called do not repeat. In this scenario I called the program three times.

DSPLY  +6.192815942869351E-001
DSPLY   580                   
DSPLY  +7.357707449568163E-001
DSPLY   485                   
DSPLY  +4.492324594866787E-001
DSPLY   888

When do I use this function? At audit time. The auditors ask me for a sample, let's say a hundred, of the orders created last year. I can give them a list of a hundred orders picked randomly using the following CL code.

01  PGM

02  RUNSQL SQL('CREATE VIEW QTEMP.RANDOM1 +
               (RANDOM_NBR,ORDNBR,CRTDTE) +
               AS +
               SELECT RAND(),ORDNBR,CRTDTE +
                 FROM PRODLIB.ORDHDR +
                WHERE CRTDTE >= 1170401 +
                  AND CRTDTE < 1180401') +
            COMMIT(*NC)

03  RUNSQL SQL('CREATE TABLE QTEMP.RANDOM2 AS +
               (SELECT ORDNBR,CRTDTE +
                 FROM QTEMP.RANDOM1 +
                 ORDER BY RANDOM_NBR +
                 FETCH FIRST 100 ROWS ONLY) +
               WITH DATA') +
            COMMIT(*NC)

04  ENDPGM

Line 2: Here I am creating a View over the Order Header file. The View contains just three columns:

  1. Pseudo-random number
  2. Order number
  3. Order creation date

The Select statement is straight forward. The RAND() function is used for one of the three columns, which corresponds to the column RANDOM_NBR. I select a range of dates for the period the auditors are interested in, yes, this ERP stores its dates as CYMD format.

Line 3: I can then generate a Table from the View. By sorting the View by the pseudo-random number I can select the first hundred rows. All the auditors are interested in is the order number and date, so that is all I need to have in the Table. I can now download or email the Table, and pass it on to the auditors.

This View and Table are only temporary. As they are in QTEMP when the job ends, when I signoff, they are deleted. No need to worry about another access path over the Order Header file.

Previously the person who provided this information would upload the order number and date to a Microsoft Excel spreadsheet, then use its random function to select the desired number of orders. As you can see my method is a lot quicker and a lot less painful for me.

 

You can learn more about the RAND() SQL function from the IBM website here.

 

This article was written for IBM i 7.3, and should work for earlier releases too.

7 comments:

  1. What if you used the unseeded RAND() function to generate a temp file with random numbers between 1 and
    2,147,483,646.

    Then used those random numbers as seeds to the RAND(numberFromFisrtFile) to build your output file of random numbers?

    ReplyDelete
  2. Simon, wondering if you or anyone else has any data on performance of these random number solutions? We have a few processes that rely on random numbers on a rather large scale. Should SQL be avoided in that situation in favor of CEERAN0?

    ReplyDelete
    Replies
    1. I guess it would depend how you do it. If you have to produce a large number of random numbers you could do something like load then into an array before you need them. If you did that I would assume RAND() would be more efficient.

      Delete
  3. I think you could shorten the formula from

    CAST((CAST(RAND() AS DECIMAL(6,5))*1000) AS DECIMAL(4,0))
    to
    CAST(RAND() *1000 AS DECIMAL(4,0))

    Also, if you wanted to do something like simulate the roll of a die you would need to get a random number between 1 and 7. To do this you could use the formula...

    CAST(RAND() * 7 + 1 AS DECIMAL(1,0))

    ReplyDelete
  4. You can shorten it further to: INT(RAND()*1000)

    ReplyDelete
  5. Good article, but I have one question. When I execute Rand() from the green screen I get a number > 1. When I execute Rand() from a Gui screen (Access Client Solution sql window), I get an number less that 1. Any ideas?

    ReplyDelete
    Replies
    1. The first thing that springs to mind is to use an Absolute function to ensure that only a positive number is returned.

      SELECT ABS(RAND()) FROM SYSIBM.SYSDUMMY1

      Delete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.