Sunday, September 1, 2013

CHAIN versus SETLL: the results

Having started the debate on which is faster in my post Validation: CHAIN versus SETLL, I decided to put the theory to the test.

I created a DDS file with a single key:

  A                                      UNIQUE
  A          R TESTFILER
  A            KEY            7P 0
  A            F1             3
  A            F2             5P 0
  A            F3            30
  A            F4             3P 2
  A            F5            50
  A          K KEY

I then filled the file with a million records, with the field KEY containing the values 1 to 1 million.
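
The load program is not shown here. As a rough sketch of how such a file could be filled (not the program actually used), the loop below writes keys 1 through 1,000,000. It assumes a compiler level that accepts fully free-form source (**free and dcl-f), and it leaves F1 to F5 at their default initial values.

```rpgle
**free
// Sketch only: fill TESTFILE with one record per key, 1 to 1,000,000.
// Assumes TESTFILE was created from the DDS above and is empty.
dcl-f TESTFILE usage(*output);

dcl-s i uns(10);

for i = 1 to 1000000;
  KEY = i;              // F1 to F5 keep their initial blank/zero values
  write TESTFILER;
endfor;

*inlr = *on;
```

Because the file is UNIQUE, running this twice without clearing the file first would fail on the first duplicate key.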

I created two almost identical programs. Each one performs its operation 1 million times, then writes a record to an output file with the start time, end time, and the number of microseconds it took to perform the 1 million operations (in RPG the *ms duration code means microseconds). Those programs are listed below:

CHAIN program
01 FTESTFILE  IF   E           K DISK
02 FTESTPF    O    E             DISK

03 D KeyFld          S                   like(KEY)
04 D TmeStmp         S               Z
05 D i               S             10U 0
    /free
06     PGM = 'CHAIN' ;
07     START = %timestamp() ;

08     for i = 1 by 1 to 1000000 ;
09        TmeStmp = %timestamp() ;
10        KeyFld = %subdt(TmeStmp:*ms) ;
11        chain KeyFld TESTFILER ;
12     endfor ;

13     FINISH = %timestamp() ;
14     DIFFERENCE = %diff(FINISH:START:*ms) ;
15     write TESTPFR ;
16     *inlr = *on ;

SETLL program
01 FTESTFILE  IF   E           K DISK
02 FTESTPF    O    E             DISK

03 D KeyFld          S                   like(KEY)
04 D TmeStmp         S               Z
05 D i               S             10U 0
    /free
06     PGM = 'SETLL' ;
07     START = %timestamp() ;

08     for i = 1 by 1 to 1000000 ;
09        TmeStmp = %timestamp() ;
10        KeyFld = %subdt(TmeStmp:*ms) ;
11        setll KeyFld TESTFILER ;
12     endfor ;

13     FINISH = %timestamp() ;
14     DIFFERENCE = %diff(FINISH:START:*ms) ;
15     write TESTPFR ;
16     *inlr = *on ;

The only differences are:

  1. Line 6: Name written to the field in the output file.
  2. Line 11: The operation that is performed.

A CL program was written to call each of these programs five times.
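
The CL itself is not listed. As a stand-in, the same kind of driver can be sketched in RPG rather than CL. The program names TESTCHAIN and TESTSETLL are hypothetical, and the alternating call order is only inferred from the order of the results further down. It assumes a compiler level that accepts fully free-form source.

```rpgle
**free
// Sketch only: call each test program five times, alternating.
// TESTCHAIN and TESTSETLL are hypothetical names for the two programs.
dcl-pr ChainTest extpgm('TESTCHAIN');
end-pr;
dcl-pr SetllTest extpgm('TESTSETLL');
end-pr;

dcl-s run uns(5);

for run = 1 to 5;
  ChainTest();          // writes one CHAIN timing record to TESTPF
  SetllTest();          // writes one SETLL timing record to TESTPF
endfor;

*inlr = *on;
```

Both test programs set on *INLR before returning, so each call starts from a fresh state and reopens its files.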

The CL was submitted to the QINTER job queue on an IBM i 8406 70Y on a Sunday afternoon. As it was a holiday weekend, I knew that no one else would be running jobs on this server. The results were:

PGM            DIFFERENCE (microseconds)
CHAIN          34,178,000
SETLL          32,557,000
CHAIN          34,057,000
SETLL          32,547,000
CHAIN          34,271,000
SETLL          32,556,000
CHAIN          34,356,000
SETLL          32,785,000
CHAIN          34,467,000
SETLL          32,608,000
Average CHAIN  34,265,800
Average SETLL  32,610,600

This shows that, in these programs where the operation was performed 1 million times, SETLL is on average 1.66 seconds faster than CHAIN.
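
To put that in per-operation terms, divide the gap between the two averages (roughly 1.7 seconds) by the number of operations in a run:

$$
\Delta t_{\text{op}} = \frac{\Delta T}{N} \approx \frac{1.7\ \text{s}}{10^{6}} = 1.7 \times 10^{-6}\ \text{s} \approx 1.7\ \mu\text{s}
$$

Here $\Delta T$ is the difference between the average run times and $N$ is the number of operations per run. The figure is rounded, and it includes the loop and timestamp overhead that both programs share.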

For one single operation the difference is negligible. But we all need to make our own decisions on how much weight to give these kinds of performance differences, when to use one operation code rather than another, and what the choice conveys to others, or to ourselves at a later date, when looking at the code.
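
For comparison, here is roughly how the existence check reads with each operation code. This is a sketch, not code from the test programs: the Exists flag and the key value are made up for illustration, and it assumes a compiler level that accepts fully free-form source.

```rpgle
**free
// Sketch only: two ways to test whether a key exists in TESTFILE.
dcl-f TESTFILE keyed;           // input, keyed access

dcl-s KeyFld like(KEY);
dcl-s Exists ind;               // hypothetical result flag

KeyFld = 12345;                 // hypothetical key to validate

// SETLL only positions the file. %equal is on when a record
// with exactly this key exists; no record is read.
setll KeyFld TESTFILER;
Exists = %equal(TESTFILE);

// CHAIN retrieves the record. %found is on when it was read,
// and the fields of TESTFILER are then populated.
chain KeyFld TESTFILER;
Exists = %found(TESTFILE);

*inlr = *on;
```

With SETLL the code says "does this key exist?"; with CHAIN it says "get me this record", which is more than a validation needs.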


Monday September 2: John Blenkinsop made an interesting comment, asking what would happen if there were unsuccessful SETLL and CHAIN operations.

I rebuilt the input file with the key field, KEY, containing the values of 1 to 1 million, but this time I incremented by 2, resulting in a file of 500,000 records.

I ran the same programs that I had before and these were the results:

PGM            DIFFERENCE (microseconds)
CHAIN          61,133,000
SETLL          55,052,000
CHAIN          59,066,000
SETLL          47,901,000
CHAIN          58,410,000
SETLL          47,259,000
CHAIN          56,340,000
SETLL          48,568,000
CHAIN          57,335,000
SETLL          48,109,000
Average CHAIN  58,456,800
Average SETLL  49,377,800

The difference is even more significant, 9.08 seconds, in favor of SETLL.

12 comments:

  1. I don't think it matters much that there is only a 1.68 sec difference over a million executions. What matters is when you couple this with the 10 to 20 other times in that program you do the more expensive operation, along with other CPU wasters. Then you are talking about some real time, especially if you couple that with another 20 or 30 programs that have the same type of philosophy.

    To me it all boils down to whether you are willing to do things the most efficient way (without using something obfuscated and hard to maintain) or just not care and add a bigger, faster CPU when you need it.

    Sure one operation run even a million times in a one time program is no problem, but 50 to 100 of them in as many programs run every night in a nightly cycle could make the difference as to whether you have time to do your backups or get the system back to the users in a timely manner, and in the 24/7 world that can be important.

  2. (John Blenkinsop, AS400 Specialists @ LinkedIn)

    In your test program, you created a test file with 1,000,000 records keyed from 1 to 1,000,000.

    You CHAIN or SETLL with a key value derived from the timestamp - that is, an integer number of milliseconds. Every CHAIN or SETLL will get a hit.

    But what about the difference in performance when there is NO matching record?

    I created the test data key from 1 to 1,000,000 in an increment of 2, giving 500,000 records in the test file.

    I then used amended versions of your programs which set the key from 1 to 1,000,000 in an increment of 1. Therefore half of the CHAINS and SETLLs would fail.

    The resulting difference (admittedly in only one run of each program) was 6 seconds:

    *...+....1....+....2....+....3....+....4....+....5....+....6....+....7
    CHAIN 2013-09-02-11.15.40.4500002013-09-02-11.16.25.139000 ãi
    CCCCD44444FFFF6FF6FF6FF4FF4FF4FFFFFFFFFF6FF6FF6FF4FF4FF4FFFFFF00004800
    38195000002013009002011B15B40B4500002013009002011B16B25B1390000004690F

    SETLL 2013-09-02-11.16.31.6790002013-09-02-11.17.10.219000 e
    ECEDD44444FFFF6FF6FF6FF4FF4FF4FFFFFFFFFF6FF6FF6FF4FF4FF4FFFFFF00008400
    25333000002013009002011B16B31B6790002013009002011B17B10B2190000003500F

    Perhaps some more tests should be run to get an average, since other system operations of course have an effect, but it does point to the greater efficiency of SETLL, for verifying existence, against CHAIN for the same purpose.

  3. Well, maybe SETLL is "faster" than CHAIN, but the difference is very, very small... for a single record it would be around 1.7 * 10^-6 or 0.0000017 seconds! Insignificant!

  4. There is a flaw in the testing methodology.

    RPG timestamps are only populated to thousandths of a second. So the test could only ever generate 1 in every 1,000 keys. As a result you will get the exact same value over and over again. Even if it were accurate to the microsecond, the speed of the machine would still render the same values multiple times. Milliseconds are just too long for modern hardware.

    To do it properly would require a pseudo random number e.g. CEERAN0.

    Regardless - I don't think there was ever any real doubt that in most cases SETLL would outperform CHAIN. (Although if NOUNREF were specified then the difference would probably be smaller.)

    The original question concerned validation. And in my opinion a subprocedure (containing whatever method you prefer) is the only approach in modern programs and the only one where the intent is unambiguous.

  5. SETLL is definitely more efficient - the data is loaded into memory. The CHAIN instruction will also lock a record depending on coding, so be careful you are testing like for like. If you only want to know the existence of a record use SETLL; if you actually want the data use CHAIN, and when using CHAIN be careful if the file is update capable because it will lock the record unless you tell it not to. Timings will also vary depending on what else the system is doing, so in a simulation with little else contending for cache the timings may suggest a closer difference between SETLL and CHAIN. On a busier system you might well get a much greater difference. These performance tricks, or simply awareness of performance from a programming perspective, are very important as systems scale.

  6. Thanks, This is great!

  7. I rarely build physical files with keys in them, having worked with ERP applications for a while; logical files built on PF accessed via CHAIN or SETLL will be another practical scenario where the performance results could show significant time difference in favor of SETLL.

  8. I coach my programmers to code based upon the desired and planned result. So every line of code should be understood and execute what is needed, not more, not less. That means that SETLL is the code unless you are purposely retrieving data. Using CHAIN because it doesn't hurt in a particular circumstance sets a bad precedent.

    1. Leslie - I totally agree with you. When I see code from others haphazardly using the CHAIN opcode - for all the wrong reasons - when SETLL would have been far simpler, cleaner, and better performing, it just makes me clench my teeth.

      "Setting a bad precedent" is an understatement considering the code I come across from others before my time. It's been a primary reason why some legacy apps here dump due to record-locks.

  9. Strictly speaking, the test is not entirely correct. The fact is that SetLL does not read the record, it only sets a pointer to the record with the required key value. While Chain finds a record by key and reads its contents.
    This is noticeable (as mentioned above) in cases where you need to check the existence of a record with a given key value. Especially where the data is stored in PF, and the key is stored separately in the LF file. Using SetLL + %Equal will be faster than Chain due to the fact that in the first case there will be no access to PF, only to LF.
    If you compare, then you need to compare Chain vs SetLL + Read - only in this case the final result (read contents of the record) will be identical.

    1. The test is correct. All that is being tested is that a record with the matching key is in the file.

