Wednesday, August 28, 2013

Validation: CHAIN versus SETLL

I recently received a message about the post No More Number Indicators. The person asked why I had used a SETLL operation code instead of a CHAIN to check for a record on a file in one of the examples.

  setll Z1DEPT DPTMAST ;
  if not(%equal) ;
    ErrDept = *on ;
  endif ;

I have to give credit to my wife for this. Years ago she attended an IBM conference in San Diego, California, and she went to a presentation by John Sears about improving your code's performance. This was one of his suggestions.

If you think about what the two operation codes do, it does make sense.

  chain Key FILE ;
  if not(%found) ;
     Error = *on ;
  endif ;

  setll Key FILE ;
  if not(%equal) ;
     Error = *on ;
  endif ;

The CHAIN retrieves a record from the file, and if successful, places the record into the input fields.

The SETLL positions the file pointer at the first record whose key is equal to or greater than the search key. The %EQUAL built-in function (BIF) is "on" when the key searched for matches one on the file. The input fields are not loaded.

As the SETLL does not load the input fields, John Sears explained, it is faster and better to use.
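
To make the comparison concrete, here is a minimal sketch of the SETLL validation as a complete, tiny program. It is an illustration under stated assumptions, not code from the original post: the free-form declarations assume a compiler level that supports them (to my knowledge IBM i 7.1 TR7 or later), DPTMAST is assumed to be a keyed file with a single 3-character key, and the length and initial value of Z1DEPT are made up for the example.

```rpgle
       // Assumption: DPTMAST is an externally described keyed file
       // whose only key field is a 3-character department code
       dcl-f DPTMAST disk keyed usage(*input) ;

       // Hypothetical value to validate; in the original example
       // Z1DEPT would come from a display file
       dcl-s Z1DEPT  char(3) inz('A01') ;
       dcl-s ErrDept ind ;

       // Position by key only. No record is read, so the file's
       // input fields keep whatever values they already had
       setll Z1DEPT DPTMAST ;

       // %equal is on only when an exact key match was found
       ErrDept = not %equal(DPTMAST) ;

       *inlr = *on ;
```

If the record's data turns out to be needed after all, the sketch would change to a CHAIN with the same key, followed by a test of %found(DPTMAST) instead of %equal(DPTMAST).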

The AS400 has become the IBM i, and the newer servers and versions of the operating system mean that programs and database functions run a lot faster. But the logic behind his statement is still valid.

After "digging deep" I did find a reference to this on the IBM website:

SETLL does not cause the system to access a data record. If you are only interested in verifying that a key actually exists, SETLL with an equal indicator (positions 58-59) is a better performing solution than the CHAIN operation in most cases. Under special cases of a multiple format logical file with sparse keys, CHAIN can be a faster solution than SETLL.

You can learn more about this from the IBM website:

 

This article was written for IBM i 7.1, and it should work with earlier releases too.


Sunday September 1: After all the discussion on this subject I decided to go ahead and perform my own test on which is faster, CHAIN or SETLL?

I have created a new post, CHAIN versus SETLL the results, to detail my findings.


59 comments:

  1. Very useful...thanks for sharing.

  2. Agreed -- if it is only between CHAIN and SETLL.

    However -- unless you are "maintaining" legacy code AND you have very strict standards for preserving RPG opcodes (double whammy!) -- I will take the SQL route (select count, if exists, etc.) on this one. It is a common belief that IBM stopped optimizing the RPG opcodes a long time ago (probably around the same time your wife attended the conference), but they continued working on the SQL engine. So from a performance standpoint, and because SQL is frankly more readable with /free code than "setll Z1DEPT DPTMAST", I highly recommend ditching SETLL/CHAIN in favor of SQL.

    That is my humble opinion, of course.

  3. Performance is usually (not always) less important than other considerations such as clarity. And if (if!) IBM's current wonks are to be trusted and obeyed to the nth degree, this should be accomplished with SQL in any case.
    -kh

  4. This is an interesting article. We (AS400 people) know that "speed" is, of course, relative. The best-written code always wins, but performance is still limited by input/output. The SETLL is interesting in that it does not bring data into the buffer, while the traditional CHAIN does.

  5. If I don't need the data, I always SETLL. In the early days (S/38) the performance difference was noticeable.

  6. Venkatesan Srinivasan, August 29, 2013 at 2:48 PM

    If the file was opened for Update or Add, then CHAIN needs the No-lock option coded. SETLL makes clear the intention of your need to just validate.

  7. If you don't need any data from the record, then SETLL is the way to go because it has no danger of changing any field values in your program. And it is a bit quicker, since it only involves the database index and doesn't require data transfer.

  8. If you are only checking for existence then SETLL will be faster and will not retrieve the record, but you're talking nanoseconds. As a general rule of thumb I use SETLL to check for existence if I'm not actually retrieving the data.

  9. Use the SETLL, it is worth it based on the speed aspect alone.

  10. I had always heard that SETLL was faster than CHAIN as well, but a while back we did a little testing and we didn't find a difference in our tests. Now, our tests were just informal - by no means under controlled conditions - no formal benchmarking procedures were followed and I'm by no means claiming these as actual benchmark results. Even so, our results led us to the belief that if there is a performance difference it must be pretty small - probably not worth losing any sleep over. We probably didn't have the file declared as update-capable in our tests - I'd think that locking might have caused more of a noticeable difference. Of course, we could have always done CHAIN(N) anyway.

    Logically, it seems there should be a difference. And I would still do SETLL, %Equal in those cases anyway - if only because it makes more sense to those following the logic afterwards and to avoid populating fields I don't intend to use in the program. It just makes sense to me, regardless of performance differences that may (or may not) exist.

    If anyone has seen evidence that SETLL actually performs faster than CHAIN I'd love to hear about it. I was very surprised at our results.

    Replies
    1. I cannot understand the pages and pages of debate. SETLL is faster...CHAIN puts the data in the input buffer....period. What is so difficult to understand??

  11. I agree with Steve and I have used SETLL for validating if a record is in the file for as long as I can remember.

  12. Also, the SETLL is important when creating a 3=Copy function:
    Bring the original record into the program.
    Load up the key fields, changing what you need to change.
    Before writing, use SETLL to ensure that no one has snuck in a record with these keys.
    If it is not found (using not %Equal after the SETLL) you can now safely write the record.

    If you were to use CHAIN instead of SETLL in the step above, the behavior of the CHAIN would replace the key values you just loaded.

  13. Alvaro Roberto Meoño Wong, August 29, 2013 at 2:51 PM

    Yo he utilizado mucho el setll porque es mas rapido para una cantidad de registros a realizar la busqueda, ya que se situa en el puntero directo del registro a buscar, mientras con el chain si es pequeño el archivo es poca la diferencia. Lo importante es mejor trabajar con el Setll. Yo lo he trabajado con el RPGII,III y en ambiente nativo

    Replies
    1. Translation (using Google Translate so it is not perfect)

      I have used SETLL a lot because it is faster when there are many records to search, since it positions the pointer directly at the record being sought, whereas with CHAIN there is little difference if the file is small. The important thing is that it is better to work with SETLL. I have worked with it in RPG II, RPG III and in the native environment.

  14. I like SETLL as record locks don't play into the picture and it's easy to start reading from that point if desired. When reading MASSIVE amounts of data performance can be a consideration, but with today's machines the difference between a CHAIN and a SETLL is so small that I don't know that I would use performance as the sole determination of which is better... Even so, SETLL should be faster, has fewer issues with record locks (yes, I know you can avoid them with CHAIN as well) and is a good launching pad for I/O if deemed necessary.

  15. I have always followed what Tommy suggests. I was told when I first started learning RPG that SETLL was more efficient when just checking for existence of a record.

  16. In my experience SETLL is slightly faster, but it must then be followed by a READ. CHAIN is a single instruction, so logically there are fewer instructions to write and the calculation is faster.

    Replies
    1. You don't have to follow a SETLL by a READ to validate the record exists!

  17. Yes, in the given situation, SETLL is more efficient for two reasons. The first is the speed with which the file pointer can get to the required record. The second is that SETLL merely sets the indicator on, while CHAIN will bring into buffer memory or program variables all the field info of the found record; at times such info may overwrite the values of variables when that is not intended. Hence SETLL saves memory and AVOIDS undesired overwrite errors.

  18. SETLL is inherently faster, as noted by others.

    Unfortunately, due to design changes sometimes you need to go ahead and retrieve the record to get some field values, and when /that/ change is made, you need to do volume testing in addition to the regular testing on the program to determine how much it will cost you in terms of runtime - been there...

  19. If the record is only accessed for existence, then a SETLL is less expensive than a CHAIN. It is always better to reset the cursor and test an indicator. This may not matter for small amounts of I/O, but for a job that needs to do this many times, the time savings can add up fast. Simon, I like using *in42 ;-)

  20. SETLL is most often far less expensive than CHAIN. I know of one exception. A number of years ago in my previous employment, we found out that doing a SETLL on a logical select/omit file could be very very costly, far slower than CHAIN. If I remember correctly, most of the rows were excluded by the select/omit.

  21. This is a no-brainer: SETLL is always the best method to simply validate that a record exists in a file. Programs should be designed to maximize functionality. They should also be designed to be used by foreign applications as well as local ones. I have spent over 25 years designing applications in RPG, RPG/400 and RPGLE. The SETLL method ensures that variable values in your program do not get modified during a simple record validation. Once you have validated the existence of a record, you can then move on to process the request as needed.

  22. Above is a great explanation covering everything to refresh some memory.

  23. SETLL is even more efficient when you are dealing with logical files and indexes. With these file types, a chain would need to position to the search argument in the logical file, retrieve the location of the record in the physical file, position to the record in the physical, then move the data record into the IO buffer. A SETLL would simply need to position to the search argument in the logical.

    Another bonus is that, when dealing with locally scoped files, the SETLL does not require an IO data structure.

  24. Absolutely CHAIN is much better.

    Replies
    1. It would be helpful, Manoj, if you would give your reasons why you consider CHAIN to be better than SETLL for record existence validation.

      For myself, I would use SETLL for the reasons given by others here. Especially in a complex program where use is made of data from other records in the same file. However, another approach when you are routinely validating a record exists in a 'standing data' table (for example, country codes) would be to have a utility program or service program which contains all the standing data tables. A call or procedure call would then be made from your program, giving the table ID, keys and method (get or check). The procedure or program then returns a code to indicate if the record exists (check method), and the record itself in a record format data structure (get method).

      Using this utility, you don't need to access lots of table files in each program you write. And particular files only get opened in the utility program when a request is made.

      We do this in my company, using a batch program to handle the tables and a data queue pair to communicate between the application program and the batch job. In addition, there is an enquiry window system to handle '?' requests in key fields on the call. But that is going beyond the scope of this discussion.

    2. Venkatesan Srinivasan, August 30, 2013 at 6:07 PM

      I think Manoj meant "you get more out of CHAIN, so it is better".

      I use CHAIN only when I know for sure there is only one record for the keys supplied and I need information retrieved for that row for further use.

  25. I believe the two operations were created for different purposes. Their functions may be similar, but SETLL is meant to position a pointer so that records can be read under some criterion, and CHAIN is meant to locate records randomly.

    I can use a knife to drive a screw, but a knife is for cutting.

    I don't believe that SETLL is any faster than CHAIN...

    Anyway, everyone writes programs according to their own style...


  26. Speed should really not be the gating factor here - clarity should be.

    I personally would always use CHAIN because I think it makes the intent of the code more obvious. A SETLL by definition is intended to position the cursor. The fact that it can detect an exact match is incidental to its primary use. When I see a SETLL I assume that the programmer is positioning for a READ sequence - when I see CHAIN I assume that the existence of a record is being checked. Which of the two scenarios best matches the OP's "validate" requirement? For me the choice is obvious.

    P.S. While in theory a SETLL should always be faster, I tested it some years ago on a regular database (not multi-format) and found that in several tests CHAIN was faster. Go figure.

    Replies
    1. I agree with Jon. In an I/O bound application, any difference in performance between CHAIN and SETLL is minor. Code for correctness and clarity first.

    2. In the case of validating, let's say a state code, why would I CHAIN? I don't need the description of the state. I just need to know that CA is a valid code.

      I use the SETLL to be like documentation in the code to say to the other people who encounter the code, that I do not need the values in the record. I am just checking if it is there on the file. While with a CHAIN I am making the conscious effort to retrieve the values from the file to use them.

    3. So if you have already made up your mind why did you ask the question?

      If you really want to make the intent unambiguously obvious then you should code a subprocedure to validate such items. That way the code reads:

      If ValidStateCode(....);

      And then you can use an array lookup, SETLL, CHAIN, SQL, a web service, etc. etc. for the actual retrieval and it doesn't matter. The fact that no data is being retrieved is also obvious. You can also change the method later with zero impact (for example, does it really make any sense at all to have a State table? It's been a long time since any were added.)

      With regard to "... need the values ..." I'm not really sure why when looking at the code they would care. Also your SETLL convention would be of no help if I was reading your code because I would assume you were starting a list. I don't think it would ever cross my mind that you were doing a validation but didn't need the data.

    4. Jon,

      So the code is now obvious with the use of the subprocedure. We have clarity, now it's back to speed.

      Scott

  27. This is ultimately one of those "it depends" questions. If there's a strong possibility I will need the data anyway, I'll use the CHAIN. If it's strictly for verification of the data (the user wants to ADD a record, so verify it doesn't exist first, that type of thing) I use SETLL. Why read in data if you're not going to use it? Why use SETLL if you're going to use the data anyway? In other words, the goal dictates which is more logical to use.

    Given the stated criteria - record validation... SETLL makes sense - why have the system retrieve data that is not going to be used? That's kind of like driving to the grocery store to see if there's milk, take it home and say "Yup, they have milk" when you can simply call the store and ask...

  28. I am a self-taught RPG programmer and I have learned more from this short discussion than I can say. I always "knew" the difference, but I never considered the ramifications of the choice... thank you all for enlightening me...

  29. For new development I prefer SQL SELECT rather than native I/O; the list of advantages of using SQE over CQE is long. But you'll find many angels on the head of a pin.

    Replies
    1. Hugh, what would you suggest as an SQL alternative? I'd probably go with SELECT COUNT(*) INTO :something FROM ... and then check whether "something" was > 0. That should preserve SETLL's advantage of not transferring data (under the assumption that only confirmation of a row's existence is required and not any data from it, which I think was the spirit of the original question).

    2. As an existence check, what is the point of replacing a one-line SETLL with a piece of embedded SQL which potentially could count a large number of records, just so that you can avoid using native IO?

      I'm not antagonistic towards embedded SQL - I use it myself. But as a replacement for a record existence check using SETLL? You would have to hate native IO with a vengeance to even consider using that. And as for the SQL solution being more efficient, I would like to see a benchmark test like Simon's CHAIN/SETLL test applied to this SETLL/SELECT COUNT(*) argument.

      We do what we can to make our code correct, efficient and maintainable. The premise of this discussion was whether SETLL was more efficient than CHAIN if all you wanted to do was to check the existence of a record. It is more efficient, especially when the record does not exist. So in essence the remit has been discharged.

      Any other discussions - for example, whether to use CHAIN or SETLL to position for later reads - are not relevant, but may still be interesting.

  30. I think SETLL is better, because when using CHAIN the other fields of the record format get changed.
    With SETLL nothing is changed. One more thing: we can also use SETGT in lieu of SETLL.

    Replies
    1. Then use DS I/O for the CHAIN. Think modern RPG - not the old monolithic approach.

  31. I use one or more functions in a SRVPGM that return whether a record exists (or other stuff too) for whatever file and for the proper key field, for example:
    Input parameters:
    Library name
    File name
    Key field (usually UID or GUID)
    Output parameter:
    -1 does not exist / 0 exists

    The SRVPGM uses SQL statement.

  32. To add my $.02 to that discussion: in the end it doesn't matter which one is better. What matters is which approach is the best for the application.
    SETLL, CHAIN, SELECT COUNT, service program, stored procedures... who cares?
    Sure, if you work with LAWSON/infor, you are stuck with procedure calls. If you believe SQL is the best thing since sliced bread, go with SELECT COUNT.
    As far as I am concerned, there is no "best practice", there is a "best approach" for that project.

  33. I can provide the source of a multi-format database file on request.
    SETLL gives better performance, but if you want the actual information on file, you will use CHAIN. Despite the better performance of SETLL, CHAIN does have the following advantage. Given that you will use a display file to show records that are not found, you can use the same indicator for the "Not Found" indicator of the CHAIN as for the RI attribute of the display field in the display file.

  34. You can see a CHAIN as a combination of SETLL and READE... I assume that when a record doesn't exist, CHAIN and SETLL would be equally fast. When records are found, CHAIN would be slower, as it retrieves the values of the found record.
    I prefer SETLL for testing existence, and in my experience it's much, much faster...
    Just try a million CHAINs to an existing record versus a million SETLLs to an existing record...

  35. There are very few occasions when I do not want something from the record. I use CHAIN as it is clear and makes the fields available if required. I like to use prefixes on files to prevent making a mess of existing field values. I would only consider using SETLL in cases where it is executed many times and would make a significant difference. I have no doubt that SETLL is faster, and would be interested to learn whether it reads the data at all, or simply determines whether it exists in the index.

  36. SETLL may be also used to initially determine that one record or more of a set is available, and then sometime later do a READE to start retrieving actual data, if that's what is required.

    For an interactive program, this technique may be used to get a /feedback panel back to the user ASAP while the program goes on to pull data and compose a substantive response and send a followup panel. This arrangement decreases apparent response time as perceived by users.

  37. There are very few occasions when I do not want something from the record. I use CHAIN as it is clear and makes the fields available if required. I like to use prefixes on files to prevent making a mess of existing field values. I would only consider using SETLL in cases where it is executed many times and would make a significant difference. I have no doubt that SETLL is faster, and would be interested to learn whether it reads the data at all, or simply determines whether it exists in the index.

  38. I used to always ask this question when interviewing programmers and always got some interesting answers. To simply check the existence of a record, a SETLL will set the file pointer at the appropriate record and will then allow the programmer to determine if the record exists. If you need to return some data then you will either use the CHAIN (which is effectively a SETLL and a READE combined) or simply use the READE after the SETLL.

  39. I suspect SETLL is likely to perform worse than CHAIN when you have a sparse select/omit file with DYNSLT specified. In that case, SETLL has to read forward in the file to find the next record so it can set %Found() properly, which could involve skipping a lot of omitted records.

  40. I have always used CHAIN. Perhaps not the best choice, but for smaller files it didn't really matter. Now if these were large files I might re-evaluate the decision.

  41. Rocky Marquiss and others that have hopped into this simple discussion on SETLL and CHAIN, to help inform Newbies of the performance trade offs and when to use which based on the merits of the situation, have been answered. In closing, any design where interactive use of data and locking protocol can be handled more elegantly with never-ending programs with exclusive locks on files in batch subsystems working with data queues, et cetera. It is not a matter of one language being better than another but which works best performance, maintenance, support, and growth-wise, ease of use with an ROI for all parties to win.

    We all know that ILE on the IBM Power Systems, or whatever platform it reinvents itself on, was object oriented before the industry made that a buzzword. It is our goal to provide service, education and ease so that our services are viewed as sound and agile. Too often we see language wars that go nowhere and discussions on performance that border on being crazy to our readers, but as Rocky pointed out, if you cannot maintain the code or read the documentation well enough to use, reuse or enhance it, it becomes obfuscated.

    I still have code working and programmers thanking me for documenting my work. Keep up the great work in your jobs, and make the people you meet think how neatly you gave them what they wanted without a whole lot of hassle. We as techies can admire the machinery and the processes, but our customers want the job done well, quickly, and at a reasonable price. Sadly, we don't do so great a job at showing that the total cost of ownership is better than the competition's, or we lapse into computerese and lose our audience. Take time to sharpen your saw and perhaps join Toastmasters to improve yourself as well.

  42. Is there any performance difference between 'SETLL followed by READ' and 'CHAIN'?

    BTW, I am a beginner in RPG, and I never knew that I could use SETLL followed by %EQUAL to check if a record exists. This really helps in avoiding bugs.

  43. This is an interesting discussion and the use of CHAIN, SETLL or even SQL stmt in an RPG program is largely dependent on the overall logic and process. Interactive, batch, repetitive/sequential logic, single/multiple records, record locks, response time, wait time, single/multiple use and shop programming standards are some of the considerations. Unless you can physically count single CPU cycles, I doubt if anyone sees the difference. I only object to someone dictating to me how to code based on personal preferences instead of a logical explanation. I once received a disciplinary write-up by my manager b/c I wrote code in RPGIV. His reasoning was because it looked 'foreign' to the other two senior programmers who favored RPGIII and resisted the change. Of course, the manager could not write code if his life depended on it.

  44. John Sears' Design for Performance lectures are timelessly invaluable for every IT professional regardless of your concentration. His background in operating systems, database design, programming languages and computer science history was evident in every lecture, discussion panel and informal conversation. His major emphasis was that software performance and security are most effective when addressed in the initial design instead of as an afterthought.

  45. If I am validating an Item and need its description I use CHAIN. But if all I need to know is if the item exists I always use SETLL and test the %Equal. It does not bring in any data to the buffers and is much more efficient than a CHAIN.

  46. Limiting any batch program which needs only to do 'record exists' tests to using SETLL really does speed it up for large datasets. For interactive programs or very small datasets, performance change between SETLL and CHAIN may be negligible.

    Actually fetching a record into program space when it's unnecessary to do so also enables data cross-contamination. Especially for 1st normal form data coming from an exterior source.

  47. I'm working on a huge system that handles 15+ million transactions per day and has billions of records in its tables, hence every file I/O added to the process counts. With this in mind:
    During a CHAIN operation, if the record is not found, will it still do the actual I/O on the file? Does this count as file I/O?

