Tuesday, September 3, 2013

Equivalent of SETLL using SQL

I received several messages and comments in response to my post Validation: CHAIN versus SETLL informing me that I can do the same in SQL.

Alas, no-one sent me any examples of how they would do it. Therefore, after some Googling I found a way.

In this example I will be using I will perform two "SETLLs" to a file, ORDFILE, looking for a match to the Order Number field, ORDNBR.

In the first scenario I used a valid Order Number and in the second an invalid number.

01 D OrdNo1          S              7    inz('MSFH540')
02 D OrdNo2          S                   like(OrdNo1) inz('???????')
03 D i               S              5I 0
04  /free
05     exec sql select 1 into :i
06        from ordfile where ordnbr = :OrdNo1 ;

07     exec sql select 1 into :i
08         from ordfile where ordnbr = :OrdNo2 ;

Lines 1 and 2 define the work fields I will be using for the test order numbers.

On line 3 I have defined an integar field, i, that will be used to flag whether the "SETLL" was successful or not.

Lines 5 and 6 are the SQL statement that performs the look up matching the value in the field OrdNo1 to the field ORDNBR, which is successful and 1 is moved to i.

Lines 7 and 8 perform an unsuccessful match. As there is not a record in the ORDFILE with a key that matches, and no value is moved to i. If I check if i is equal to 1 to test if the match was successful then I would get a false positive, as i remains unchanged and equal to 1.

To overcome the possibility of false positives I would insert two new lines to move zero to i before performing the SQL statement, see below:

01 D OrdNo1          S              7    inz('MSFH540')
02 D OrdNo2          S                   like(OrdNo1) inz('???????')
03 D i               S              5I 0
04  /free
05     i = 0 ;
06     exec sql select 1 into :i
07        from ordfile where ordnbr = :OrdNo1 ;

08     i = 0 ;
09     exec sql select 1 into :i
10         from ordfile where ordnbr = :OrdNo2 ;

Now when the unsuccessful "SETLL" is performed i is equal to zero.

Do you know a better way? If so add a Comment, below, describing how you would do it.

 

This article was written for IBM i 7.1, and it should work with earlier releases too.

38 comments:

  1. Thanks for this post

    ReplyDelete
  2. (John Blenkinsop, via LinkedIn AS400 Specialists Group)

    I'm running a test of the CHAIN, SETLL and SQL methods at the moment on our 'test' machine. All methods are very similar to your blog examples. Test makes one million iterations over 500,000 records so that half are hits.

    Preliminary results of the first set (of ten) show :

    CHAIN 42,280,000
    SETLL 39,069,000
    SELECT 264,819,000

    Values are milliseconds.

    The SQL method is far slower, the program is 48KB bigger, and in my opinion the code is much more obscure. I'm all for SQL in its place, but its place is not record existence validation in RPG programs.

    ReplyDelete
  3. We can also use
    select count(*) into :i from where

    and after this statement we can check the value of i.

    if i > 0, that means record exist in file for the mentioned condition in where clause

    ReplyDelete
    Replies
    1. If you’re using embedded sql you can check SQLSTATE after executing the query. If a record is found SQLSTATE is set to ‘00000’. However you may get an error if the query returns more than one record, therefore I’d recommend something like the following…
      select count(*) into :i
      from sysibm.sysdummy1
      where exists(select ordnbr from ordfile where ordnbr = :OrdNo2);

      This will set i to 0 if record is not found or to 1 if the record exists. I wouldn’t recommend just selecting count(*) as this may be an issue for large files.

      Delete
    2. Be careful with checking SQLSTATE = '00000' or SQLCODE = *Zeros. A warning may be returned (SQLCODE > *Zeros and <> 1000 / SQLSTATE Position 1-2 = '01'), i.e. at least one row is available.

      For ensuring only a single row is returned I'd NOT count all records!
      The following statement is faster:
      Clear IsFound;
      Exec SQL Select 1 into :IsFound
      From YourFile
      Where ...
      Fetch First Row Only;
      If a row is found IsFound is Set To 1.

      Delete
  4. I have often used "SELECT COUNT(*) INTO :MYVAR WHERE...", and then tested whether MYVAR was greater than zero. The method you described is probably more efficient.

    I wonder if the precompiler would tolerate "SELECT '1' INTO :MYVAR WHERE...", if MYVAR were an indicator variable. That might help the subsequent logic read a little better.

    ReplyDelete
    Replies
    1. Yes you can. It would be "Select Coalesce('1','0') into :MYVAR From file Where ... fetch first row only;". The coalesce returns '1' if found and '0' if returns null "Not Found".

      Delete
  5. I've used select count(*) in the past but I think you can do something with a "where exists" clause that would ensure the query returns after finding first occurrence of the record. The main advantage to thinking in terms of SQL is not necessarily the bench mark speed of every query. It’s more about all the flexibility of the DDL and DML statements and all the enhancements IBM continues to make to the SQL query engine. I certainly don’t hate native I/O, it’s probably still better for some applications. Also, I wouldn’t suggest replacing or rewriting things that work. If it’s not broken, then don’t fix it.

    ReplyDelete
  6. (John Blenkinsop, via LinkedIn AS400 Specialists Group)

    I've tested Hugo Cantor's method (COUNT) against Simon's select into method (SELECT). Hugo's method is below:

    select count(*) into :i
    from sysibm.sysdummy1
    where exists(select key from testfile where key = :keyfld);

    The times for the tests (I only did two of each, interleaved in a batch job) are:

    SELECT 263,640,000
    COUNT 437,418,000
    SELECT 263,214,000
    COUNT 440,169,000

    Note that the COUNT method is almost twice as slow. I noted that SQL methods opened more than one SQL view of the data file. In the COUNT method, the sysdummy1 file was, of course, also opened, and had 2 million IOs for 1 million from one open of the test file. This would contribute to the increased run time.

    Despite Hugh Brady's optimism regarding the flexibility of DDL and DDM, and the continuing enhancement of the engine, you have to at some point look at what is the best tool to use for the job in hand, and to use embedded SQL just to check a record's existence is slow, unintuitive and vastly inefficient.

    ReplyDelete
    Replies
    1. Do you have an index or EVI over the key column in the testfile table? It almost sounds like the sql is doing a full table scan.

      What OS version are you running? We noticed that upgrading from V5R4 to 7.1 gave us a huge performance boost for SQL statements.

      Also as mentioned below your sql should look something like this. But I would change the '1' to a 1 since I like working with numbers more than working with strings.

      select 1 into :i
      from sysibm.sysdummy1
      where exists(select key from testfile where key = :keyfld);

      Delete
    2. John,

      Why would you use COUNT(*) when you are already using EXISTS? This is overkill, and inefficient. Make sure you have an index that includes only your key field but nothing else. Try this:


      VALUES CASE WHEN EXISTS(SELECT key FROM testfile WHERE key = :keyfld)
      THEN 'Y' ELSE 'N' END
      INTO :flg

      Delete
  7. Any time you are using count(*) you must process the entire file/table to get the answer. I prefer the following for sql:

    select '1' into :NamedRecordFoundIndicator
    from sysibm.sysdummy1
    where exists(select key from testfile where key = :keyfld);

    ReplyDelete
  8. When I was first learning SQL, I was told that using "select count(1) ....." instead of "Select count(*) ...." was more efficient for validation checking, because using the wildcard "*" retrieved all the fields where using a number simply counted the occurences of the records, possibly from the indexes if present.
    I have never been able to verify that with statistics, however.

    ReplyDelete
    Replies
    1. Even better I used "select count(0) ...."

      Delete
  9. I feel like John is missing the point of the SQL argument and inventing his own counter argument. SQL is better because it’s portable, you can run the same query on any database that conforms to the SQL standard. It’s easier to do unit testing because you can run your queries interactively. It’s easier to connect to remote partitions and systems (potentially non-DB2) using SQL-connect, JDBC and so forth. It has better support for transaction control. It’s more human readable, especially to non-IT folk. It supports relational integrity constraints within the DBMS. It has more advanced indexing options like encoded vector indexes. Most importantly, it allows for a loose coupling between the data definition layer (tables), the data presentation layer (views, procedures and functions) and your application layer (queries and code). You might have the fastest native I/O program in the world but you’ll still need to modify and recompile it every time someone wants to add a column to your database file. Again, the point is not to replace native file operations with SQL and then have a pissing contest. The point of SQL is to add an additional layer of abstraction between any programing language (RPG, C, and Java) and its data source. I realize it’s too often abused and I think that’s the underlying cause for a lot of misperception.

    This trivial and contrived performance debate between SETLL and CHAIN is utterly moot. I’ve made a good portion of my living fixing performance issues in System i environments for the last decade, and never once has the solution to a performance related problem been to use SETLL in place of CHAIN.

    ReplyDelete
    Replies
    1. I completely agree with Hugh. I use SQL myself all the time, instead of the other IBM Query tools - it does help that I have a very flexible front-end to IBM SQL here.

      And I use embedded SQL where appropriate, as well as other SQL products such as MSSQL and COGNOS. I'm not knocking SQL in its proper useage.

      But in the context of this discussion it IS a pissing contest to put forward an SQL statement as an alternative to native opcodes in a native program. If this was a discussion about SQL, then let's look at how most efficiently to perform an existence check within a larger SQL program - it's a very important topic. We've got several large SQL scripts in Showcase here where existence checking is required, and I don't quite trust the Showcase IDE to come up with the best solutions.

      The SETLL - CHAIN debate IS moot. And changing CHAINs to SETLLs would not give noticeable performance benefits. It is a subject that always comes up in RPG discussions, along with the GOTO debate, and is of course great fun for those who enjoy it. But yes, it is not relevant except in what is says about any particular contributor's philosophy or method of working.

      Delete
  10. In SQL scripts (stand-alone or embedded in RPG) I use the SQL EXISTS clause for checking the presence or absence of a key value.

    In RPG I almost always use SETLL with the %EQUAL BIF. I use the EXISTS in a SQL statement for record selection, where the record selection from one table depends on whether or not something is found in another table.

    I would think that having SQL count all records would be more time consuming than just seeing if a key value exists, either with EXISTS or SETLL and %EQUAL.

    ReplyDelete
  11. Without testing I'd suspect there's still a performance hit with aggregate function (COUNT) over just inserting a value - in otherwords I'd replace select count(*) into :i with 1 into :i - it's a simple assignment.

    However - you probably noticed I did prefix my statement with 'IF I were to use SQL...' - I'm a believer of using the right tool for the job - IF the right tool is SQL - use it. IF the right tool is native I/O (SETLL, CHAIN, etc) - use it. We all too often try to use pliers when a hammer is called for...

    ReplyDelete
    Replies
    1. Yes, that's exactly what happens with the COUNT(*) method I tested. And the SELECT method I tested it against is similar to your SELECT 1 into :i above. The results showed that COUNT is nearly twice as slow as SELECT 1 INTO :i.

      And you did say 'if' - sorry if it seemed I was targetting you specifically. Especially foolish if I was to pick a fight with someone called Rocky!

      Delete
    2. Those who promote SQL over native argue that the issue of level check goes away - which would only be true if you stay completely away from native I/O - IF that's your goal I'd use the assignment, not the aggregate - it certainly won't perform worse and very possible it will perform better.

      I'm not a proponent of using strictly SQL - just giving a different perspective.

      Delete
    3. Rocky, I am sure John Blenkinsop can speak for himself, old hats do understand that the new tricks may not perform as well as the old ones. SETLL was designed for performance on the System/38, with the succeeding OS releases, now with ILE, ILE RPG can out perform an SQL existence check for a record, unless SQL has changed and improved itself lately as RPG has. Computer languages each have foundational approaches to which they will not inherently stray from, so though the purist SQL types might not venture to rewrite themselves the hybrids might tack on ILE RPG to assimilate that feature. Does anyone know where to support the performance statistics on this? There are basicly two tests just over larger sample populations to contend with to verify this, not counting a hybrid solution which would taint the findings.

      Delete
    4. To be blunt - much of the discussion around performance borders on being crazy. If it's an interactive job and checking for an existance of a record the difference is so minute that the user would never see the difference between the two. If it's a batrch job transferring millions of records you'll start seeing the difference... To me it's more of an issue of maintainability & readability - SETLL is easy, straightforward and obvious to what the goal is - the SQL isn't as straightforward and not as intuitive.

      In otherwords - it's not that it can't be done in SQL - it's whether it's desirable to do so. I personally prefer SETLL for the stated purpose of verifying that data exists....

      But then again - I prefer standard I/O when dealing with standard record level processing - SQL when dealing with aggregate functionality or when flexibility of data is needed (allowing the data to be presented by Inv # or description order based on user selection - or need total $$$ of an invoice, etc...)

      Delete
    5. Unless my arithmetic is out or I have missed a comment as I browsed this thread quickly while waiting for a reboot of another PC, nobody has highlighted the point that the tests carried out indicate that checking the existence of (or reading) one record in SQL takes a quarter of a second, as opposed to 40 milliseconds using SETLL/CHAIN.

      A quarter of a second is an eternity in today's IT world. Although Rocky's comment is valid, since discussions like this one are invariably more about schools of thought than a demonstrable, significant benefit of one method over another, in this case, I think it just might be an important debate, from which we all need a definitive conclusion.

      The first thing that I would want to explore is the validity of the test... I just cannot imagine that this discussion would even be taking place, if SQL really is that slow. Someone has asked whether there was a suitable index for SQL to use and what OS release was in use. Those are the 64,000 dollar questions. Without a reply from the tester, we don't know if his performance tests have any validity at all. But, for now, let's assume the results are correct.

      In my experience, most high-volume transaction programs have at least 5 existence check validations. The test would indicate that using SQL instead of SETLL/CHAIN would add about a second to the response time. The results further indicate that it could add 4 seconds to the response time of a very complex transaction program with twenty existence check validations.

      It would add nearly three hours to the run time of a batch program that processes 10,000 records each with 5 secondary accesses. OK - I know this example isn't a good one, because we could often use a single SQL statement to read all the records needed for the processing. However, I am just trying to bring to everybody's attention that these timing differences are very significant - 10,000 is hardly high-volume and 10,000 seconds is nearly 3 hours.

      In the final event, this debate should come down to pragmatism, shouldn't it? In one way or another, all of us serve masters who share one common goal: the efficiency and profitability of the company they serve. That must our primary concern when selecting best practices, not the particular "school of thought" to which we belong.

      The portability argument is sound and I lean that way myself these days - but NOT if the performance difference is as great as these results indicate - that is way too high a cost to pay.

      This is too important to be reduced to an argument between the traditionalists and modernists camps. To me, these timing differences seem to be so significant that we have a responsibility to come up with unbiased results that clearly indicate whether using SQL for single record access might just be too high a performance hit to be practical.

      Delete
  12. i have use EXISTS, that is much faster in SQL, as this test if there will be record or not.
    SETLL does the same thing, it return if there will be record or not.
    If you want i can provide you some generic statement, where i had in my project.

    ReplyDelete
  13. In my experience, SQL, CHAIN, SETLL each has its place depending upon the environment and accessed files. Having also benchmarked each method for use in a high throughput online environment during benchmark testing with IBM, I found native IO calls faster (SETLL and then CHAIN) when checking existence for a single file record. The data suggested that the overhead associated with SQL makes it a better fit for situations involving simultaneous multiple file access/queries.

    ReplyDelete
  14. Simon, you may offend the SQL purist by sandwiching the SQL within RPGLE. You are a sly fox, in that you already know the answer will performance wise be a no-contest. It is like pitting RAM against a Hard Drive for retrieval, fundamentally from an architectural view point. However, if you want bees, just stir up the hive.

    ReplyDelete
  15. I would think a combination of a WHERE clause and ORDER BY in descending order.

    This presents a situation not easily handled in SQL, seems to me. I've worked with this kind of thing in writing an Open Access handler for RLA opcodes. I'd not considered the SETLL/READPE combination - only SETLL/READP and SETGT/READPE.

    Problem is, how do you know which direction you are going?

    Maybe another answer is a brute force method - prepare the SELECT and open it, then FETCH from start until you get a match where you are currently, then FETCH back - or save the record at each level - YUK! So much IO!

    Ideas! Ideas! Ideas!

    ReplyDelete
  16. Federico Cambero FenoySeptember 9, 2013 at 9:30 AM

    I believe that programming should be clean, streamlined and robust.
    and I believe that in these cases using embedded sql is not the best option as the language itself gives you the solution.

    ReplyDelete
  17. I have followed this thread with significant interest. When I was first developing the Inuendo open source (http://inuendo.us), one of the concerns I had with the GET functions was performance in a very high volume environment, because an SQL SELECT INTO was being used instead of a SETGT/READPE for time index sensitive searches.

    The reason I took that approach was so porting the open source code to other platforms would be simpler, and so non-RPG programmers who wanted to use the same techniques in other languages would have an easier time.

    This conversation has enticed me to do a high volume benchmark test when I can carve out some time. Stay tuned.

    ReplyDelete
  18. I know I am a bit late for this debate, but yesterday I had to use SQL to check if exists rather than the traditional setll, because the files in questions did not have any keys set up for the field I was checking and going in a traditional way would require me to read these files record by record from beginning to end. SQL did the job in two coding lines.

    ReplyDelete
  19. It is amazing to see the various solutions provided within this post. Whether you use SQL or RPG, a lock is issued against the record (if one exist) before you can delete it. Birgitta is also correct in her statement that you must be careful and what some are suggesting with the use of SQL to get a count is madness for performance.

    Imagine reading a 10 million record file to find how many records fit your criteria and then asking the system to do it again to perform the deletion.

    This code will allow you to avoid doing a read prior to your delete clause. (RPG Free)

    Setll (Key) FIlename;

    dou %eof;
    Delete (key) filename;
    EndDo;

    Please note that your test for deletion must be the key of the file. Meaning, If you have to insert an IF clause to that logic. You must read, test then perform the delete. This makes your question mute and using a SQL may be more efficient.

    SQL optimization is based on the key access paths of views (Logicals) and parent (Physical) files. When you use a SQL fetch it locks the row (record) to allow the deletion.
    I'm not sure what benefit is gained in performance by determine if a record exist prior to executing a Delete from file where statement. The SQLSTATE after the execute clause will tell you whether it did exist and only incurs the performance hit once instead of twice.

    Hope this helps.

    ReplyDelete
    Replies
    1. According to the manual %EOF is set by READ, READC, READE, READP, READPE and WRITE (subfile only). No mention of DELETE. Assuming that the manual is correct, your code will most probably loop indefinitely.

      Delete
    2. Sébastien JulliandNovember 26, 2013 at 12:06 PM

      Although it is intellectually interesting to try to find an SQL statement that could replace the SETLL operation, I don't see the point. By doing this, you will just end with a messed up code that won't perform as well as it did. And soon you will here someone yelling "I told you! SQL is slow compared to the good old record-level access"...

      In order to replace a record-level access by embedded SQL, you must take the "whole picture" into account. The example given by Wayne could easily be replaced by a single Delete statement, with no performance loss (assuming you have the right index). You wouldn't keep this Do-Until loop, would you? :)

      Delete
    3. Joep,
      Your statement is correct. The comment was from memory and not the manual. .
      My bad. It's probably a good idea to add the %error qualifier as well and a monitor statement to handle it.

      Correction:
      dou Not %found;
      Delete key Filename;
      EndDo;

      And the manual for reference:
      This information can also be obtained from the %FOUND built-in function, which returns '0' if no record is found, and '1' if a record is found.

      As always you should test any code you create!

      Delete
    4. Delete(e) will set %found(). It does not affect %eof().

      Delete
  20. Wrong. If You run the example SQL twice, one that finds the record, and the second that does not find the record, It still keeps a 1 in variable i.

    Better use SQLCODE. 100 = Not found. 0 = Found, a number with a minus means the SQL failed.

    Besidest that: using I as a variable is not a good proctice. If You have a big source it will be hard to serach for it. Why not use a more meaningful variable: Fund, or even better FoundMyfile.

    ReplyDelete
    Replies
    1. You are correct in your first paragraph that if you run the SQL statement one after another without resetting i=0 the 1 is retained. I thought I made that clear in my example, when describing what lines 7 and 8 do.

      You could use the SQLCOD if you so desire. There are many ways to achieve the same end.

      As for the variable "i", you do not see full programs, therefore, you do not see that I define "i" in the same procedure as the SQL statement. As I handle the result immediately after the SELECT "i" is as good as any other name. If I was to use the value in another procedure or part of the program I would give it a more meaningful name.

      Then again, the code here is all examples just to illustrate a method of doing things.

      Delete
  21. i almost never use setll and instead i use the sql way
    at least when performance isn't the main issue (and it's not 95% of the times)
    main reason for me is that if i use sql i don't have to declare the file in the F spec and i don't have to remember what logical file have they key i need
    sql is just more comfort to use

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.