Wednesday, June 10, 2015

Using control blocking to improve database performance

ovrdbf control blocks to speed up copy files

IBM i 7.1 TR7 saw an addition to the "Override with data base file" command, OVRDBF, the ability to set the control block size when performing file I/O. By increasing the control block size I can increase the memory allocated to the file's I/O buffer, which will increase the speed of I/O to the file. The enhancement is to the second element of the "Limit to sequential only" parameter, SEQONLY.

Limit to sequential only:
  Sequential only  . . . . . . > ____        *NO, *YES
  Number of records  . . . . . > _________   Number, *BUF32KB, *BUF64KB...

If the first element, "Sequential only", is "*YES", indicating that the file should be processed sequentially, the "Number of records" element can be used. If the second element is left blank the rules for the default size of buffer size is, well, complicated. I have taken this from IBM's documentation:

The database file uses sequential only processing. A default value for the number of records transferred as a group is determined by the system based on how the file is used, the type of access path involved, and the file's record length:

  • The default is approximately the number of records that fit in an internal buffer of 4K for:
    • All database files opened for input only
    • Physical files opened for output that are only processed in either arrival sequence or in non-unique keyed sequence and that have no logical file members based on them
  • The default is 1 record for:
    • All logical files opened for output only
    • Physical files opened for output only that either have unique keyed sequence access paths or have at least one dependent logical file with a keyed sequence access path that does not share the access path of the keyed physical file member

We are now offered four new values for the second element, each one giving us a greater size than the default 4KB buffer to use:

  • *BUF32KB buffer size = 32KB
  • *BUF64KB buffer size = 64KB
  • *BUF128KB buffer size = 128KB
  • *BUF256KB buffer size = 256 KB

The two biggest limitations I have found are:

  • This can only be used for sequential read only and write only operations. No update or delete.
  • When using large control blocks I need to be careful not to exceed the storage pool's available buffer. Which means do not change everything to use the largest buffer size, use it sparingly.

I decided to perform some tests to see if using these different buffer sizes would make a difference when using a copy file of one million records that contain a variety of field types (alphanumeric, packed, signed, date, time, timestamp, binary). I would like to have tested with a larger number of records, but I only have limited available storage on the IBM i server I used for these tests.

I had two files:

  • BIGFILE
  • BIGCOPY – a copy of BIGFILE created using the CRTDUPOBJ command.

The test was simple, I would use the "Copy file" command, CPYF, to copy the records from BIGFILE to BIGCOPY. Before the CPYF I would clear BIGCOPY so that the CPYF was just copying data.

First I need to create a baseline to which I could compare the results of using the different control blocks. This would just be a "standard" CPYF we have all executed many times before:

  CPYF FROMFILE(MYLIB/BIGFILE) TOFILE(QTEMP/BIGCOPY) +
         MBROPT(*ADD)

I performed this CPYF ten times and calculated the average time each copy took.

Now I am ready to perform the tests for the different control block sizes. IBM recommends using the "Allocate object" command, ALCOBJ, to exclude other jobs from updating the From file, BIGCOPY. If I did not exclude other jobs from the file a record may be changed or deleted by another job. If the changed or deleted record is in the control block being processed the change will not be copied to the To file, BIGCOPY.

I am not going to include my entire test program, just these important parts:

01  OVRDBF FILE(BIGFILE) TOFILE(MYLIB/BIGFILE) +
             OVRSCOPE(*JOB) SEQONLY(*YES &BUFFER)

02  OVRDBF FILE(BIGCOPY) TOFILE(QTEMP/BIGCOPY) +
             OVRSCOPE(*JOB) SEQONLY(*YES &BUFFER)

03  ALCOBJ OBJ((BIGFILE *FILE *EXCL))

04  CPYF FROMFILE(BIGFILE) TOFILE(BIGCOPY) MBROPT(*ADD)

05  DLCOBJ OBJ((BIGFILE *FILE *EXCL))

06  DLTOVR FILE(BIGFILE BIGCOPY) LVL(*JOB)

The variable &BUFFER would be changed to contain the control block value I am testing.

Line 1 is the override of the From file, BIGFILE. For the test I made the "Override scope", OVRSCOPE, the job, and the "Limit to sequential only", SEQONLY, is set to "*YES" and the control block size to the one I desire.

The override is repeated for the To file, BIGCOPY.

As suggested by IBM I have allocated the file to exclude all other jobs from using it. In reality no other program uses this file so this line is redundant, but I wanted to include it to complete this example.

On line 4 the Copy file is performed, as I mentioned before I have already cleared the To file, BIGCOPY.

When the copy is complete I need to deallocate the To file, line 5.

And finally, on line 6, delete the overrides I had placed upon the two files.

I repeated this ten times for each control block size and determined the time taken to perform each copy.

So what were the results?

Control
buffer
Seconds
None 25.65688
*BUF32KB 24.22434
*BUF64KB 23.79016
*BUF128KB 23.46741
*BUF256KB 23.32999

From these results you can see that using control blocks for this copy file does speed up the process. But with such a small file with only one millions records in it is not that much faster. I am sure we all have files on our systems that contain many tens of millions of records, for copying those files using control blocks will see a faster copying.

in my next post I will discuss how to use control blocks in SQL.

 

I have found this enhancement on an IBM i 7.1 server with TR7, but this is not listed in IBM's documentation. Therefore, the link is for the OVRDBF command is from the IBM i 7.2 site here.

 

This article was written for IBM i 7.1 TR7, and later refreshes and releases.

2 comments:

  1. I use this in sequence only mode. HLL reads a large block from the file but processes each record individually. Used it mostly for a driver application which would build a subset work file. Additonal sorting might be done on work file.

    ReplyDelete
  2. Stephen SpratleyJune 10, 2015 at 2:50 PM

    Simon,
    I have done extensive testing in an attempt to determine the effect of larger block sizes on overall throughput. In theory, getting as much data as possible for each I/O request should result in substantially lower CPU usage and therefor also run time. However, there are many factors that affect these times.
    What is the effect of OS caching on the throughput of your performance?
    If the operating system had already cached the entire file prior to your first timed run, as could happen on a relatively-inactive system, you could see very little performance improvement by changing the buffer size? Cached data will remain available to the OS until the cache is needed by another process, resulting in little or no actual I/O to the file.
    Also, the results appear to be elapsed time, rather than CPU time. Elapsed time will be affected by CPU time-sharing and by other activity on the system.
    On a busy system, the OS could swap out your program during a long I/O transfer, incurring a double overhead when it needs to be swapped in again.
    Is there an IBM OS/400 Database developer out there who could enlighten us further?

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.