Wednesday, August 19, 2015

Speeding up CPYF

cpyf fromrcd *start and control blocking *buf256

Every developer type person who has worked on the IBM i, or its predecessors, has had to use the Copy File command, CPYF, to copy data from one file to another. Most have rarely used more that the first screen's parameters, thereby missing a way to make the command execute faster. For small files the difference is negligible, but in a file of several 100,000 records or more the method I describe below will make a noticeable difference.

I have to thank Stu Haddock for reminding me of this. It is all down to the value of the FROMRCD parameter of the CPYF command.

The default for the FROMRCD parameter is *START, which means that you want to start copying the data from the start of the file. You can also enter from which record you wish to copy. If I enter "1", to indicate that I want to start copying from the first records. I would expect that to be the same as the FROMRCD(*START). But it is not. If I am copying a keyed file:

  • FROMRCD(*START) - The records are copied in keyed order.
  • FROMRCD(1) - The records are copied in arrival sequence, in other words the order the records are in the file, and not in keyed order.

But does this really make a difference?

I created a file called BIGFILE that contained eight fields. The first field, FLD1, is a packed field and is the file's only key field. I was not going to do much of a test if the records were number sequentially, so I wrote a program to write 1 million records to the file and FLD1 contains a random number generated by the CEERAN0 API. If you are not familiar with this API or with generating random numbers you should read the post Generating random numbers.

I created three programs:

The first used CPYF with FROMRCD(*START):

  CPYF FROMFILE(MYLIB/BIGFILE) +
         TOFILE(QTEMP/@BIGFILE) MBROPT(*ADD)

The second used CPYF with FROMRCD(1):

  CPYF FROMFILE(MYLIB/BIGFILE) +
         TOFILE(QTEMP/@BIGFILE) MBROPT(*ADD) +
         FROMRCD(1)

And the third program used CPYF with FROMRCD(1) and control blocking. To learn about how to use control blocking see Using control blocking to improve database performance.

  OVRDBF FILE(BIGFILE) TOFILE(MYLIB/BIGFILE) +
           OVRSCOPE(*JOB) SEQONLY(*YES *BUF256KB)

  OVRDBF FILE(@BIGFILE) TOFILE(QTEMP/@BIGFILE) +
           OVRSCOPE(*JOB) SEQONLY(*YES *BUF256KB)

  CPYF FROMFILE(MYLIB/BIGFILE) +
         TOFILE(QTEMP/@BIGFILE) MBROPT(*ADD) +
         FROMRCD(1)

I ran this on an IBM i where I was the only user signed on. Each program was submitted ten times to a "single threaded" job queue in the order of program 1, 2, 3, 1, 2, 3, etc. The time taken for each CPYF was calculated in seconds, and the average for each type of copy file.

Type of copy Average
seconds
FROMRCD(*START) 36.0
FROMRCD(1) 15.2
FROMRCD(1) with control blocking 13.6

The FROMRCD(*1) took less than half the time it took the FROMRCD(*START) to complete. Using control blocking made the CPYF even faster completing in 38% of the time it takes using the FROMRCD(*START) to finish.

Next time you have to copy a large file you should use control blocking and FROMRCD(1) rather than just the CPYF command's defaults.

Do note that the FROMRCD parameter can only be used with Physical file and SQL tables, not with Logical files.

What happens if you want to use selection criteria? See here.

 

You can learn more about the CPYF command on the IBM web site here.

 

This article was written for IBM i 7.2, and should work for earlier releases too.

17 comments:

  1. that's actually pretty useful. i do a lot of crazy things with CPYF but have completely ignored that parm before.

    ReplyDelete
  2. Hi,
    I have written several file copy programs using SQL. The blocking factor is more important and very significant time savings. When the target files include index, it is very interesting to hold rebuild while the copywith CHGLF command.

    ReplyDelete
    Replies
    1. For those not familiar with control blocking in SQL I wrote about it in the post Overriding control block size using SQL.

      Delete
  3. Sir, This info is really helpful as i frequently use this command.

    Thanks a lot keep posting new things which are hard to find in internet.

    ReplyDelete
  4. That's really useful..Thanks a lot...

    ReplyDelete
  5. but is CPYF better than using sql?

    ReplyDelete
  6. Perhaps in a world of constraints and triggers, we should follow the advice of IBMs experts and "Throw CPYF and CRTDUPOBJ away" in favor of SQL inserts.
    --Keith Hodges

    ReplyDelete
  7. Great post Simon. We have extremely large files in our organisation which we copy sometimes and the FROMRCD and blocking will play a big part in my command from now on.

    ReplyDelete
  8. Kind of a related question: If I have a partial file on one system, can I copy using FROMRCD and TORCD with these options to put the records into the same relative record numbers on the "target"? In my case, I have a large number of deleted rows on the file copy, the target. If I copy from record number 1,000,000 on the original, the source, will those get copied into the corresponding relative record numbers on the file copy, the target?

    ReplyDelete
    Replies
    1. No they will not be copied into the same RRN to the new file.
      The only way I could think to do something like that would be to use an RPG program, see the post Using Relative Record Number with data files in RPG.

      Delete
    2. Use the COMPRESS(*NO) option on CPYF to keep the source/target RRNs the same, deleted rows in the source table will now be in the target.

      Delete
  9. And also could divide and conquer. Retrieve the number of rows in the table and divide it by the number of batch jobs to do CPYF. Say 200,000,000 rows, 20 jobs, that's 10,000,000 rows per job. Submit each batch job to QUSRNOMAX to do a CPYF with specific FROMRCD/TORCD row numbers (1 to 10,000,000, 10,000,001 to 20,000,000, etc). This can be done in CL and pass in number of jobs and do SBMJOBs in a loop. When the TO table has no more locks, it's done (or use API QUSRJOBI 'JOBI0400' to check each batch job).

    Ringer

    ReplyDelete
  10. Hi,

    Is the creation of a SQL index can speed a cpyf using INCREL option?

    Regards,

    Olivier Drubigny.

    ReplyDelete
    Replies
    1. It would if the key of the index matches the selection criteria you are using in the CPYF.

      Delete
  11. Thanks for sharing

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.