Tuesday, August 11, 2020

Removing deleted records faster than RGZPFM


The subject of this post is not new, but I thought I would share it as it is the quickest way I know to get rid of millions of deleted records taking up space in your files. The last time I used this method was with a file that contained 1 million "active" records and 11 million deleted records. The application owner of this file had a fixed amount of time to remove the deleted records in their weekly maintenance "window". Having performed tests using the Reorganize Physical File Member command, RGZPFM, she found that it took longer than the time allowed, and she came to me for ideas.

RGZPFM FILE(BIGFILE)

The part of this process that many people forget is that all of the file's access paths are rebuilt too. In this case there was a plethora of logical files built over this file; I forget exactly how many, but too many for my liking.

What was my suggested alternative?

It was the Copy File command, CPYF. When you use CPYF only the "active" records are copied. All I would need to do was to copy the file to a new copy of itself, and then copy it back again. Yes, it is as simple as it sounds, but there are some other things I can do to make the CPYF even faster.

Before I show how I would do this I need to explain the objects I will be using:

  • BIGFILE:  The file I want to remove the deleted records from
  • BIGFILEL0 – BIGFILEL3:  Logical files built over BIGFILE
  • BIGCPYF:  A work file I will be using, copy of BIGFILE

And here is the program:

01 PGM

02  OVRDBF FILE(BIGFILE) OVRSCOPE(*CALLLVL) +
              SEQONLY(*YES *BUF256KB)

03  OVRDBF  FILE(BIGCPYF) OVRSCOPE(*CALLLVL) +
              SEQONLY(*YES *BUF256KB)


04  CPYF FROMFILE(BIGFILE) +
           TOFILE(MYLIB/BIGCPYF) +
           MBROPT(*ADD) +
           CRTFILE(*YES) +
           FROMRCD(1)


05  RMVM FILE(BIGFILEL0) MBR(*ALL)
06  RMVM FILE(BIGFILEL1) MBR(*ALL)
07  RMVM FILE(BIGFILEL2) MBR(*ALL)
08  RMVM FILE(BIGFILEL3) MBR(*ALL)


09  CLRPFM FILE(BIGFILE)


10  CPYF FROMFILE(BIGCPYF) +
         TOFILE(BIGFILE) +
         MBROPT(*ADD) +
         FROMRCD(1)


11  ADDLFM FILE(BIGFILEL0) MBR(BIGFILE0)
12  ADDLFM FILE(BIGFILEL1) MBR(BIGFILE1)
13  ADDLFM FILE(BIGFILEL2) MBR(BIGFILE2)
14  ADDLFM FILE(BIGFILEL3) MBR(BIGFILE3)

15  ENDPGM

Lines 2 and 3: I am using the Override Database File command, OVRDBF, to increase the I/O buffer size used by these files. This increases the amount of memory allocated to the files' I/O buffers; the more memory allocated, the faster the CPYF will run.
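The *BUF256KB special value for the second element of SEQONLY was, I believe, added in IBM i 7.2. On an earlier release the closest equivalent would be to give SEQONLY a large number of records to block into each buffer. A minimal sketch, where 32767 is just an illustrative value:

OVRDBF FILE(BIGFILE) OVRSCOPE(*CALLLVL) +
          SEQONLY(*YES 32767)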

Line 4: I am using the Copy File command, CPYF, to copy the data from BIGFILE and create the file BIGCPYF. Notice that the FROMRCD parameter is 1, rather than the default *START; as I have shown in a previous post, this makes the CPYF faster too.
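For comparison, this is what the same copy would look like with the default starting position, which, as that previous post showed, is the slower form:

CPYF FROMFILE(BIGFILE) +
       TOFILE(MYLIB/BIGCPYF) +
       MBROPT(*ADD) +
       CRTFILE(*YES) +
       FROMRCD(*START)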

If I were simply to copy the records from BIGCPYF straight back into BIGFILE, then every time a record was added to the physical file all of the logical files would be updated before the next record could be added.

Lines 5 – 8: Removing the members from the related logical files removes the need for them to be updated when the physical file is updated.

Line 9: Clearing the contents of BIGFILE, all of the "active" and the deleted records.

Line 10: Copy the data, which is only the "active" records, from BIGCPYF into BIGFILE.

Lines 11 – 14: Now I need to add the members back to the logical files. As I only have four logical files I kept these statements in this program. If I had many logical files I would create several programs, each one adding the member back to a few of the logical files. At this point in the program I would submit those programs to different job queues in different subsystems, so that more than one member would be added at the same time, as shown below:

11  SBMJOB CMD(CALL PGM(PGM1)) JOB(ADDLFM_1) JOBQ(QINTER)
12  SBMJOB CMD(CALL PGM(PGM2)) JOB(ADDLFM_2) JOBQ(QSPL)  
13  SBMJOB CMD(CALL PGM(PGM3)) JOB(ADDLFM_3) JOBQ(QBATCH)
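Each of those submitted programs would contain nothing more than its share of the ADDLFM statements. A minimal sketch of what a program like PGM1 might look like; the program name and the way the logical files are split between the programs are just illustrations:

01 PGM
02  ADDLFM FILE(BIGFILEL0) MBR(BIGFILE0)
03  ADDLFM FILE(BIGFILEL1) MBR(BIGFILE1)
04 ENDPGM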

Testing showed that this approach would finish within the allowed time "window".

 

This article was written for IBM i 7.4, and should work for some earlier releases too.

23 comments:

  1. Nice share.
    Usually I use MIMIX reorg for removing deleted records.

    Replies
    1. So what internal API does MIMIX use?
      I doubt it would be any faster than this native command set.

    2. The elapsed time for MIMIX is most likely similar, but it has one very important difference: your application can continue to use and update/insert into the original file throughout the operation, and when the copy is done (and is in a 'keep-up' mode) there is only a very short application down time to cut over to the new file, which has the deleted records removed. The same technique is used to move to a new file format (in which case the file and the application are 'promoted' at the cutover point).

  2. Have you tried using SQL, with an INSERT INTO ... SELECT * FROM, instead of CPYF?
    It seems faster than using CPYF.

    Replies
    1. In testing we did try using a SQL insert rather than a CPYF. Looking at the logs we found there was no discernible difference.
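      For anyone who wants to try it themselves, this is a minimal sketch of that SQL alternative run from CL using the RUNSQL command; the library name and the commitment control setting are assumptions:

      RUNSQL SQL('INSERT INTO MYLIB/BIGFILE SELECT * FROM MYLIB/BIGCPYF') +
             COMMIT(*NONE)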

  3. Any journaling involved? Others may need to consider that, especially if a journal called QSQJRN exists in the same library as the file, since CPYF CRTFILE(*YES) would, I believe, start journaling "BIGCPYF" before copying the million rows, each of which would then create an entry in the journal receiver.

    Replies
    1. If journaling, and any triggers, were on BIGFILE they would have been removed before the second CPYF, and added back after all the members had been added to the logical files.

      I have only seen QSQJRN in libraries that were created using CREATE SCHEMA. For libraries created by CRTLIB it is not present, and the files are not automatically journaled to that journal.
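      A minimal sketch of how that could look, assuming BIGFILE were journaled to a hypothetical journal MYLIB/MYJRN, and using CHGPFTRG to disable the triggers rather than removing them:

      ENDJRNPF FILE(MYLIB/BIGFILE) JRN(MYLIB/MYJRN)
      CHGPFTRG FILE(MYLIB/BIGFILE) TRG(*ALL) STATE(*DISABLED)
      /* CLRPFM, the second CPYF, and the ADDLFMs go here */
      CHGPFTRG FILE(MYLIB/BIGFILE) TRG(*ALL) STATE(*ENABLED)
      STRJRNPF FILE(MYLIB/BIGFILE) JRN(MYLIB/MYJRN) IMAGES(*BOTH)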

    2. You can manually create a QSQJRN journal in any library, or use STRJRNLIB to cause all created objects within the library to be journaled to a journal of your choosing.

  4. Simon! This is exactly the right way to reorganize huge files fast, a way I've used many, many times...and had to explain to younger managers why it's the better way. It seems to also work a little faster the more deleted records are in the file.

  5. Hi Simon, what was the percentage performance gain when you compared RGZPFM with the method you describe?

    Replies
    1. This was performed 2 – 3 years ago. If I remember right the RGZPFM was cancelled after about 50 minutes, as by then it was clear it would not fit in the "window". The method described above took 20 – 30 minutes.

    2. But there are many factors that would make this unique and different from your situation.
      For example: Number of logical files, processor speed, number of other jobs running, etc.

  6. José Luis Martín Santos, August 12, 2020 at 2:41 PM

    Hello. This method is not new; I have done it in Madrid on more than one occasion. RGZPFM is relatively slow when the file has thousands, or rather millions, of records, such as a history file. But where RGZPFM really takes a long time to do its job is when the file has a multitude of logical files over it, and it also depends a great deal on how many keys each logical file has and how those keys are defined, for example whether they have select/omit criteria.

  7. What if you remove the logical file members first and then run RGZPFM?
    How long would that take compared to your solution?

    i.e.

    RMVM FILE(BIGFILEL0) MBR(*ALL)
    RMVM FILE(BIGFILEL1) MBR(*ALL)
    RMVM FILE(BIGFILEL2) MBR(*ALL)
    RMVM FILE(BIGFILEL3) MBR(*ALL)

    RGZPFM FILE(BIGFILE)

    ADDLFM FILE(BIGFILEL0) MBR(BIGFILE0)
    ADDLFM FILE(BIGFILEL1) MBR(BIGFILE1)
    ADDLFM FILE(BIGFILEL2) MBR(BIGFILE2)
    ADDLFM FILE(BIGFILEL3) MBR(BIGFILE3)

    Replies
    1. That method was tested. It is faster than just doing the RGZPFM without removing the logical files' members.

      It was still slower than the method I described.

  8. An alternative to removing the logical file member to temporarily get rid of the access paths is to set them to rebuild maintenance: CHGLF FILE(BIGFILEL0) MAINT(*REBLD). Then when the records are copied back into the original file, set it back to immediate maintenance: CHGLF FILE(BIGFILEL0) MAINT(*IMMED). Two advantages to using this method are that programs won't crash if they are accidentally run since the logical file members aren't missing, and when the access path maintenance is returned to *IMMED, the actual rebuild is offloaded to some system jobs (QDBSRV01-09 I believe) so there is no need to submit multiple jobs to run the rebuilds in parallel since it will happen automatically.

  9. Thanks for posting, I'd forgotten about CPYF.

  10. RGZPFM may be more secure in case the job fails before completion.

  11. Thanks for sharing, the window to do this maintenance gets smaller every year and the files just get bigger.

  12. You should consider changing the file's "reuse deleted records" parameter, REUSEDLT, to *YES, so you won't have this issue in the future.

    Replies
    1. I ALWAYS create my files to reuse deleted records, but too many others never bother to do so.
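      For an existing file the attribute can be changed with the Change Physical File command, CHGPF. A minimal sketch; I believe the command needs an exclusive lock on the file while it runs:

      CHGPF FILE(MYLIB/BIGFILE) REUSEDLT(*YES)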

  13. If the file is journaled you can do a RGZPFM while it is active and in use. It may take a few passes and you can monitor the progress in Navigator. We started doing this recently since we never had any down time for RGZPFM.
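    A minimal sketch of that reorganize-while-active variant as I understand it; ALWCANCEL(*YES) requires the file to be journaled, and LOCK(*SHRUPD) is what lets other jobs keep updating the file during the reorganize:

    RGZPFM FILE(BIGFILE) ALWCANCEL(*YES) LOCK(*SHRUPD)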

  14. This copying method has a caveat: there must be enough disk space on the system to carry it out. Bear in mind that the primary reason for performing a reorganization is usually the need to reclaim disk space when it is running short.

    Also, I wonder if it would be better to just delete BIGFILE (at step 9) and rename BIGCPYF to BIGFILE at step 10, before rebuilding the logical files.

