Removing deleted records faster than RGZPFM @ RPGPGM.COM

Tuesday, August 11, 2020

The subject of this post is not new, but I thought I would share it as this is the quickest way I know to get rid of millions of deleted records taking up space in your files. The last time I used this method was with a file that contained 1 million "active" and 11 million deleted records. The application owner of this file had a fixed amount of time to remove the deleted records in their weekly maintenance "window". Having performed tests using RGZPFM, she found that it took longer than the time allowed, and came to me for ideas.

RGZPFM FILE(BIGFILE)

The part of this process that many people forget is that all the access paths are reorganized too. In this case there were a plethora of logical files built over this file. I forget exactly how many, but too many for my liking.

What was my suggested alternative?

It was the Copy File command, CPYF. When you use CPYF only the "active" records are copied. All I would need to do was to copy the file to a new copy of itself, and then copy it back again. Yes, it is as simple as it sounds, but there are some other things I can do to make the CPYF even faster.

Before I show how I would do this I need to explain the objects I will be using:

BIGFILE: The file I want to remove the deleted records from
BIGFILEL0 – 3: Logical files built over BIGFILE
BIGCPYF: A work file I will be using, copy of BIGFILE

And here is the program:

01 PGM

02  OVRDBF FILE(BIGFILE) OVRSCOPE(*CALLLVL) +
              SEQONLY(*YES *BUF256KB)

03  OVRDBF  FILE(BIGCPYF) OVRSCOPE(*CALLLVL) +
              SEQONLY(*YES *BUF256KB)


04  CPYF FROMFILE(BIGFILE) +
           TOFILE(MYLIB/BIGCPYF) +
           MBROPT(*ADD) +
           CRTFILE(*YES) +
           FROMRCD(1)


05  RMVM FILE(BIGFILEL0) MBR(*ALL)
06  RMVM FILE(BIGFILEL1) MBR(*ALL)
07  RMVM FILE(BIGFILEL2) MBR(*ALL)
08  RMVM FILE(BIGFILEL3) MBR(*ALL)


09  CLRPFM FILE(BIGFILE)


10  CPYF FROMFILE(BIGCPYF) +
         TOFILE(BIGFILE) +
         MBROPT(*ADD) +
         FROMRCD(1)


11  ADDLFM FILE(BIGFILEL0) MBR(BIGFILE0)
12  ADDLFM FILE(BIGFILEL1) MBR(BIGFILE1)
13  ADDLFM FILE(BIGFILEL2) MBR(BIGFILE2)
14  ADDLFM FILE(BIGFILEL3) MBR(BIGFILE3)

15  ENDPGM

Lines 2 and 3: I am using the Override Database File command, OVRDBF, to request sequential-only processing with a 256 KB block size for these files. This increases the amount of memory allocated to the files' I/O buffers, and the more memory allocated the faster CPYF will run.

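Line 4: I am using the Copy File command, CPYF, to copy the data from BIGFILE and create the file BIGCPYF. Notice that the FROMRCD parameter is 1 rather than *START. As I have shown in a previous post, this makes the CPYF faster too. BIGCPYF should not already contain data when this runs, as MBROPT(*ADD) adds the copied records to whatever is already in the file.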

If I were just to copy the records from BIGCPYF back into BIGFILE, every time a record was added to the physical file all the logical files would be updated before the next record could be added to the physical file.

Lines 5 – 8: Removing the members from the related logical files removes the need for them to be updated when the physical file is updated.

Line 9: Clear the contents of BIGFILE, which removes all of the "active" and deleted records.

Line 10: Copy the data, which is only the "active" records, from BIGCPYF into BIGFILE.

Lines 11 – 14: Now I need to add the members back to the logical files. As I only have four logical files I kept these statements in this program. If I had many logical files I would create several programs, each one adding the member to a few of the logical files. At this point in the program I would submit those programs to different job queues in different subsystems, so that more than one member is added at the same time.

11  SBMJOB CMD(CALL PGM(PGM1)) JOB(ADDLFM_1) JOBQ(QINTER)
12  SBMJOB CMD(CALL PGM(PGM2)) JOB(ADDLFM_2) JOBQ(QSPL)  
13  SBMJOB CMD(CALL PGM(PGM3)) JOB(ADDLFM_3) JOBQ(QBATCH)
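
Each of the submitted programs could be very small. What follows is only a minimal sketch of what PGM1 might contain. PGM1 is the name used in the SBMJOB commands above, but which logical files are handled by which program is an assumption made for illustration, as is reusing the member names from the original lines 11 and 12.

```cl
PGM

  /* Hypothetical split: this program adds the members back  */
  /* to the first two logical files. PGM2 and PGM3 would do  */
  /* the same for the remaining logical files.               */
  ADDLFM FILE(BIGFILEL0) MBR(BIGFILE0)
  ADDLFM FILE(BIGFILEL1) MBR(BIGFILE1)

ENDPGM
```

Spreading the ADDLFM commands over several jobs means the access paths are built at the same time rather than one after another, which is the reason for using more than one job queue.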

Testing this approach showed that this would finish within the allowed time "window".
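
One way to see how many deleted records a file holds, before and after running the program, is the Retrieve Member Description command, RTVMBRD. The sketch below is not part of the program above. The variable names are made up for illustration, and it assumes BIGFILE can be found in the library list and that only its first member is of interest.

```cl
PGM

  DCL VAR(&ACTIVE)   TYPE(*DEC)  LEN(10 0)
  DCL VAR(&DELETED)  TYPE(*DEC)  LEN(10 0)
  DCL VAR(&ACTIVEC)  TYPE(*CHAR) LEN(10)
  DCL VAR(&DELETEDC) TYPE(*CHAR) LEN(10)

  /* Retrieve the counts of "active" and deleted records */
  RTVMBRD FILE(BIGFILE) MBR(*FIRST) +
            NBRCURRCD(&ACTIVE) NBRDLTRCD(&DELETED)

  /* Convert the numbers to character for the message */
  CHGVAR VAR(&ACTIVEC)  VALUE(&ACTIVE)
  CHGVAR VAR(&DELETEDC) VALUE(&DELETED)

  SNDPGMMSG MSG('Active records:' *BCAT &ACTIVEC *BCAT +
                'Deleted records:' *BCAT &DELETEDC)

ENDPGM
```

After the second CPYF the deleted count for BIGFILE should be zero, as only the "active" records were copied back into it.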

This article was written for IBM i 7.4, and should work for some earlier releases too.

27 comments:

Banibouya, August 11, 2020 at 5:59 AM
Nice share.
Usually, I use MIMIX reorg for removing deleted records.

Nunoaf, August 11, 2020 at 6:30 AM
Have you tried using SQL, with INSERT INTO ... SELECT * FROM, instead of CPYF?
It seems faster than using CPYF.

Keith Pryke, August 11, 2020 at 6:54 AM
Any journaling involved? Others may need to consider that; especially if a journal called QSQJRN exists in the same library as the file since the CPYF CRTFILE(*YES) would start journaling, I believe, on "BIGCPYF" before copying the million rows; each of which would then create an entry in the journal receiver.

Anonymous, August 11, 2020 at 11:45 PM
Simon! This is exactly the right way to reorganize huge files fast, a way I've used many, many times...and had to explain to younger managers why it's the better way. It seems to also work a little faster the more deleted records are in the file.

maran, August 12, 2020 at 7:39 AM
Hi Simon, what was the percentage performance gain when you compared RGZPFM with the method you describe?

José Luis Martín Santos, August 12, 2020 at 2:41 PM
Hello... this method is not a novelty... I have done it in Madrid on one occasion or another... The RGZPFM command is relatively slow if the file has thousands, or rather millions, of records, for example a history file... but where RGZPFM really takes a long time to do its job is when the file it is run over has a multitude of logical files, and it also depends quite a lot on the number of keys each logical file has and how those keys are defined... I mean whether they have omits, for example...

Anonymous, August 13, 2020 at 8:20 AM
What if you remove the logical file members first and then run RGZPFM?
How long would that take compared to your solution?

i.e.

RMVM FILE(BIGFILEL0) MBR(*ALL)
RMVM FILE(BIGFILEL1) MBR(*ALL)
RMVM FILE(BIGFILEL2) MBR(*ALL)
RMVM FILE(BIGFILEL3) MBR(*ALL)

RGZPFM FILE(BIGFILE)

ADDLFM FILE(BIGFILEL0) MBR(BIGFILE0)
ADDLFM FILE(BIGFILEL1) MBR(BIGFILE1)
ADDLFM FILE(BIGFILEL2) MBR(BIGFILE2)
ADDLFM FILE(BIGFILEL3) MBR(BIGFILE3)

Brian Rusch, August 13, 2020 at 11:50 AM
An alternative to removing the logical file member to temporarily get rid of the access paths is to set them to rebuild maintenance: CHGLF FILE(BIGFILEL0) MAINT(*REBLD). Then when the records are copied back into the original file, set it back to immediate maintenance: CHGLF FILE(BIGFILEL0) MAINT(*IMMED). Two advantages to using this method are that programs won't crash if they are accidentally run since the logical file members aren't missing, and when the access path maintenance is returned to *IMMED, the actual rebuild is offloaded to some system jobs (QDBSRV01-09 I believe) so there is no need to submit multiple jobs to run the rebuilds in parallel since it will happen automatically.

Elwin Kunzler, August 15, 2020 at 9:29 PM
Thanks for posting, I'd forgotten about CPYF.

AVROHOMN, August 16, 2020 at 7:00 AM
RGZPFM may be more secure in case the job fails before completion.

Leon Kemp, August 17, 2020 at 6:21 AM
Thanks for sharing, the window to do this maintenance gets smaller every year and the files just get bigger.

Luis Chavez, September 1, 2020 at 12:10 PM
You should consider changing the file's Reuse deleted records parameter to *YES, so you won't have this issue in the future.

Unknown, September 15, 2020 at 3:46 PM
If the file is journaled you can do a RGZPFM while it is active and in use. It may take a few passes and you can monitor the progress in Navigator. We started doing this recently since we never had any down time for RGZPFM.

MC, September 30, 2020 at 11:59 PM
This copying method has a caveat: there must be enough disk space on the system to carry it out. Bear in mind that the primary reason for performing a reorg is the need to reclaim disk space when it is running short.

Also, I wonder if it is better to just delete Bigfile (step 9) and rename Bigcpyf to Bigfile at step 10 before rebuilding logical files.

Klipster, December 18, 2024 at 2:41 PM
I will be putting this method to the test during the last two weeks of December 2024...

I have a file that contains more than half a Billion records... That's 8 years-worth of inventory movements and I am going to attempt to purge 4 years of those... Roughly 250 Million records...!

Thanks for sharing your talent and expertise Simon...

Klipster, December 28, 2024 at 12:14 AM
For the record: The entire process completed in just under an hour...
Amazing...! I'm very surprised and pleased...
Power9 server @ V7R4M0 with 40% used of 4TB DASD...
There was nobody else on the system and journaling was turned off for this file. QEDD Replication was also Paused...

FYI - I used a combo of RMVM and CHGLF to stop the LF indexes from rebuilding while we were purging...

CHGLF MAINT(*REBLD) was the faster option as far as I could tell however it would not work for any index that had a UNIQUE key or a Shared Access path...

So, for two of my 21 LFs I did add the RMVM/ADDLFM commands to my program...

I was apprehensive about this whole process, and now that it's done I'm very confident about tackling the next massive file that needs to be purged...!

Many thanks Simon...!
Cheers...!