Discovering the number of deleted records in a file @ RPGPGM.COM

Wednesday, July 1, 2015

Discovering the number of deleted records in a file

In last week's post, here, I gave an example of how I determine the 250 biggest objects on my IBM i server. I created the information about the objects by using the Display Object Description command, DSPOBJD, and mentioned if anyone knew of a better way to get the same information to contact me.

Bill Gravelle did just that suggesting that I look for a table in the QSYS2 library, see here. After poking around I did find a promising table, SYSTABLESTAT, which contains information for every user table/file. This would not have identified all of the biggest objects, as 28 of them were Journal Receivers, *JRNRCV. But the table does contain information that is useful, I am not going to list all the columns/fields in this table as IBM does such a good job on their Knowledge Center web site here.

As we have been talking about releasing wasted disk space the column that grabbed my attention was NUMBER_DELETED_ROWS. Most of the files on the IBM i I use at work do not reuse delete records, therefore, of them have increasing numbers of deleted records that are just wasting disk space. If I could identify the worst offenders I could purge these deleted records and release that disk space. With this in mind I am only interested in the following columns:

Column name	Description
DATA_SIZE	Size of table/file in bytes
SYSTEM_TABLE_NAME	Table/file name
SYSTEM_TABLE_SCHEMA	Library that contains table/file
NUMBER_ROWS	Number of valid rows/records
NUMNER_DELETED_ROWS	Number of deleted rows/records in the file

I can put that all in a SQL in a CL program like this:

01  PGM

02  RUNSQL SQL('CREATE TABLE MYLIB.OUTFILE AS +
               (SELECT DATA_SIZE,SYSTEM_TABLE_NAME, +
                       SYSTEM_TABLE_SCHEMA,NUMBER_ROWS, +
                       NUMBER_DELETED_ROWS +
                FROM QSYS2.SYSTABLESTAT +
                ORDER BY NUMBER_DELETED_ROWS DESC +
                FETCH FIRST 250 ROWS ONLY) +
                WITH DATA') +
           COMMIT(*NONE) NAMING(*SQL)

03  ENDPGM

If you are not familiar with CREATE TABLE in SQL you ought to read the post Creating SQL table on the fly.

I have decided to use the SQL naming convention, NAMING(*SQL), just because I can and to show you that it is not much different from the standard IBM i convention. In the SELECT statement I have selected the columns mentioned in the table above, and sorted (order by) by the number of deleted rows in descending order, i.e. largest first. Then the first 250 rows will be written to by output file, MYLIB.OUTFILE. This will include all the tables/files in IBM libraries too, e.g. QSYS.

When run this took about five times as long as program in the previous post that used DSPOBJD.

The output looks like this:


 DATA_SIZE  SYSTEM_TABLE_NAME  SYSTEM_TABLE_SCHEMA  NUMBER_ROWS  NUMBER_DELETED_ROWS
2116046848  FILE1              LIB1                          75            5,351,845
1981816832  FILE2              LIB1                      11,781            4,988,364
1361072128  FILE3              LIB1                         890            3,504,467
 421531648  FILE4              LIB1                      19,130            3,438,994
 304144384  FILE5              LIB2                     855,966            3,423,437

Only the names have been changed to protect the innocent files and libraries.

There is a mutlitude of other information in this table that may prove useful in the future, such as:

LOGICAL_READS - Number of logical read operations since the last IPL.
PHYSICAL_READS - Number of physical read operations since the last IPL.
SEQUENTIAL_READS - Number of sequential read operations since the last IPL.
RANDOM_READS - Number of random read operations since the last IPL.

You can learn more about the SYSCL's SYSTABLESTAT table on the IBM website here.

This article was written for IBM i 7.2, and should work for earlier releases too.

16 comments:

mmayer625July 1, 2015 at 8:18 AM
Couple of items to note:
We had to remove the "SQL" stmt from the RUNSQL command.
We had to update the "xyzlib.file" to "xyzlib/file".
We had to remove / comment out the "commit" portion of the RUNSQL command.
We could not locate file QSYS2.SYSTABLESTAT even though once we were ablr to compile the CLLE, the program is running and selecting records. Waiting on results but wanted to post ahead of the results.
We're at V7R1. Wonder if some of this is a PTF issue since we'r ein a 24x7x365 hospital environment and getting any PTF window in a multi-partition setting is difficult at best.
ReplyDelete
Replies
Frank FerlisiJuly 1, 2015 at 2:00 PM
This is a problem that can be controlled systematically. I wrote a CL program that did a DSPFD to an output file then I read the output file and checked the variable of number of deleted records if it came out to be a certain percentage (could be a variable) of the total size of the file the CL would automatically reorganize the file eliminating the deleted records. You may want to do some analysis on this method beforehand. Files with millions of records will take a very long time to reorg especially if there are many logical files attached. You also may want to change the the variable to reuse deleted records.
ReplyDelete
Replies
Simon HutchinsonJuly 1, 2015 at 2:03 PM
I will be taking this approach.

With files containing millions of deleted records I have found that it is quicker to CPYF the contents of the file to another. Clear the original. Then CPYF the data back into the original.

And I will be changing the file to reuse deleted records too.
ReplyDelete
Replies
Glenn GundermannJuly 2, 2015 at 10:08 AM
We're on 7.1 and this worked like a charm, sort of. I'm surprised that it takes so long to run. It must have taken almost an hour. Why is that?
ReplyDelete
Replies
Glenn GundermannJuly 2, 2015 at 10:50 AM
By the way, NUMBER_ROWS and NUMBER_DELETED_ROWS is a total for the file. Using this output as a driver for a RGZPFM will not work for a multi-member file if the deleted rows is not in the first member. Since we have a package that uses multi-members, it would be useful to see the above with member-level statistics.
ReplyDelete
Replies
ÍñigoNovember 17, 2015 at 5:29 AM
Hi Simon, great article, great blog!!

Do you know if there is a catalog Table/View where I can browse different data that can be got from DSPFD command?

I'd like to combine this SQL sentence, with another table/view, to check wether big files with lots of deleted records have REUSEDLT set or not.

I'm trying to find something related with that, and I guess it exists!

Thanks a lot!
ReplyDelete
Replies

Add comment

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.

RPGPGM.COM - From AS400 to IBM i

Pages

Wednesday, July 1, 2015

Discovering the number of deleted records in a file

16 comments: