Wednesday, July 1, 2015

Discovering the number of deleted records in a file

systablestat in qsys2

In last week's post, here, I gave an example of how I determine the 250 biggest objects on my IBM i server. I created the information about the objects by using the Display Object Description command, DSPOBJD, and mentioned if anyone knew of a better way to get the same information to contact me.

Bill Gravelle did just that suggesting that I look for a table in the QSYS2 library, see here. After poking around I did find a promising table, SYSTABLESTAT, which contains information for every user table/file. This would not have identified all of the biggest objects, as 28 of them were Journal Receivers, *JRNRCV. But the table does contain information that is useful, I am not going to list all the columns/fields in this table as IBM does such a good job on their Knowledge Center web site here.

As we have been talking about releasing wasted disk space the column that grabbed my attention was NUMBER_DELETED_ROWS. Most of the files on the IBM i I use at work do not reuse delete records, therefore, of them have increasing numbers of deleted records that are just wasting disk space. If I could identify the worst offenders I could purge these deleted records and release that disk space. With this in mind I am only interested in the following columns:

Column name Description
DATA_SIZE Size of table/file in bytes
SYSTEM_TABLE_NAME Table/file name
SYSTEM_TABLE_SCHEMA Library that contains table/file
NUMBER_ROWS Number of valid rows/records
NUMNER_DELETED_ROWS Number of deleted rows/records in the file

I can put that all in a SQL in a CL program like this:

01  PGM

02  RUNSQL SQL('CREATE TABLE MYLIB.OUTFILE AS +
               (SELECT DATA_SIZE,SYSTEM_TABLE_NAME, +
                       SYSTEM_TABLE_SCHEMA,NUMBER_ROWS, +
                       NUMBER_DELETED_ROWS +
                FROM QSYS2.SYSTABLESTAT +
                ORDER BY NUMBER_DELETED_ROWS DESC +
                FETCH FIRST 250 ROWS ONLY) +
                WITH DATA') +
           COMMIT(*NONE) NAMING(*SQL)

03  ENDPGM

If you are not familiar with CREATE TABLE in SQL you ought to read the post Creating SQL table on the fly.

I have decided to use the SQL naming convention, NAMING(*SQL), just because I can and to show you that it is not much different from the standard IBM i convention. In the SELECT statement I have selected the columns mentioned in the table above, and sorted (order by) by the number of deleted rows in descending order, i.e. largest first. Then the first 250 rows will be written to by output file, MYLIB.OUTFILE. This will include all the tables/files in IBM libraries too, e.g. QSYS.

When run this took about five times as long as program in the previous post that used DSPOBJD.

The output looks like this:


 DATA_SIZE  SYSTEM_TABLE_NAME  SYSTEM_TABLE_SCHEMA  NUMBER_ROWS  NUMBER_DELETED_ROWS
2116046848  FILE1              LIB1                          75            5,351,845
1981816832  FILE2              LIB1                      11,781            4,988,364
1361072128  FILE3              LIB1                         890            3,504,467
 421531648  FILE4              LIB1                      19,130            3,438,994
 304144384  FILE5              LIB2                     855,966            3,423,437

Only the names have been changed to protect the innocent files and libraries.

There is a mutlitude of other information in this table that may prove useful in the future, such as:

  • LOGICAL_READS - Number of logical read operations since the last IPL.
  • PHYSICAL_READS - Number of physical read operations since the last IPL.
  • SEQUENTIAL_READS - Number of sequential read operations since the last IPL.
  • RANDOM_READS - Number of random read operations since the last IPL.

 

You can learn more about the SYSCL's SYSTABLESTAT table on the IBM website here.

 

This article was written for IBM i 7.2, and should work for earlier releases too.

16 comments:

  1. Couple of items to note:
    We had to remove the "SQL" stmt from the RUNSQL command.
    We had to update the "xyzlib.file" to "xyzlib/file".
    We had to remove / comment out the "commit" portion of the RUNSQL command.
    We could not locate file QSYS2.SYSTABLESTAT even though once we were ablr to compile the CLLE, the program is running and selecting records. Waiting on results but wanted to post ahead of the results.
    We're at V7R1. Wonder if some of this is a PTF issue since we'r ein a 24x7x365 hospital environment and getting any PTF window in a multi-partition setting is difficult at best.

    ReplyDelete
    Replies
    1. The file is not called SYSTABLESTAT. That is what the view is known as within SQL.

      I am surprised that you were not able to run this in the SQL convention. I tired this on an IBM i running 7.1 TR7 and it worked as I have described.

      Delete
    2. SYSTABLESTAT is the SQL name for the SQL View QSYS2/SYSTSTAT.

      If you look in QSYS2 its object attribute is "LF", which is how the IBM i stores Views.

      Delete
  2. This is a problem that can be controlled systematically. I wrote a CL program that did a DSPFD to an output file then I read the output file and checked the variable of number of deleted records if it came out to be a certain percentage (could be a variable) of the total size of the file the CL would automatically reorganize the file eliminating the deleted records. You may want to do some analysis on this method beforehand. Files with millions of records will take a very long time to reorg especially if there are many logical files attached. You also may want to change the the variable to reuse deleted records.

    ReplyDelete
    Replies
    1. I personally think this is the cleanest, easiest way to do that

      Delete
  3. I will be taking this approach.

    With files containing millions of deleted records I have found that it is quicker to CPYF the contents of the file to another. Clear the original. Then CPYF the data back into the original.

    And I will be changing the file to reuse deleted records too.

    ReplyDelete
    Replies
    1. Make sure and do the CPYF by RRN(*BEGIN)

      Delete
    2. Just make sure you do the CPYF by relative record #

      Delete
  4. We're on 7.1 and this worked like a charm, sort of. I'm surprised that it takes so long to run. It must have taken almost an hour. Why is that?

    ReplyDelete
    Replies
    1. My guess is...

      - There are a heck of alot of files/tables on any IBM i when this includes all the IBM (operating system) libraries.

      - As this is a View it is generated when it is used, unlike an Index which is more life a LF.

      Delete
    2. It was the same to me.
      But, when I submitted the job, it took just a few seconds.

      First time I tried with a tool using JDBC.

      Delete
  5. By the way, NUMBER_ROWS and NUMBER_DELETED_ROWS is a total for the file. Using this output as a driver for a RGZPFM will not work for a multi-member file if the deleted rows is not in the first member. Since we have a package that uses multi-members, it would be useful to see the above with member-level statistics.

    ReplyDelete
    Replies
    1. Alas, this does not give you member level statistics.

      But the NUMBER_PARTITIONS column does give the number of partitions (= members) that the file has.

      I would need to check and see if NUMBER_PARTITIONS is greater than 1 and do something else to determine the members in the file and the active and deleted records within them.

      Perhaps DSPFD FILE(lib/file) TYPE(*MBR) OUTPUT(*OUTFILE)...
      Then RTVMBRD using the records from the outfile

      Delete
  6. Hi Simon, great article, great blog!!

    Do you know if there is a catalog Table/View where I can browse different data that can be got from DSPFD command?

    I'd like to combine this SQL sentence, with another table/view, to check wether big files with lots of deleted records have REUSEDLT set or not.

    I'm trying to find something related with that, and I guess it exists!

    Thanks a lot!

    ReplyDelete
    Replies
    1. Yes, yes, I did it. But there is no info about that.

      I'm trying surfing on the different procedures/Views/tables provided by the system (I'm on 7.2 TR2).

      If I find something, I'll post it here.

      Delete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.