Wednesday, November 20, 2019

Using SQL to list directories and files in IFS

use sql table function to list ifs files and directories

Thanks to the folks at RZKH who downloaded and applied the PTFs for IBM i 7.4 TR1 on the first day they became available, what great service!, I have been able to explore some of the new features added to the operating system we know and love. Amongst the additions to Db2 for i, or SQL, is a number of table functions to list information about the directories and files in the Integrated File System, IFS. My original thought was to write one post about them all, but the more I played with the IFS_OBJECT_STATISTICS table function I came to the conclusion that it deserved its own post.

The IFS_OBJECT_STATISTICS table function returns similar information to the RTVDSKINF command. All I want is just some basic information about the directories and files I choose, but there are many more columns returned by the table function. I encourage you to check the link at the bottom of this post which will take you to the appropriate page in IBM's documentation.

In my folder, MyFolder, I have created the following files and subdirectories:

  • Directory: MyFolder
    • File: list.txt
    • File: report.csv
    • File: test_file.txt
    • File: xmlfile.xml
    • Directory: subfolder1
      • File: 1.txt
      • Directory: subfolder1a
        • File: 1a.txt
    • Directory: subfolder2
      • File: 2.txt

There are four parameters that can be passed to the IFS_OBJECT_STATISTICS table function to select, or omit, the information that is returned in the results:

Parameter Description
START_PATH_NAME The path name for starting the search. For example: /directory/subdirectory
SUBTREE_DIRECTORIES In the subdirectories of the path name given.
NO Only objects in the start path are listed.
YES Include all the directories found in the start path, and its subdirectories.
OBJECT_TYPE_LIST Object types to be included. Values must be separated by a space.
*ALLDIR Just the directories are returned.
*ALLSTMF All the steam file object types: *MBR, *DOC, *STMF, *DSTMF, *USRSPC. *MBR All database file members.
*NOQSYS Exclude all objects in QSYS.LIB.
OMIT_LIST Directories or subdirectories to exclude from the result. If there is more than one entry they must be separated by a space.

I have found that the parameter names are optional. I will show examples both with and without them. Regardless of whether I use the parameter names they must be separated by a comma ( , ).

Let me get a list of the files and subdirectories in my folder, MyFolder:

SELECT PATH_NAME,OBJECT_TYPE,CREATE_TIMESTAMP,LAST_USED_TIMESTAMP,
       DATA_SIZE
FROM TABLE(QSYS2.IFS_OBJECT_STATISTICS('/MyFolder','YES'))

In this example I want to have the following columns of information returned to me:

  • PATH_NAME: Path name of the directory or file
  • OBJECT_TYPE: Type of object
  • CREATE_TIMESTAMP: The timestamp of when the file or directory was created
  • LAST_USED_TIMESTAMP: The timestamp of when the file was last used. Null is returned for directories
  • DATA_SIZE: Size in bytes of the object

Notice that I did not use the parameter names, just the values I want for the first two parameters.

The returned results look like:

                                         OBJECT
PATH_NAME                                _TYPE   CREATE_TIMESTAMP
---------------------------------------  ------  ------------------
/MyFolder                                *DIR    2019-11-02 20:15:08
/MyFolder/report.csv                     *STMF   2019-11-02 20:52:30
/MyFolder/list.txt                       *STMF   2019-11-02 21:05:58
/MyFolder/xmlfile.xml                    *STMF   2019-11-03 17:21:46
/MyFolder/test_file.txt                  *STMF   2019-11-16 21:28:41
/MyFolder/subfolder1                     *DIR    2019-11-17 16:35:05
/MyFolder/subfolder2                     *DIR    2019-11-17 17:35:37
/MyFolder/subfolder1/subfolder1a         *DIR    2019-11-17 16:35:53
/MyFolder/subfolder1/1.txt               *STMF   2019-11-17 16:36:24
/MyFolder/subfolder2/2.txt               *STMF   2019-11-17 17:38:56
/MyFolder/subfolder1/subfolder1a/1a.txt * STMF   2019-11-17 16:37:01


LAST_USED_TIMESTAMP  DATA_SIZE
-------------------  ---------
-                         8192
2019-11-17 21:08:23         54
2019-11-17 21:08:23         95
2019-11-17 21:08:23         42
2019-11-16 06:57:44          0
-                         8192
-                         8192
-                         8192
2019-11-17 21:08:23         19
2019-11-17 21:08:23         19
2019-11-17 21:08:23         19

What is the difference if I change the subtree parameter to "NO"?

SELECT PATH_NAME,OBJECT_TYPE FROM
TABLE(QSYS2.IFS_OBJECT_STATISTICS('/MyFolder','NO'))

The returned results are only for those objects in MyFolder, and not in its subdirectories:

PATH_NAME                OBJECT_TYPE
-----------------------  -----------
/MyFolder                *DIR
/MyFolder/report.csv     *STMF
/MyFolder/list.txt       *STMF
/MyFolder/xmlfile.xml    *STMF
/MyFolder/test_file.txt  *STMF
/MyFolder/subfolder1     *DIR
/MyFolder/subfolder2     *DIR

If I wanted to get a list of just the subdirectories in MyFolder I would use the following:

SELECT PATH_NAME,OBJECT_TYPE FROM
TABLE(QSYS2.IFS_OBJECT_STATISTICS(START_PATH_NAME => '/MyFolder',
                                 SUBTREE_DIRECTORIES => 'YES',
                                 OBJECT_TYPE_LIST => '*ALLDIR'))

This time I have the parameter names. The OBJECT_TYPE_LIST "asks" for only directories to be returned in the subtree of directories:

PATH_NAME                         OBJECT_TYPE
--------------------------------  -----------
/MyFolder                         *DIR
/MyFolder/subfolder1              *DIR
/MyFolder/subfolder2              *DIR
/MyFolder/subfolder1/subfolder1a  *DIR

In this example I want to list all of the files in the MyFolder and its subfolders, except those in subfolder1. By omitting subfolder1 I also omit any subfolders it has, subfolder1a.

SELECT PATH_NAME,OBJECT_TYPE FROM
TABLE(QSYS2.IFS_OBJECT_STATISTICS(START_PATH_NAME => '/MyFolder',
                            SUBTREE_DIRECTORIES =>'YES',
                            OMIT_LIST => '/MyFolder/subfolder1'))

The results are just from the root of MyFolder and the subfolder2 subdirectory:

PATH_NAME                   OBJECT_TYPE
--------------------------  -----------
/MyFolder                   *DIR
/MyFolder/report.csv        *STMF
/MyFolder/list.txt          *STMF
/MyFolder/xmlfile.xml       *STMF
/MyFolder/test_file.txt     *STMF
/MyFolder/subfolder2        *DIR
/MyFolder/subfolder2/2.txt  *STMF

One of the major reasons I want to look at objects in the IFS is to identify any extremely large ones that could be deleted. In this example I am looking for any files in MyFolder, and its subdirectories, that are greater than 50 bytes, well this is only test data and I do not have really big files in the IFS of this IBM i partition. The WHERE stipulates that the size of the object be more than 50 bytes, and the results are returned in descending order, i.e. the largest file first.

SELECT PATH_NAME,DATA_SIZE FROM
TABLE(QSYS2.IFS_OBJECT_STATISTICS('/MyFolder','YES','*ALLSTMF'))
WHERE DATA_SIZE > 50
ORDER BY DATA_SIZE DESC

I just have two files that fit this criteria:

PATH_NAME              DATA_SIZE
---------------------  ---------
/MyFolder/list.txt            95
/MyFolder/report.csv          54

As I hinted when I was explaining the parameters for this function it is also possible to use this to access the "IBM i directory environment". In this environment all of the libraries are subdirectories of QSYS.LIB. Here I want a list of all the files in my library, MYLIB, with their CCSID and sorted by size in descending order:

SELECT PATH_NAME,OBJECT_TYPE AS "Type",DATA_SIZE AS "Size",CCSID
FROM TABLE(QSYS2.IFS_OBJECT_STATISTICS(START_PATH_NAME => 
                                           '/QSYS.LIB/MYLIB.LIB', 
OBJECT_TYPE_LIST => '*FILE'))
ORDER BY DATA_SIZE DESC

The START_PATH_NAME has to be in the IFS naming convention, therefore, my "directory" MYLIB.LIB is a subfolder of QSYS.LIB. The OBJECT_TYPE_LIST uses the usual IBM i object type for files. My results are as follows:

PATH_NAME                         Type  Size   CCSID
--------------------------------- ----- -----  -----
/QSYS.LIB/MYLIB.LIB/DEVSRC.FILE   *FILE 24576   273
/QSYS.LIB/MYLIB.LIB/EG055P0.FILE  *FILE 16384    37
/QSYS.LIB/MYLIB.LIB/EG055P1.FILE  *FILE 16384    37
/QSYS.LIB/MYLIB.LIB/FTPSRC.FILE   *FILE 16384    37
/QSYS.LIB/MYLIB.LIB/OLDSRC.FILE   *FILE 16384    37
/QSYS.LIB/MYLIB.LIB/OTHERSRC.FILE *FILE 16384    37
/QSYS.LIB/MYLIB.LIB/PERSON.FILE   *FILE 16384    37

As source files and data files are both types of files they are included in the results. Display and printer files would be included too if I had any in MYLIB.

The example below shows how I can select more than one object type, in this case programs and modules, and use a wild card WHERE to return any program or modules that start with "TEST".

SELECT PATH_NAME,OBJECT_TYPE FROM
TABLE(QSYS2.IFS_OBJECT_STATISTICS(START_PATH_NAME => 
                                     '/QSYS.LIB/MYLIB.LIB', 
OBJECT_TYPE_LIST => '*PGM *MODULE'))
WHERE PATH_NAME LIKE '/QSYS.LIB/MYLIB.LIB/TEST%'

I only have two objects, a program and a module that fit this criteria:

PATH_NAME                          OBJECT_TYPE
---------------------------------- -----------
/QSYS.LIB/MYLIB.LIB/TESTPGM.PGM    *PGM
/QSYS.LIB/MYLIB.LIB/TESTMOD.MODULE *MODULE

In this last example I want to show how the path can work with files in the "IBM i environment". As I have shown in a previous example I use the source file DEVSRC in the library MYLIB. I can use IFS_OBJECT_STATISTICS to list all the members in that source file with this statement:

SELECT 
SUBSTR(PATH_NAME,
       LOCATE_IN_STRING(PATH_NAME,'/',1,4)+1,
       (LOCATE_IN_STRING(PATH_NAME,'.MBR',1)) - 
            (LOCATE_IN_STRING(PATH_NAME,'/',1,4)+1) )
       AS "Member",TEXT_DESCRIPTION
FROM TABLE(QSYS2.IFS_OBJECT_STATISTICS(START_PATH_NAME => 
                           '/QSYS.LIB/RPGPGM1.LIB/DEVSRC.FILE', 
                     OBJECT_TYPE_LIST => '*MBR'))

Notice that the path goes down to the level of the source file. It may seem redundant to have the OBJECT_TYPE_LIST parameter asking for only members as a file can only contain members. If I omit this I get a row returned in the results for the source file itself, which I do not want.

I only want the name of the members, not the names followed by ".MBR". To remove that from the string returned in the results I need to use a substring, but the starting position of where the member names starts will likely vary depending upon the length of the library's and source file's names. This is why I have used the LOCATE_IN_STRING function. The first LOCATE_IN_STRING determines where the fourth slash ( / ) is in the string, this is the separator between the source file name, DEVSRC.FILE, and the member name, I also need to add one to this value as I want to start the first character after the slash. The second and third are used to determine the length of the string I want to return. The second will return the value where the ".MBR" part of the string starts, and the third is where the member name starts, by subtracting these two values I have the length of the string of the member name. This means my results look like:

Member      TEXT_DESCRIPTION
----------  --------------------
AVAILSPACE  View available space
CCSID_CVT   -
DOWNLOAD    -
EG00RR1     Get user space creat
EG001       Create & write to us

As I mentioned above I can see this being really useful for finding those large files hiding deep in subdirectories in the IFS. Using this table function is also an easier way to get a list of files in a particular directory, much easier than using the RTVDSKINF command. At this point I don't think I would use it for retrieving information from the "IBM i environment" as I think there are better tables functions and views to use for that information.

 

You can learn more about the IFS_OBJECT_STATISTICS table function from the IBM website here.

 

This article was written for IBM i 7.4 TR1, and should work for IBM i 7.3 TR7 too.

35 comments:

  1. the solution I have used for years....

    https://www.itjungle.com/2014/08/27/fhg082714-story02/

    ReplyDelete
    Replies
    1. I think we all came up with our own solutions, I did too.

      But now with what I have described above we do not have to use what we used before. We can go directly to the data, rather than use an API, command, etc.

      Just K.I.S.S.

      Delete
  2. Great article Simon - I resolved this using an API - your solution is much simpler.

    ReplyDelete
  3. I’m currently using a qsh command to output listings to a qtemp file to read.
    Works great. Very clean.
    But I’ll take an sql approach any day.

    ReplyDelete
  4. Hi Simon,
    i just tested this on some other LPARs and received *POINTER as path_name. Do you use special job settings?

    ReplyDelete
    Replies
    1. I have only seen *POINTER when I use STRSQL.

      This is another reason to move to ACS and use its "Run SQL Scripts".

      Delete
    2. You can cast the path_name to VARCHAR or CHAR to show the text instead of *POINTER.

      Delete
  5. forget my last comment - i used green screen STRSQL which does not display the paths, it works only on iACS execute SQL function. Maybe nice to note :)

    ReplyDelete
  6. I found the solution in sql's lateral keyword:

    WITH libifs (ifsfullname, ifsoid) AS(
    SELECT
    trim(fullname),
    oid
    FROM
    LIBBES l2
    )
    SELECT
    l.*,
    z.*
    FROM
    libifs i,
    libbes l,
    LATERAL (
    SELECT
    *
    FROM
    TABLE(
    QSYS2.IFS_OBJECT_STATISTICS(
    START_PATH_NAME => ifsfullname,
    SUBTREE_DIRECTORIES => 'NO'
    )
    )
    ) AS z
    WHERE
    i.ifsoid = l.oid

    ReplyDelete
  7. Tried the queries in STRSQL.
    It is throwing a message saying the below:
    Character conversion between CCSID 65535 and CCSID 1200 not valid.

    ReplyDelete
    Replies
    1. In 2021 you should be using ACS's "Run SQL Scripts" instead of STRSQL.
      Without seeing the statement you are trying to use it is hard to make a good suggestion.
      You could try using CAST for that column to your IBM i's partition CCSID.

      Delete
    2. Yes we should be using ACS-RSS. I put this in an SQLRPGLE program and get the same CCSID message.
      I want to put this program/SQL on the job scheduler to create a weekly report. That is why I am putting it in the program.
      So how do I get rid of the CCSID message? Or How do I run it via ACS-RSS from the scheduler?

      I tried several CAST but can not get the syntax correct. I will need to do it on all of the columns also.

      Delete
    3. Look in the IBM documentation (linked to in this post) to identify the three columns that are CCSID 1200.

      This is how I change the CCSID for a column:

      SELECT CAST(PATH_NAME AS CHAR(100) CCSID 37)
      FROM TABLE(QSYS2.IFS_OBJECT_STATISTICS(START_PATH_NAME => '/home',
      SUBTREE_DIRECTORIES =>'YES'))
      LIMIT 10 ;

      Delete
    4. Thanks for the help.
      I am not sure how I missed the CCSID on the IBM documentation. But I did.
      FYI
      I had to cast the ccsid of the Start_path_name and others in the From part.

      FROM TABLE(QSYS2.IFS_OBJECT_STATISTICS( +
      START_PATH_NAME => trim( +
      CAST(''/Home'' AS CHAR(500) CCSID 37)), +
      SUBTREE_DIRECTORIES => CAST(''YES'' as CHAR(3) ccsid 37), +
      OBJECT_TYPE_LIST => cast( +
      ''*ALLDIR *ALLSTMF *NOQDLS *NOQOPT'' as char(35) +
      ccsid 37)))

      Delete
    5. Thank you for the CAST post. You just saved my night

      Delete
  8. I like the iACS solution. Not used it very much. Can you describe how to get recursive folder lists? I want a complete list of all ojbects in the parent folder and all objects in the child folders.

    ReplyDelete
  9. Nevermind about the subfolders list....i missed that section of the article. Thank you for this article. Very helpful.

    ReplyDelete
  10. Is there anyway to use a variable as one of the parameters instead of the actual value? Here is a snippit of my code and im getting an error with the wkpath variable...

    Exec SQL
    INSERT INTO session.PICFILES
    SELECT cast(path_name as VarChar(100)) from table (
    qsys2.ifs_object_statistics(
    start_path_name => wkpath,
    subtree_directories => 'NO'));

    ReplyDelete
    Replies
    1. If you are using variables from a program in a SQL statement then it needs to start with the colon ( : ).

      start_path_name => :wkpath

      Delete
  11. Can you use this to see the IFS on a different LPar.

    ReplyDelete
    Replies
    1. Table functions cannot be used by a remote SQL statement.
      To get around this I build a View over the table function. Then the SQL requests from the remote partitions use the View.

      Delete
  12. Hi Simon, again, great article!

    I'm looking to take this IFS_OBJECT_STATISTICS result set and spin through them in a cursor in my RPG program to delete them one by one. Do you know of a way to use SQL to delete files on the IFS?

    Select path_name
    From table(qsys2.IFS_OBJECT_STATISTICS(
    start_path_name => :pWrkPath))
    Where object_type = '*STMF' and
    create_timestamp < (current_date - 14 Day)

    In my use I just need to tidy up old files in a specific folder.

    ReplyDelete
    Replies
    1. With the current releases and TRs there is not a SQL table function to delete files from the IFS.

      Delete
  13. I've been using Qshell for ages but now I'm using this, much cleaner.

    ReplyDelete
  14. Simon,

    this is really cool stuff ! I struckle with a query. Is there a simple was to query the number of objects (Files) for each directory ? So to get a list of the number of objects in every path or subpath from the start_path_name ?
    Thank you !

    ReplyDelete
    Replies
    1. I would try what I described in this post, and in the example just change line 13 to be greater than zero.

      Delete
    2. Hi Simon, thanks a lot for pointing me to this one ! It's not 100% what I was looking for, but it helped indeed to get the Info I was looking for, although the last example only return the Path_Names, but the Numbers..anyway..it helped ! THX a lot !!

      Delete
  15. And on a lighter note, when the SQL did not work, I tried another library. The first one was defined as TEST, but the other was defined as PROD. The TEST library, AAAABBBB.LIB, would not return anything. But the PROD library, AAAABBBB02.LIB, worked as expected.

    SELECT PATH_NAME,OBJECT_TYPE AS "Type",DATA_SIZE AS "Size", CCSID
    FROM TABLE(QSYS2.IFS_OBJECT_STATISTICS(START_PATH_NAME => '/QSYS.LIB/AAAABBBB02.LIB'))
    ORDER BY DATA_SIZE DESC

    Object link Type Attribute
    AAAABBBB.LIB LIB TEST
    AAAABBBB02.LIB LIB PROD

    ReplyDelete
    Replies
    1. There are better ways of getting the size of library rather than use this way.
      I would check the job log and see if there are any messages there that might explain what happened. My first thought is authorization.

      Delete
  16. How can I code this in a program to get the output in a file that doesn't read *POINTER for every file?

    ReplyDelete
    Replies
    1. I have only seen *POINTER displayed when STRSQL is used,

      Delete
  17. I am using this to email a list of documents. I am trying to filter out 'Thumbs.db'. I have tried using OMIT_LIST => trim(:OmitThumb))) and a where statement (and cast(PATH_NAME as char ccsid 37) <> 'Thumbs.db';) to no avail. Any ideas

    ReplyDelete
    Replies
    1. As none of the path names will equal just 'Thumbs.db' your statement will fail, as you have discovered.

      You will have to use a wildcard, something like:
      WHERE UPPER(PATH_NAME) NOT LIKE '%THUMBS.DB%'

      The UPPER() will make sure that your statement excludes:
      - Thumbs.db
      - thumbs.db
      - Thumbs.Db
      - Etc.

      Delete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.