Wednesday, October 5, 2016

Looking back into the past of your data with Temporal Tables

system temporal tables

It was during a COMMON presentation discussing the new features of IBM i 7.3 that Steve Will, chief architect of this operating system, mentioned the reason for this release was to accommodate the new Temporal tables. These new tables promise ability to see data within SQL DDL tables as it was in the past. In fact multiple users could use the same table at the same time and see the data as it was at different times, all while the table still be updated with new data.

Have been involved in projects to recreate a "snap shot" of data for a file as it was at a specific time, I can say that it is a lengthy process to roll back changes using data from journals or trigger output files. After those projects I welcome the usefulness of Temporal tables.

Let me start by giving a very high level description of how Temporal tables work. Any changes made to the "master" table are written to a "history" table. When I perform a select and ask for a specific time the operating system is smart enough to work out which rows were active at the time, and return that data set to me. How it does it will become more apparent as I describe how the tables are coded.

What I have found in experimenting with Temporal tables is that both the "master" and the "history" files needs to be journaled. The documentation from IBM fails to mention this. On consulting with someone far wiser than I in the ways of SQL he suggested that I place my Temporal tables in their own Schema. When I create a schema a journal and journal receiver are created within. When I go and create my Temporal table they will be automatically added to journal and receiver.

For those not familiar with SQL-speak a schema is basically the same as a library. When you look for a schema on the IBM i it will be shown as a library. In my experience every command you can use with a library, you can use with a schema too.

So let me create my schema and look what I find in it:

  CREATE SCHEMA MYSCHEMA

  WRKOBJPDM MYSCHEMA

What I see in my new schema is:

                         Work with Objects Using PDM
Library . . . . .   MYSCHEMA       Position to . . . . . . . . .
                                   Position to type  . . . . . .

Opt  Object      Type        Attribute   Text
       QSQJRN0001  *JRNRCV                 COLLECTION - created by
       QSQJRN      *JRN                    COLLECTION - created by
       SYSCHKCST   *FILE       LF          SQL catalog view
       SYSCOLUMNS  *FILE       LF          SQL catalog view
       SYSCST      *FILE       LF          SQL catalog view
       SYSCSTCOL   *FILE       LF          SQL catalog view
       SYSCSTDEP   *FILE       LF          SQL catalog view

The first two objects are the journal and journal receiver that was created for me. The rest of the objects are a collection of the same system views that are available in the library QSYS2. Personally whenever I find them in my schema I delete them.

  DLTF MYSCHEMA/SYS*

Now I have my schema I can create my Temporal table. In this example I am going to create table called PERSON, which will contains people's names. The definition of this table and its matching "history" table are defined in one source member. To make it easier for me to explain what is going on I am going to break the code up into parts. I am going to start with the basic definition of the "master" table.

01  CREATE TABLE MYSCHEMA.PERSON (
02    PERSON_ID FOR UNIQUE NUMERIC (10,0)
03            GENERATED ALWAYS AS IDENTITY
04            (START WITH 1, INCREMENT BY 1, NOCYCLE),
05    FIRST_NAME VARCHAR(25),
06    MID_INITIAL CHAR(1),
07    LAST_NAME VARCHAR(30),
08    BEGIN_TS TIMESTAMP(12) NOT NULL
09                 GENERATED ALWAYS AS ROW BEGIN,
10    END_TS TIMESTAMP(12) NOT NULL
11               GENERATED ALWAYS AS ROW END,
12    TS_ID TIMESTAMP(12) NOT NULL
13              GENERATED ALWAYS AS TRANSACTION START ID,
14    PERIOD SYSTEM_TIME (BEGIN_TS,END_TS)
15  ) ;

Line 1: Every table is created using the CREATE TABLE statement. I am using SQL syntax, therefore, the schema (library) name is separated from the table (file) name by a dot.

Lines 2 – 4: The first column (field) in the table is PERSON_ID. I am making this an identity column. The value of the identity is automatically generated as a unique number, thereby, guaranteeing that it can be used as a unique key for the table.

Line 5: This column is for the first name. I decided to make this VARCHAR rather than CHAR as not everyone's first name needs 25 characters.

Line 6: Middle initial can be CHAR as it can only be one character maximum.

Line 7: Last name I have made a VARCHAR of a maximum of 30 characters.

The following columns have to be present for the Temporal table to work. The names I have used for these columns are not important, it is the keywords that follow that are.

Line 8 and 9: This field is used to indicate when the record became "active". It must be code as a TIMESTAMP(12). The GENERATED ALWAYS AS ROW BEGIN must be present and explains exactly what this column is used for.

Line 10 and 11: If you have a column to say when the record became "active" you need another to say when it became "inactive". As with the previous field it must be TIMESTAMP(12) and must have GENERATED ALWAYS AS ROW END to indicate its purpose.

Line 12 and 13: If several rows were inserted, changed, or deleted by the same statement all of rows that were will have the same value in this column. This allows me to identify all the records that were changed by that one statement. If a row is the only one to be inserted, changed, or deleted then its value will be the same as the row begin column.

Line 14: Shows the time period the row was "active".

The next part of my code is me giving my columns decent column headings. For more information about this subject see the post Changing column headings in output file.

16  LABEL ON COLUMN MYSCHEMA.PERSON (
17    PERSON_ID     IS 'Person              id',
18    FIRST_NAME    IS 'First               name',
19    MID_INITIAL   IS 'Middle              initial',
20    LAST_NAME     IS 'Last                name',
21    BEGIN_TS      IS 'Row begin           timestamp',
22    END_TS        IS 'Row end             timestamp',
23    TS_ID         IS 'Timestamp           id'
24  ) ;

In the next part I create the "history" table and establish the relationship between the "master" and the "history".

25  CREATE TABLE MYSCHEMA.PERSON_H LIKE MYSCHEMA.PERSON ;

26  ALTER TABLE MYSCHEMA.PERSON ADD VERSIONING USE
27              HISTORY TABLE MYSCHEMA.PERSON_H ;

Line 25: This is a nice thing you can do in SQL to create a copy of a table. In this case the table PERSON is copied to create the table PERSON_H.

Line 26 and 27: This defines the "versioning" relationship between the two tables. Now whenever a row is updated or deleted the original will be versioned into the "history" table.

The last two lines just give the two tables system names that will be displayed when I look at them on the IBM i.

28  LABEL ON TABLE MYSCHEMA.PERSON IS 'PERSON temporal table' ;
29  LABEL ON TABLE MYSCHEMA.PERSON_H 
                IS 'PERSON_H temporal table' ;

So now I need to add, change, and delete data in the "master" file. I am going to use a SQL RPG program to do this, I could use a CL program and the RUNSQL command, or just plain STRSQL.

01  ctl-opt dftactgrp(*no) ;

02  dcl-pr sleep extproc('sleep') ;
03    *n uns(10) value ;
04  end-pr ;

05  exec sql SET OPTION COMMIT = *NONE ;

06  exec sql INSERT INTO PERSON
                (FIRST_NAME,MID_INITIAL,LAST_NAME)
                VALUES('ANNA','','CRUZ') ;
07  sleep(5) ;

08  exec sql INSERT INTO PERSON
                VALUES(DEFAULT,'JHON','D','SMITH',
                       DEFAULT,DEFAULT,DEFAULT) ;
09  sleep(5) ;

10  exec sql INSERT INTO PERSON
                (FIRST_NAME,MID_INITIAL,LAST_NAME)
                VALUES('DARNESHA','A','GRAY') ;
11  sleep(5) ;

12  exec sql INSERT INTO PERSON
                (FIRST_NAME,MID_INITIAL,LAST_NAME)
                VALUES('RANJIT','','GUPTA') ;
13  sleep(5) ;

14  exec sql UPDATE PERSON
            SET LAST_NAME = 'HERNANDEZ'
            WHERE PERSON_ID = 1 ;
15  sleep(5) ;

16  exec sql UPDATE PERSON
            SET FIRST_NAME = 'JOHN'
          WHERE PERSON_ID = 2 ;
17  sleep(5) ;

18  exec sql DELETE FROM PERSON
          WHERE PERSON_ID = 4 ;

Line 1: I need the DFTACTGRP(*NO) control option as I am going to use a external procedure.

Lines 2 – 4: And this is the external procedure. sleep() does the same as the DLYJOB command, pauses the program for the number of seconds it is passed. Most examples of sleep() include a returned parameter, frankly I cannot be bothered with it as I don't need it. There is only one parameter: the number of seconds, which needs to be passed as an unsigned integer.

Line 5: As I do not want to use commitment control with my SQL statements I am turning it "off" within my program. For more information about what other options you can use this statement for see Putting the SQL options into the source.

Line 6: First insert statement. In this statement I am only using the columns I need to update. The PERSON_ID identity, and temporal columns, BEGIN_TS, END_TS, and SYSTEM_TIME are generated for me.

Line 7: I am going to sleep for 5 seconds.

Line 8: This another way I could code the insert statement. In this scenario I do not give a list of columns I want to update, it is assumed I want to update all of them. The values I wanting to insert into the identity and temporal columns are to be the default values for these columns, therefore, I use the value DEFAULT. More on using DEFAULT can be found in Easy way to initialize columns when using SQL insert. Notice how I have misspelled this person's first name.

Line 9: Again I sleep.

Lines 10 – 13: The above steps are repeated for two more individuals.

Line 14: Anna Cruz has just informed me that she wants to use her married last name of Hernandez. Therefore, I need to update the last name of the person with the PERSON_ID of 1.

Line 16: Oops I noticed how I misspelled John's first name so I need to correct that.

Line 18: Ranjit has left, therefore, I need to delete his details from the table.

When this program is finished the data in PERSON looks like:

         Person   First                      Middle   Last
         id       name                       initial  name
1             1   ANNA                                HERNANDEZ
2             2   JOHN                          D     SMITH
3             3   DARNESHA                     A     GRAY

  Row begin                         Row end
  timestamp                         timestamp
1 2016-09-30-03.37.21.964027000244  9999-12-30-00.00.00.000000000000
2 2016-09-30-03.37.27.016426000244  9999-12-30-00.00.00.000000000000
3 2016-09-30-03.37.11.888739000244  9999-12-30-00.00.00.000000000000


  Timestamp
  id
1 2016-09-30-03.37.21.964027000244
2 2016-09-30-03.37.27.016426000244
3 2016-09-30-03.37.11.888739000244

I find it interesting that the end timestamp is only December 30, 9999, and not December 31, 9999.

The "history" file, PERSON_H, contains:

         Person   First                      Middle   Last
         id       name                       initial  name
1             1   ANNA                                CRUZ
2             2   JHON                          D     SMITH
3             4   RANJIT                              GUPTA

   Row begin                         Row end
   timestamp                         timestamp
1 2016-09-30-03.37.01.815270000244  2016-09-30-03.37.21.964027000244
2 2016-09-30-03.37.06.852713000244  2016-09-30-03.37.27.016426000244
3 2016-09-30-03.37.16.924815000244  2016-09-30-03.37.32.056313000244

  Timestamp
  id
1 2016-09-30-03.37.01.815270000244
2 2016-09-30-03.37.06.852713000244
3 2016-09-30-03.37.16.924815000244

Here I can see the time the rows were changed or deleted by the values in the row end column.

Now let me start my time travelling. I want to see what was "active" at 3:37:10 AM:

SELECT LAST_NAME,FIRST_NAME FROM PERSON
   FOR SYSTEM_TIME AS OF '2016-09-30-03.37.10.000000000000'


Last                            First
name                            name
CRUZ                            ANNA
SMITH                           JHON
********  End of data  ********

At that time only Anna's and John's rows were in the table.

Seven seconds later:

SELECT LAST_NAME,FIRST_NAME FROM PERSON
   FOR SYSTEM_TIME AS OF '2016-09-30-03.37.17.000000000000'


Last                            First
name                            name
GRAY                            DARNESHA
CRUZ                            ANNA
SMITH                           JHON
GUPTA                           RANJIT
********  End of data  ********

And as I slowly move through time...

SELECT LAST_NAME,FIRST_NAME FROM PERSON
   FOR SYSTEM_TIME AS OF '2016-09-30-03.37.22.000000000000'


Last                            First
name                            name
HERNANDEZ                       ANNA
GRAY                            DARNESHA
SMITH                           JHON
GUPTA                           RANJIT
********  End of data  ********

I can watch the data change...

SELECT LAST_NAME,FIRST_NAME FROM PERSON
   FOR SYSTEM_TIME AS OF '2016-09-30-03.37.28.000000000000'

Last                            First
name                            name
HERNANDEZ                       ANNA
SMITH                           JOHN
GRAY                            DARNESHA
GUPTA                           RANJIT
********  End of data  ********

Until it catches up to the present time:

SELECT LAST_NAME,FIRST_NAME FROM PERSON
   FOR SYSTEM_TIME AS OF '2016-09-30-03.37.33.000000000000'

Last                            First
name                            name
HERNANDEZ                       ANNA
SMITH                           JOHN
GRAY                            DARNESHA
********  End of data  ********

Another thing I can do is to give a range of dates and want to see what data changes happened during that time.

SELECT LAST_NAME,FIRST_NAME FROM PERSON
   FOR SYSTEM_TIME BETWEEN '2016-09-30-03.37.20.000000000000'
                   AND'2016-09-30-03.37.27.00000000000'
 ORDER BY FIRST_NAME


Last                            First
name                            name
HERNANDEZ                       ANNA
CRUZ                            ANNA
GRAY                            DARNESHA
SMITH                           JHON
GUPTA                           RANJIT
********  End of data  ********

If I use the between times be careful how I use the data. Anna's row was inserted and then changed within the time range, therefore, her details appear twice.

I can see many uses for Temporal tables. Now after the daily, weekly, monthly or yearly updates have happened the users can return to using the applications, and by using Temporal tables I can run my reports without worrying that the users are changing the data I need for my reports. Or I can now run a backlog report for last week, without having to restore a copy of the backlog file. More uptime for the systems equals happier management.

 

You can learn more about this from the IBM website:

 

This article was written for IBM i 7.3.

7 comments:

  1. as far as i can remember the recommendation from Scott Forstie is to put the tables & history tables into the same schema. its easier to handle this way (save, restore, authority, ...)

    ReplyDelete
    Replies
    1. Have to agree with tables & history in same schema, makes life a lot easier

      Delete
  2. and another important thing: you cant use foreign keys with set default/cascade/set null and temporal tables.

    only "restrict" works...

    ReplyDelete
  3. This is one of my arguments for convincing managers to 7.3 migration

    ReplyDelete
  4. The views created by system i in the collection's library are't created for fun but for improved the performence of SQL engine. Deleting these files are negative impact on system performance

    ReplyDelete
  5. I don't think the views in the schema library provide much in the way of performance. Their purpose is to provide the same views as in QSYS2, but with WHERE clauses that limit the results to only objects in that Schema. Database engines like Oracle and DB2 for Z have schemas, but each is isolated from others with their own catalogs. On IBM i, you can treat the entire system as one big schema (with catalogs in QSYS2) or you can have the limited schema view by using catalogs in the Schema library. So, perhaps if you query for columns named MyColumn, you'll only get those in the schema versus those across the entire system, so to that extent it could affect performance. But not for performance of normal insert/update/delete activity on the tables.

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.