Wednesday, September 18, 2019

Using SQL to rank results

ranking results using sql rank()

When I wrote about using a SQL Sequence to place sequential values in a column I used the example of ranking students based upon on their test results. Looking back this may not have been the best example, as if two students had the same score they would have been given a different sequence number.

Fortunately SQL does have a function I can use to properly rank these students, based on their test scores.

Let me start with the students and their test scores:

LAST_NAME  FIRST_NAME  MID_INITIAL  SCORE
---------  ----------  -----------  -----
FUESTER    MAX                      25.69
ECKERT     TRISTAN          M       49.86
APUZZO     CAROL            C       14.89
TORRES     JOSE                     94.96
PENA       JUANITA          M       74.09
HONG       XIA                      18.54
REED       STEVE            J       65.83
OKEEFE     VINCENT                   6.02
KING       CATHERINA        K       18.76
WILEY      CHRISTIAN        J       60.77
MORALES    MARIA            A       65.83

Steve Reed and Maria Morales have the same test score, therefore, they need to have the same ranking. Let me show the SQL statement I would use to rank these eleven students by their score.

01  SELECT
02    RTRIM(LAST_NAME) || ',' || RTRIM(FIRST_NAME) || 
        ' ' || MID_INITIAL as "Student name",
03    SCORE AS "Test score",
04    RANK() OVER(ORDER BY SCORE DESC) AS "Rank"
05    FROM MYLIB.STUDENT

Line 2: I am concatenating the parts of the students' name into one column. Here I need to use the RTRIM to remove the trailing blanks from the last and first name columns. The double pipes ( || ) are used to concatenate these parts into one string.

Line 4: Here is the rank function. It has two parts, RANK() followed by the OVER. The OVER tells the rank how to sort and rank the columns in the results. In this case I want the results ranked (sorted) by the Score column in descending order. This will sort the results into this order, and return an integer value.

The results look like:

Student name       Test score   Rank
----------------   ----------   ----
TORRES,JOSE             94.96      1
PENA,JUANITA M          74.09      2
REED,STEVE J            65.83      3
MORALES,MARIA A         65.83      3
WILEY,CHRISTIAN J       60.77      5
ECKERT,TRISTAN M        49.86      6
FUESTER,MAX             25.69      7
KING,CATHERINA K        18.76      8
HONG,XIA                18.54      9
APUZZO,CAROL C          14.89     10
OKEEFE,VINCENT           6.02     11

We can see the Steve Reed and Maria Morales are both rank in third place. Christian Wiley is ranked as fifth as with two third rankings there is no fourth.

What if I want to have a fourth rank, no matter how many third ranks I have. To achieve this I would use the DENSE_RANK() function, as shown below:

01  SELECT
02    RTRIM(LAST_NAME) || ',' || RTRIM(FIRST_NAME) || 
        ' ' || MID_INITIAL as "Student name",
03    SCORE AS "Test score",
04    DENSE_RANK() OVER(ORDER BY SCORE DESC) AS "Rank"
05    FROM MYLIB.STUDENT

Line 4: The DENSE_RANK() replaces the RANK() that was used in the previous statement.

Steve Reed and Maria Morales are still ranked as third, and Christian Wiley is now ranked as number 4.

Student name       Test score   Rank
----------------   ----------   ----
TORRES,JOSE             94.96      1
PENA,JUANITA M          74.09      2
REED,STEVE J            65.83      3
MORALES,MARIA A         65.83      3
WILEY,CHRISTIAN J       60.77      4
ECKERT,TRISTAN M        49.86      5
FUESTER,MAX             25.69      6
KING,CATHERINA K        18.76      7
HONG,XIA                18.54      8
APUZZO,CAROL C          14.89      9
OKEEFE,VINCENT           6.02     10

When using RANK() and DENSE_RANK() I can have multiple values in the OVER, as this more complex examples shows.

There was a soccer tournament with four teams. They played each other twice, once at home and once away. At the end of the tournament the teams had the following records:

TEAM      WINS  DRAWS  LOSSES  GOALS_FOR  GOALS_AGAINST
--------  ----  -----  ------  ---------  -------------
ATHLETIC     1      0       5         4             12
CITY         3      2       1         6              4
ROVERS       1      1       4         5              8
UNITED       5      1       0        15              6

To determine the tournament winner I need to calculate the number of points each team won, and the goal difference, difference between the goals for and against. I can use the following statement to generate a table that lists all of teams ranked by the total points, and goal difference if two teams have the same number of points.

01  SELECT TEAM AS "Team",
02         (WINS + DRAWS + LOSSES) AS "Pl",
03         WINS AS "W",
04         DRAWS AS "D",
05         LOSSES AS "L",
06         GOALS_FOR AS "GF",
07         GOALS_AGAINST AS "GA",
08         (GOALS_FOR - GOALS_AGAINST) AS "GD",
09         ((WINS * 3) + DRAWS) AS "Pts",
10         RANK() OVER(ORDER BY ((WINS * 3) + DRAWS) DESC,
              (GOALS_FOR - GOALS_AGAINST) DESC) AS "Rank"
11    FROM MYLIB.LEAGUE

Line 1: First columns is the team's name.

Line 2: Next comes the number of games played, which is calculated and the sum number of games each team won, drew, and lost, Pl.

Line 3: Number of games the team won, W.

Line 4: Number of games the team drew (or tied), D.

Line 5: Number of game the team lost, L.

Line 6: Number of goals the team scored, GF.

Line 7: Number of goals the team conceded, GA.

Line 8: The goal difference is the difference between the number of goals the team scored and the number they conceded, GD.

Line 9: The three points are awarded for each win, and one point for a draw, Pts.

Line 10: The OVER part of this rank may look complicated, but it is not. The first thing to use to rank the results is the number of points, and the second is the goal difference.

The tournament winners are United.

Team      Pl  W  D  L  GF  GA  GD  Pts  Rank
--------  --  -  -  -  --  --  --  ---  ----
UNITED     6  5  1  0  15   6   9   16     1
CITY       6  3  2  1   6   4   2   11     2
ROVERS     6  1  1  4   5   8  -3    4     3
ATHLETIC   6  1  0  5   4  12  -8    3     4

Just to prove that the second part of the OVER works I inserted a bogus team, BOGUS, into the results with the same number of wins, draws, and losses as Athletic but Bogus have scored two more goals.

TEAM      WINS  DRAWS  LOSSES  GOALS_FOR  GOALS_AGAINST
--------  ----  -----  ------  ---------  -------------
BOGUS        1      0       5          6             12

Now when I run the same SQL statement as before I can see that Bogus is ranked above Athletic. They both have three points, and Bogus has the better goal difference.

Team      Pl  W  D  L  GF  GA  GD  Pts  Rank
--------  --  -  -  -  --  --  --  ---  ----
UNITED     6  5  1  0  15   6   9   16     1
CITY       6  3  2  1   6   4   2   11     2
ROVERS     6  1  1  4   5   8  -3    4     3
BOGUS      6  1  0  5   6  12  -6    3     4
ATHLETIC   6  1  0  5   4  12  -8    3     5

If you need rank results now you need to decide whether to use RANK() or DENSE_RANK().

 

You can learn more about the RANK() and DENSE_RANK() command from the IBM website here.

 

This article was written for IBM i 7.3, and should work for releases 7.1 and greater.

1 comment:

  1. Used this recently as the reference for both rank and denseRank. Clear, meaningful article. Thanks again.

    ReplyDelete

To prevent "comment spam" all comments are moderated.
Learn about this website's comments policy here.

Some people have reported that they cannot post a comment using certain computers and browsers. If this is you feel free to use the Contact Form to send me the comment and I will post it for you, please include the title of the post so I know which one to post the comment to.