Tuesday, 1 November 2011

How to write a ranking query in MS Access

One of my little jobs here at work is to produce a report for two year groups of students that ranks their results on three different subjects that they take. The easy way to do this with MS Excel is to use the RANK function operating on a whole set of results and it works out the ranks and it can put the result alongside each student’s name. Going from Excel to Access requires us to write our own ranking function as there is not one built in. This type of function is called an Aggregate (which operates on a whole set of results i.e. all rows of a query, instead of one row at a time as you would normally work on for a custom column) and the query that it is used in is called an Aggregate query. In other words you need to Aggregate together all the data in the column you want to work on (all of the students’ results) and work out a ranking for each row based on that aggregation. Aggregation queries are typically using functions like Sum, Count and so on. Also called a Totals query.

Another way is called the Domain functions in Access. You can write your own custom Domain function. Basically in such a function you will use DAO code to run a subquery on the database. In other words you aggregate all of the data inside that DAO procedure or function and then do the comparison with the value that you were passed as the function argument.

In this case I was able to find out how to do the ranking in MS Access directly using SQL. You can enter this into the grid if you know how, but it is probably easier to type into the SQL box directly and Access will turn this into a grid format. The first thing you need is to create a query or table that contains the data that you want to rank, along with the primary key for each row. The next thing is to create your ranking query. The way you do this is by joining your source dataset onto itself with the Count function. I used as my source this useful page here: http://www.1keydata.com/sql/sql-rank.html
Based on their example my query looks like this:
SELECT OG1.Row, OG1.Data, Count(OG2.Data) AS Rank
FROM [PA Year 10 Maths Overall Grades] AS OG1, [PA Year 10 Maths Overall Grades] AS OG2
WHERE (((OG1.Data)<=[OG2].[Data])) OR (((OG1.Data)=[OG2].[Data]) AND ((OG1.Row)=[OG2].[Row]))
GROUP BY OG1.Row, OG1.Data;
OG1 and OG2 are aliases for the same table. Since we are ripping data out of Musac, this data is in the Cells0 table which stores numerical columns. Row is the primary key (student number).

As it is written above, the main issue is that two equal results get a rank that is the higher value rather than the lower value, e.g. 1,3,3,4 instead of 1,2,2,4 when the 2nd and 3rd ranked grades are equal. Most of the time people would want the second type of output instead of the first type of output. This turned out to be quite a simple fix, and that was to replace <= in the first part of the WHERE clause with < instead.

The next steps we need to look at are generating the word “equal” if two students have the same grade, and generating suffixes to the grade itself so that we can output a line that looks like “Congratulations on a 1st equal place” or “Congratulations on a 3rd place”. You can pass the grade into a simple function on each row to generate the suffix. On the other hand to get an equal, I used to use the CountIf function in Excel. The way to do this in Access is similar to the above:
  • Create a new query
  • Add the above query twice
  • Join the two source query instances on the Rank field
  • Select the Row, Data and Rank columns from the first query instance
  • Change the query to an aggregate and select the Count function on the Rank field of the second query instance as the fourth column. The result will show the count.
All we then have to do is pass this column into a custom function to output another column that states “equal” if the count is greater than 1. As it happens we can create yet another query to include the results of the above query, otherwise it’s a little difficult to pass a results column in this aggregate query into another column’s formula.