The advantage of Oracle analytic functions

enter image description here

I am convinced that seeing - among others - MIN, MAX, SUM or COUNT would remind you first of SQL aggregate functions. However, there is a magnificent feature in Oracle: the analytic functions. My experience is that once you write your first one, you just can’t stop using them.

So, what are the analytic functions? How are they different to the aggregate functions? Well, the difference is not in the result, but in the way the result is generated. While aggregate functions return one record for a set of rows (one group), the analytic functions return multiple rows. Basically, they return the value for all the rows in a group. But, as someone who prefers practical examples over theories, let’s check how it actually works.

Throughout my article I will use the table below. The database normalization rules are not met, nor do we have IDs, however, being a simple table it can illustrate the functionality well.

Script to create the table:

create table beer_sales (country, shop, beer, price, annual_sale) as
  select 'SPAIN', 'AUCHAN', 'ALHAMBRA',   0.89,  800000 from dual union all
  select 'SPAIN', 'AUCHAN', 'ESTRELLA',   0.98, 1300000 from dual union all
  select 'SPAIN', 'AUCHAN', 'CRUZ_CAMPO', 1.10,  950000 from dual union all
  select 'SPAIN', 'AUCHAN', 'SAN_MIGUEL', 1.25, 1950000 from dual union all
  select 'SPAIN', 'LIDL',   'ALHAMBRA',   0.93,  850000 from dual union all
  select 'SPAIN', 'LIDL',   'ESTRELLA',   0.98, 1250000 from dual union all
  select 'SPAIN', 'LIDL',   'CRUZ_CAMPO', 1.09,  930000 from dual union all
  select 'SPAIN', 'LIDL',   'SAN_MIGUEL', 1.30, 1500000 from dual union all
  
  select 'ITALY', 'AUCHAN', 'ALHAMBRA',   0.85,  300000 from dual union all
  select 'ITALY', 'AUCHAN', 'ESTRELLA',   0.90,  450000 from dual union all
  select 'ITALY', 'AUCHAN', 'CRUZ_CAMPO', 1.20,  650000 from dual union all
  select 'ITALY', 'AUCHAN', 'SAN_MIGUEL', 1.25, 1050000 from dual union all
  select 'ITALY', 'LIDL',   'ALHAMBRA',   0.90,  200000 from dual union all
  select 'ITALY', 'LIDL',   'ESTRELLA',   0.90,  440000 from dual union all
  select 'ITALY', 'LIDL',   'CRUZ_CAMPO', 1.19,  800000 from dual union all
  select 'ITALY', 'LIDL',   'SAN_MIGUEL', 1.29, 1150000 from dual;

Comparing aggregate and analytic functions

Now that we have our table in place, let me explain it by means of the following example. Supposing that we want to know the minimum beer price for each country, this is how the queries would look like using aggregate and analytic functions.

Aggregate function:

select country,
       min(price)
from beer_sales
group by country
order by country;

Result:

|COUNT |MIN(PRICE)|
|------|----------|
|ITALY |      0.85|
|SPAIN |      0.89|

Analytic function:

select country, shop, beer, price,
       min(price) over (partition by country) as min_coun_price,
       annual_sale
from beer_sales
order by country, shop, beer;

Result:

COUNTRY |SHOP   |BEER       |     PRICE |MIN_COUN_PRICE |ANNUAL_SALE |
--------|-------|-----------|-----------|---------------|------------|
ITALY   |AUCHAN |ALHAMBRA   |      0.85 |          0.85 |     300000 |
ITALY   |AUCHAN |CRUZ_CAMPO |       1.2 |          0.85 |     650000 |
ITALY   |AUCHAN |ESTRELLA   |       0.9 |          0.85 |     450000 |
ITALY   |AUCHAN |SAN_MIGUEL |      1.25 |          0.85 |    1050000 |
ITALY   |LIDL   |ALHAMBRA   |       0.9 |          0.85 |     200000 |
ITALY   |LIDL   |CRUZ_CAMPO |      1.19 |          0.85 |     800000 |
ITALY   |LIDL   |ESTRELLA   |       0.9 |          0.85 |     440000 |
ITALY   |LIDL   |SAN_MIGUEL |      1.29 |          0.85 |    1150000 |
SPAIN   |AUCHAN |ALHAMBRA   |      0.89 |          0.89 |     800000 |
SPAIN   |AUCHAN |CRUZ_CAMPO |       1.1 |          0.89 |     950000 |
SPAIN   |AUCHAN |ESTRELLA   |      0.98 |          0.89 |    1300000 |
SPAIN   |AUCHAN |SAN_MIGUEL |      1.25 |          0.89 |    1950000 |
SPAIN   |LIDL   |ALHAMBRA   |      0.93 |          0.89 |     850000 |
SPAIN   |LIDL   |CRUZ_CAMPO |      1.09 |          0.89 |     930000 |
SPAIN   |LIDL   |ESTRELLA   |      0.98 |          0.89 |    1250000 |
SPAIN   |LIDL   |SAN_MIGUEL |       1.3 |          0.89 |    1500000 |

As we can see, the aggregate function returns only one record per each group. If I wanted to add for example the shop and/or beer to the selected columns, I would get an error saying not a GROUP BY expression. The analytic function, as opposed to this, returns the value for all selected records. Meaning, I can select any column from the table, the minimum price of the country will be returned for each record.

The structure of analytic functions

After having checked the example above, let’s examine the different clauses of analytic functions:

-- returns the minimum price grouped by country
min(price) over (partition by country)

-- returns the rank value grouped by country and ordered by price
rank() over (partition by country order by price)
  • min(price) : the required function
  • partition by : this clause is responsible for grouping (group by)
  • order by : as one might guess, this clause is responsible for ordering the rows within the specified group

If you want to dig deeper, it is worth checking the windowing clause that gives us the possibility of specifying filters. These constraints are applied for each record.

Examples

Let’s check a few more examples with the most common functions so that we can understand the logic easily.

Sum of the annual sales in different groupings:

select country, shop, beer, price, annual_sale,
       sum(annual_sale) over (partition by country, shop) sum_shop_sales,
       sum(annual_sale) over (partition by country) sum_country_sales
from beer_sales
order by country, shop, beer;

Result:

COUNTRY |SHOP   |BEER       |    PRICE |ANNUAL_SALE |SUM_SHOP_SALES |SUM_COUNTRY_SALES|
--------|-------|-----------|----------|------------|---------------|-----------------|
ITALY   |AUCHAN |ALHAMBRA   |     0.85 |     300000 |       2450000 |          5040000|
ITALY   |AUCHAN |CRUZ_CAMPO |      1.2 |     650000 |       2450000 |          5040000|
ITALY   |AUCHAN |ESTRELLA   |      0.9 |     450000 |       2450000 |          5040000|
ITALY   |AUCHAN |SAN_MIGUEL |     1.25 |    1050000 |       2450000 |          5040000|
ITALY   |LIDL   |ALHAMBRA   |      0.9 |     200000 |       2590000 |          5040000|
ITALY   |LIDL   |CRUZ_CAMPO |     1.19 |     800000 |       2590000 |          5040000|
ITALY   |LIDL   |ESTRELLA   |      0.9 |     440000 |       2590000 |          5040000|
ITALY   |LIDL   |SAN_MIGUEL |     1.29 |    1150000 |       2590000 |          5040000|
SPAIN   |AUCHAN |ALHAMBRA   |     0.89 |     800000 |       5000000 |          9530000|
SPAIN   |AUCHAN |CRUZ_CAMPO |      1.1 |     950000 |       5000000 |          9530000|
SPAIN   |AUCHAN |ESTRELLA   |     0.98 |    1300000 |       5000000 |          9530000|
SPAIN   |AUCHAN |SAN_MIGUEL |     1.25 |    1950000 |       5000000 |          9530000|
SPAIN   |LIDL   |ALHAMBRA   |     0.93 |     850000 |       4530000 |          9530000|
SPAIN   |LIDL   |CRUZ_CAMPO |     1.09 |     930000 |       4530000 |          9530000|
SPAIN   |LIDL   |ESTRELLA   |     0.98 |    1250000 |       4530000 |          9530000|
SPAIN   |LIDL   |SAN_MIGUEL |      1.3 |    1500000 |       4530000 |          9530000|

As illustrated in the example above, one of the huge advantages of the analytic functions is that we can display aggregated values for different groupings in the same record. This can give us an enormous help in some certain situations.

Without using the analytic functions, we would have to create tables or views for the aggregated values and then join them with the original table on country and country / shop, we could use subqueries as tables, or we could use the with clause to define tables. I chose the latter to show an example with:

with scs as (
    select country, shop, 
           sum(annual_sale) sum_shop_sales
    from beer_sales
    group by country, shop
),
sc as (
    select country,
           sum(annual_sale) sum_country_sales
    from beer_sales
    group by country
)
select bs.country, bs.shop, bs.beer, bs.price, bs.annual_sale,
       scs.sum_shop_sales,
       sc.sum_country_sales
from beer_sales bs, scs, sc
where bs.country = scs.country
  and bs.shop = scs.shop
  and bs.country = sc.country
order by country, shop, beer;

The result is exactly the same as above, but we can clearly see the advantage of using the analytic function in this case.

Some more examples.

Minimum and maximum prices for different groupings:

select country, shop, beer, annual_sale, price,
       min(price) over (partition by country, shop) min_shop_price,
       max(price) over (partition by country, shop) max_shop_price,
       min(price) over (partition by country) min_country_price,
       max(price) over (partition by country) max_country_price
from beer_sales
order by country, shop, beer;

Result:

COUNTRY |SHOP   |BEER       |ANNUAL_SALE |     PRICE |MIN_SHOP_PRICE |MAX_SHOP_PRICE |MIN_COUNTRY_PRICE |MAX_COUNTRY_PRICE|
--------|-------|-----------|------------|-----------|---------------|---------------|------------------|-----------------|
ITALY   |AUCHAN |ALHAMBRA   |     300000 |      0.85 |          0.85 |          1.25 |             0.85 |             1.29|
ITALY   |AUCHAN |CRUZ_CAMPO |     650000 |       1.2 |          0.85 |          1.25 |             0.85 |             1.29|
ITALY   |AUCHAN |ESTRELLA   |     450000 |       0.9 |          0.85 |          1.25 |             0.85 |             1.29|
ITALY   |AUCHAN |SAN_MIGUEL |    1050000 |      1.25 |          0.85 |          1.25 |             0.85 |             1.29|
ITALY   |LIDL   |ALHAMBRA   |     200000 |       0.9 |           0.9 |          1.29 |             0.85 |             1.29|
ITALY   |LIDL   |CRUZ_CAMPO |     800000 |      1.19 |           0.9 |          1.29 |             0.85 |             1.29|
ITALY   |LIDL   |ESTRELLA   |     440000 |       0.9 |           0.9 |          1.29 |             0.85 |             1.29|
ITALY   |LIDL   |SAN_MIGUEL |    1150000 |      1.29 |           0.9 |          1.29 |             0.85 |             1.29|
SPAIN   |AUCHAN |ALHAMBRA   |     800000 |      0.89 |          0.89 |          1.25 |             0.89 |              1.3|
SPAIN   |AUCHAN |CRUZ_CAMPO |     950000 |       1.1 |          0.89 |          1.25 |             0.89 |              1.3|
SPAIN   |AUCHAN |ESTRELLA   |    1300000 |      0.98 |          0.89 |          1.25 |             0.89 |              1.3|
SPAIN   |AUCHAN |SAN_MIGUEL |    1950000 |      1.25 |          0.89 |          1.25 |             0.89 |              1.3|
SPAIN   |LIDL   |ALHAMBRA   |     850000 |      0.93 |          0.93 |           1.3 |             0.89 |              1.3|
SPAIN   |LIDL   |CRUZ_CAMPO |     930000 |      1.09 |          0.93 |           1.3 |             0.89 |              1.3|
SPAIN   |LIDL   |ESTRELLA   |    1250000 |      0.98 |          0.93 |           1.3 |             0.89 |              1.3|
SPAIN   |LIDL   |SAN_MIGUEL |    1500000 |       1.3 |          0.93 |           1.3 |             0.89 |              1.3|

Number of beer brands sold in different groupings:

select country, shop, beer, price, annual_sale,
       count(beer) over (partition by country, shop) nb_neer_shop,
       count(beer) over (partition by country) nb_beer_country
from beer_sales
order by country, shop, beer;

Result:

COUNTRY |SHOP   |BEER       |     PRICE |ANNUAL_SALE |NB_NEER_SHOP |NB_BEER_COUNTRY|
--------|-------|-----------|-----------|------------|-------------|---------------|
ITALY   |AUCHAN |ALHAMBRA   |      0.85 |     300000 |           4 |              8|
ITALY   |AUCHAN |CRUZ_CAMPO |       1.2 |     650000 |           4 |              8|
ITALY   |AUCHAN |ESTRELLA   |       0.9 |     450000 |           4 |              8|
ITALY   |AUCHAN |SAN_MIGUEL |      1.25 |    1050000 |           4 |              8|
ITALY   |LIDL   |ALHAMBRA   |       0.9 |     200000 |           4 |              8|
ITALY   |LIDL   |CRUZ_CAMPO |      1.19 |     800000 |           4 |              8|
ITALY   |LIDL   |ESTRELLA   |       0.9 |     440000 |           4 |              8|
ITALY   |LIDL   |SAN_MIGUEL |      1.29 |    1150000 |           4 |              8|
SPAIN   |AUCHAN |ALHAMBRA   |      0.89 |     800000 |           4 |              8|
SPAIN   |AUCHAN |CRUZ_CAMPO |       1.1 |     950000 |           4 |              8|
SPAIN   |AUCHAN |ESTRELLA   |      0.98 |    1300000 |           4 |              8|
SPAIN   |AUCHAN |SAN_MIGUEL |      1.25 |    1950000 |           4 |              8|
SPAIN   |LIDL   |ALHAMBRA   |      0.93 |     850000 |           4 |              8|
SPAIN   |LIDL   |CRUZ_CAMPO |      1.09 |     930000 |           4 |              8|
SPAIN   |LIDL   |ESTRELLA   |      0.98 |    1250000 |           4 |              8|
SPAIN   |LIDL   |SAN_MIGUEL |       1.3 |    1500000 |           4 |              8|

Ranking methods

In the following examples I will “play” with the row numbers. Using them can turn out to be quite handy in certain cases, for instance when we need to filter in groups.

In the examples ahead I will query the rank / row numbers grouped by country and ordered by price, ascending.

Using rank()

This will eventually give us the rank value of the row, considering ties as equal ranks:

select country, shop, beer, price, annual_sale,
       rank() over (partition by country order by price) rn
from beer_sales
order by country, rn;

Result:

COUNTRY |SHOP   |BEER       |     PRICE |ANNUAL_SALE |        RN|
--------|-------|-----------|-----------|------------|----------|
ITALY   |AUCHAN |ALHAMBRA   |       .85 |     300000 |         1|
ITALY   |AUCHAN |ESTRELLA   |        .9 |     450000 |         2|
ITALY   |LIDL   |ALHAMBRA   |        .9 |     200000 |         2|
ITALY   |LIDL   |ESTRELLA   |        .9 |     440000 |         2|
ITALY   |LIDL   |CRUZ_CAMPO |      1.19 |     800000 |         5|
ITALY   |AUCHAN |CRUZ_CAMPO |       1.2 |     650000 |         6|
ITALY   |AUCHAN |SAN_MIGUEL |      1.25 |    1050000 |         7|
ITALY   |LIDL   |SAN_MIGUEL |      1.29 |    1150000 |         8|
SPAIN   |AUCHAN |ALHAMBRA   |       .89 |     800000 |         1|
SPAIN   |LIDL   |ALHAMBRA   |       .93 |     850000 |         2|
SPAIN   |LIDL   |ESTRELLA   |       .98 |    1250000 |         3|
SPAIN   |AUCHAN |ESTRELLA   |       .98 |    1300000 |         3|
SPAIN   |LIDL   |CRUZ_CAMPO |      1.09 |     930000 |         5|
SPAIN   |AUCHAN |CRUZ_CAMPO |       1.1 |     950000 |         6|
SPAIN   |AUCHAN |SAN_MIGUEL |      1.25 |    1950000 |         7|
SPAIN   |LIDL   |SAN_MIGUEL |       1.3 |    1500000 |         8|

Using dense_rank()

The result of dense_rank() is almost the same as that of rank(), but with a little difference as explained below:

select country, shop, beer, price, annual_sale,
       dense_rank() over (partition by country order by price) rn
from beer_sales
order by country, rn;

Result:

COUNTRY |SHOP   |BEER       |     PRICE |ANNUAL_SALE |        RN|
--------|-------|-----------|-----------|------------|----------|
ITALY   |AUCHAN |ALHAMBRA   |       .85 |     300000 |         1|
ITALY   |AUCHAN |ESTRELLA   |        .9 |     450000 |         2|
ITALY   |LIDL   |ALHAMBRA   |        .9 |     200000 |         2|
ITALY   |LIDL   |ESTRELLA   |        .9 |     440000 |         2|
ITALY   |LIDL   |CRUZ_CAMPO |      1.19 |     800000 |         3|
ITALY   |AUCHAN |CRUZ_CAMPO |       1.2 |     650000 |         4|
ITALY   |AUCHAN |SAN_MIGUEL |      1.25 |    1050000 |         5|
ITALY   |LIDL   |SAN_MIGUEL |      1.29 |    1150000 |         6|
SPAIN   |AUCHAN |ALHAMBRA   |       .89 |     800000 |         1|
SPAIN   |LIDL   |ALHAMBRA   |       .93 |     850000 |         2|
SPAIN   |LIDL   |ESTRELLA   |       .98 |    1250000 |         3|
SPAIN   |AUCHAN |ESTRELLA   |       .98 |    1300000 |         3|
SPAIN   |LIDL   |CRUZ_CAMPO |      1.09 |     930000 |         4|
SPAIN   |AUCHAN |CRUZ_CAMPO |       1.1 |     950000 |         5|
SPAIN   |AUCHAN |SAN_MIGUEL |      1.25 |    1950000 |         6|
SPAIN   |LIDL   |SAN_MIGUEL |       1.3 |    1500000 |         7|

As we can see, both rank() and dense_rank() assign the same rank value to ties, but dense_rank() doesn’t skip the next value, so rank numbers are consecutive.
If we need totally consecutive values, not allowing the same rank value even for ties, then row_number() is our choice.

Using row_number()

select country, shop, beer, price, annual_sale,
       row_number() over (partition by country order by price) rn
from beer_sales
order by country, rn;

Result:

COUNTRY |SHOP   |BEER       |     PRICE |ANNUAL_SALE |        RN|
--------|-------|-----------|-----------|------------|----------|
ITALY   |AUCHAN |ALHAMBRA   |       .85 |     300000 |         1|
ITALY   |AUCHAN |ESTRELLA   |        .9 |     450000 |         2|
ITALY   |LIDL   |ALHAMBRA   |        .9 |     200000 |         3|
ITALY   |LIDL   |ESTRELLA   |        .9 |     440000 |         4|
ITALY   |LIDL   |CRUZ_CAMPO |      1.19 |     800000 |         5|
ITALY   |AUCHAN |CRUZ_CAMPO |       1.2 |     650000 |         6|
ITALY   |AUCHAN |SAN_MIGUEL |      1.25 |    1050000 |         7|
ITALY   |LIDL   |SAN_MIGUEL |      1.29 |    1150000 |         8|
SPAIN   |AUCHAN |ALHAMBRA   |       .89 |     800000 |         1|
SPAIN   |LIDL   |ALHAMBRA   |       .93 |     850000 |         2|
SPAIN   |LIDL   |ESTRELLA   |       .98 |    1250000 |         3|
SPAIN   |AUCHAN |ESTRELLA   |       .98 |    1300000 |         4|
SPAIN   |LIDL   |CRUZ_CAMPO |      1.09 |     930000 |         5|
SPAIN   |AUCHAN |CRUZ_CAMPO |       1.1 |     950000 |         6|
SPAIN   |AUCHAN |SAN_MIGUEL |      1.25 |    1950000 |         7|
SPAIN   |LIDL   |SAN_MIGUEL |       1.3 |    1500000 |         8|

I have not shown examples with AVG and some others, but it would work just like the others. Also much used analytic functions are first_value() and last_value().

The KEEP clause

At last but no least, I would like to show the usage of analytic functions with the KEEP keyword. Considering our beer sales example, imagine that we want to know the minimum and maximum annual sale number of the cheapest beer(s). This assumes that at least two beers have the same cheapest price. For this reason we should change the original beer_sales table, but for this example let’s use the with clause without having to re-create or update our table.

with beer_sales (country, shop, beer, price, annual_sale) as (
  select 'SPAIN', 'AUCHAN', 'ALHAMBRA',   0.89,  800000 from dual union all
  select 'SPAIN', 'AUCHAN', 'ESTRELLA',   0.98, 1300000 from dual union all
  select 'SPAIN', 'AUCHAN', 'CRUZ_CAMPO', 1.10,  950000 from dual union all
  select 'SPAIN', 'AUCHAN', 'SAN_MIGUEL', 1.25, 1950000 from dual union all
  select 'SPAIN', 'LIDL',   'ALHAMBRA',   0.89,  850000 from dual union all
  select 'SPAIN', 'LIDL',   'ESTRELLA',   0.98, 1250000 from dual union all
  select 'SPAIN', 'LIDL',   'CRUZ_CAMPO', 1.09,  930000 from dual union all
  select 'SPAIN', 'LIDL',   'SAN_MIGUEL', 1.30, 1500000 from dual union all
  select 'ITALY', 'AUCHAN', 'ALHAMBRA',   0.85,  300000 from dual union all
  select 'ITALY', 'AUCHAN', 'ESTRELLA',   0.90,  450000 from dual union all
  select 'ITALY', 'AUCHAN', 'CRUZ_CAMPO', 1.20,  650000 from dual union all
  select 'ITALY', 'AUCHAN', 'SAN_MIGUEL', 1.25, 1050000 from dual union all
  select 'ITALY', 'LIDL',   'ALHAMBRA',   0.85,  200000 from dual union all
  select 'ITALY', 'LIDL',   'ESTRELLA',   0.90,  440000 from dual union all
  select 'ITALY', 'LIDL',   'CRUZ_CAMPO', 1.19,  800000 from dual union all
  select 'ITALY', 'LIDL',   'SAN_MIGUEL', 1.29, 1150000 from dual)
--
select country,
       min(annual_sale) keep (dense_rank first order by price) min_sale,
       max(annual_sale) keep (dense_rank first order by price) max_sale
from beer_sales
group by country;

Result:

COUNTRY |MIN_SALE |  MAX_SALE|
--------|---------|----------|
ITALY   |  200000 |    300000|
SPAIN   |  800000 |    850000|

We can see the minimum and maximum annual sale of the beer brands sold at the cheapest price, grouped by the country. The keep (dense_rank first order by price) clause indicates that we will aggregate over only the records with the lowest dense rank value. Meaning that the MIN and MAX functions will only get these rows to perform the aggregation over. We can also use last instead of first, in that case we would query the annual sale data for the highest price.

Writing the query as below would return the same result value as above, but it would display the values for all selected records, not only one record per country.

with beer_sales (country, shop, beer, price, annual_sale) as (
select 'SPAIN', 'AUCHAN', 'ALHAMBRA',   0.89,  800000 from dual union all
...
select 'ITALY', 'LIDL',   'SAN_MIGUEL', 1.29, 1150000 from dual)
--
select country, shop, beer, price, annual_sale, 
       min(annual_sale) keep (dense_rank first order by price) 
                        over (partition by country) min_sale,
       max(annual_sale) keep (dense_rank first order by price) 
                        over (partition by country) max_sale
from beer_sales;

Closing thought

If you haven’t heard of the analytic functions yet, I hope I managed to prove how important and useful they are in Oracle …

Comments