Comparing query methods to display rows as columns

enter image description here

We often have a task at work when we need data from a database table in a way that some values are not displayed in rows, but being next to each other. In other words, rotating rows into columns. Depending on the particular task, I always try to use the simpliest way to do it.

Let me show three methods by means of which I have coped with this so far.

To present some examples, I created a simple table that contains the population data of three cities from 2010 to 2014.
Usually we have IDs, but for the sake of simplicity let’s ignore them now.

Please find the table creation script here:

create table city_populations (
     city       varchar2(30)
    ,year       number(4)
    ,population number(10)
);

insert into city_populations
select 'MADRID', 2010, 2900000   from dual union all
select 'MADRID', 2011, 2910000   from dual union all
select 'MADRID', 2012, 2920000   from dual union all
select 'MADRID', 2013, 2930000   from dual union all
select 'MADRID', 2014, 2940000   from dual union all

select 'BUDAPEST', 2010, 2000000   from dual union all
select 'BUDAPEST', 2011, 1980000   from dual union all
select 'BUDAPEST', 2012, 1975000   from dual union all
select 'BUDAPEST', 2013, 1900000   from dual union all
select 'BUDAPEST', 2014, 1890000   from dual union all

select 'LONDON', 2010, 6000000   from dual union all
select 'LONDON', 2011, 6100000   from dual union all
select 'LONDON', 2012, 6500000   from dual union all
select 'LONDON', 2013, 6850000   from dual union all
select 'LONDON', 2014, 7000000   from dual;

commit;
/

Let’s check what we have in the table:

select * from city_populations;

Result, only for two-two years:

|CITY            |     YEAR| POPULATION|
|----------------|---------|-----------|
|BUDAPEST        |     2010|    2000000|
|BUDAPEST        |     2011|    1980000|
|...             |         |           |
|LONDON          |     2010|    6000000|
|LONDON          |     2011|    6100000|
|...             |         |           |
|MADRID          |     2010|    2900000|
|MADRID          |     2011|    2910000|
|...             |         |           |

Now that we have the table in place, our task is to display the population data from 2010 to 2014 (or any interval we want) next to the city name. Something like the following:

|CITY       |    2010|    2011|    2012|    2013|    2014|
|-----------|--------|--------|--------|--------|--------|
|MADRID     | 2900000| 2910000| 2920000| 2930000| 2940000|

Method 1 - Pivot table

I really like to use pivot tables in Oracle, it is quite easy to produce a spectecular result.

Let’s have a look at the example below:

select * from city_populations
pivot (
    min(population) for year in 
           (2010, 2011, 2012, 2013, 2014)
)
order by city;

As you can see, we can specify the parameters in the pivot clause.

  • min(population): this is the value (column) that we want to rotate from rows into columns. This has to be an aggregate function such as min, max, sum, count or avg. If there is only one value without the need to aggregate - like in our case -, then usually I just put min().
  • for year in (2010, ..., 2014): by this we specify the column headers. We want the year values in parenthesis to be the column headers and then for each city we display the population data in one row for all years.

Result:

|CITY        |    2010|    2011|    2012|    2013|    2014|
|------------|--------|--------|--------|--------|--------|
|BUDAPEST    | 2000000| 1980000| 1975000| 1900000| 1890000|
|LONDON      | 6000000| 6100000| 6500000| 6850000| 7000000|
|MADRID      | 2900000| 2910000| 2920000| 2930000| 2940000|

I must admit that there is something I don’t like and I hope Oracle will solve in the near future: it is not really possible to use a subquery in the in clause, hence we have to provide all required values.
As far as I know, it is possible to use a subquery in a pivot xml() clause, but I have never had much success with it. If we use our query in a package or a script, we might have an easier job to define column header values by using variables that we can interpolate into a query.

If we want to display only one city, we can use the WHERE clause as usual:

select * from city_populations
pivot (
    min(population) for year in 
           (2010, 2011, 2012, 2013, 2014)
)
where city = 'MADRID'
order by city;

Result:

|CITY        |    2010|    2011|    2012|    2013|    2014|
|------------|--------|--------|--------|--------|--------|
|MADRID      | 2900000| 2910000| 2920000| 2930000| 2940000|

Method 2 - Hierarchical query

I find hierarchical queries really useful and exciting to work with. I am planning to dedicate a separate article to this in the not-too-distant future. I am not an expert on this subject either, but I will be happy to dive into the world of hierarchical queries.

What we want to achieve with this query is to select the city plus the population values throughout the first and last year. We can think of the years in between as a hierarchy.

Let’s take a look at the example below:

select city, 
       substr(sys_connect_by_path(population, ' | '), 4) "POPULATION_CHANGE"
from city_populations
where level = 5
start with year = 2010 
connect by prior year = year-1 and prior city = city
order by city;
  • start with year = 2010: we define the starting value, this will be the starting row. In a database table with employees this could be the CEO for example.
  • connect by prior year = year-1 and prior city = city: we define the hierarchy rule, in other words the relationship between the parent and child rows. In our case it is the next year. prior city = city is required to apply the hierarchy rule only within the same city.
  • where level = 5: try to run the query without this condition and you will see that the result contains all rows with all possible levels. Meaning that you have the result for only 2010 (level 1), 2010-2011 (level 2), 2010-2011-2012 (level 3), etc.
  • sys_connect_by_path(population, ' | '): this provides the population change (path) from the first to the last year separated by pipe as defined.

Result:

|CITY         |POPULATION_CHANGE                                |
|-------------|-------------------------------------------------|
|BUDAPEST     | 2000000 | 1980000 | 1975000 | 1900000 | 1890000 |
|LONDON       | 6000000 | 6100000 | 6500000 | 6850000 | 7000000 |
|MADRID       | 2900000 | 2910000 | 2920000 | 2930000 | 2940000 |

This solution is especially appropriate if we don’t need to have the population data in separate columns, but just separated by any separator that we define.

If you want to reverse the order of the population data, you just need to change the start with and connect by expressions as shown below:

select city, 
       substr(sys_connect_by_path(population, ' | '), 4) "POPULATION_CHANGE"
from city_populations
where level = 5
start with year = 2014
connect by prior year = year+1 and prior city = city
order by city;

If by any reason you need the year values as well, we can do the following trick:

select ' ' as "CITY",
       substr(sys_connect_by_path(lpad(year, 10, ' '), ' | '), 4) 
										       "POPULATION_CHANGE"
from city_populations
where level = 5 and rownum = 1
start with year = 2010 
connect by prior year = year-1 and prior city = city
union
select city as "CITY", 
       substr(sys_connect_by_path(lpad(population, 10, ' '), ' | '), 4) 
										       "POPULATION_CHANGE"
from city_populations
where level = 5
start with year = 2010 
connect by prior year = year-1 and prior city = city
order by 1;

Result:

|CITY         |POPULATION_CHANGE                                                    |
|-------------|---------------------------------------------------------------------|
|             |        2010 |        2011 |        2012 |        2013 |        2014 |
|BUDAPEST     |     2000000 |     1980000 |     1975000 |     1900000 |     1890000 |
|LONDON       |     6000000 |     6100000 |     6500000 |     6850000 |     7000000 |
|MADRID       |     2900000 |     2910000 |     2920000 |     2930000 |     2940000 |

Just to show another interesting feature with the hierarchical query, let’s check the following example:

select city, year, 
       lpad(' ', 4*(level-1))||population "POPULATION"
from city_populations
--where city = 'MADRID'
start with year = 2010 
connect by prior year = year-1 
	   and prior city = city
order by city, year;

Result (I uncommented the WHERE clause to have a shorter result):

|CITY    |YEAR  |POPULATION             |
|--------|------|-----------------------|
|MADRID  |2010  |2900000                |
|MADRID  |2011  |    2910000            |
|MADRID  |2012  |        2920000        |
|MADRID  |2013  |            2930000    |
|MADRID  |2014  |                2940000|

Method 3 - listagg function

This is a very simple way of getting - in a few cases - almost the same result as with the hierarchical queries. The listagg function basically concatenates the values in the defined group.

Let’s have a look at the example below:

select city, 
       listagg(population, ' | ') within group (order by year asc) "POPULATION" 
from city_populations 
group by city;
  • listagg(population, ' | '): we want to concatenate the population values separated by a pipe.
  • within group (order by year asc): here we define the ordering rule withing the group, in this case ascending by the year. You can just change it to desc to reverse the order.

Result:

|CITY         |POPULATION_CHANGE                                |
|-------------|-------------------------------------------------|
|BUDAPEST     | 2000000 | 1980000 | 1975000 | 1900000 | 1890000 |
|LONDON       | 6000000 | 6100000 | 6500000 | 6850000 | 7000000 |
|MADRID       | 2900000 | 2910000 | 2920000 | 2930000 | 2940000 |

To have the year values as well:

select ' ' "CITY", 
       listagg(lpad(year, 10, ' '), ' | ') 
		       within group (order by year asc) "POPULATION" 
from city_populations 
group by city
union
select city "CITY", 
       listagg(lpad(population, 10, ' '), ' | ') 
		       within group (order by year asc) "POPULATION" 
from city_populations 
group by city;

Result:

|CITY         |POPULATION_CHANGE                                                    |
|-------------|---------------------------------------------------------------------|
|             |        2010 |        2011 |        2012 |        2013 |        2014 |
|BUDAPEST     |     2000000 |     1980000 |     1975000 |     1900000 |     1890000 |
|LONDON       |     6000000 |     6100000 |     6500000 |     6850000 |     7000000 |
|MADRID       |     2900000 |     2910000 |     2920000 |     2930000 |     2940000 |

Please keep in mind that the query to fetch the years would return three rows, becuase we have three cities. The reason that we have only one row in the result is becuase union removes the duplicates. union all would keep them.

Well, after this exhaustive article it is high time to go dancing Bachata and from tomorrow on we can enjoy our life and cherry-pick from the methods mentioned above.

Comments