Poor performance in Oracle when executed by the application


Having returned to work after some well-spent holidays, I started the New Year with quite an interesting issue. A vast index calculation process that usually takes 5 to 20 minutes to run suddenly started taking 7 to 30 hours(!). This, of course, was unacceptable, so the issue quickly became the highest priority. The process, written in PL/SQL, runs in an Oracle database, and its main part is a vast query that calculates and inserts the indexes.

What Happened?

In the last quarter of last year I was working on an index calculation process dealing with national price indexes. I managed to achieve better performance than initially expected (5 to 20 minutes, depending on the amount of data). After successful tests, the process was to be added to our internal application so that the appropriate team could launch it through a user-friendly UI. Then came a surprising turn: the process would take ages to run.

Road To Solution

At first, I did not believe it.

I thought it was not possible that the performance had changed so dramatically; maybe the operations team had done something wrong, or there was an issue with the server. Therefore, I ran the process myself to check the time. As I do not have access to the application, I tested the process from SQL Developer. (This detail will be crucial later.) To my great satisfaction, it took only a few minutes, so I told my team lead about it and he tested it from the application as well. My assumption seemed correct; the execution took only a few minutes. If only that were the end …

It happened again.

I was shocked to hear that the performance issue had happened again. It is important to mention that my index calculation process is part of a huge weekly aggregation process that creates a new schema for each week, so the operations team can work in a separate schema containing only the data related to that period. After the performance issue had occurred again in a new schema, I tested the process once more - from SQL Developer. As in the first case, it took only a few minutes for me. Then my team lead ran it through the application, and the process finished in a few minutes.

Since this was already the second time, I had to come up with an explanation instead of just closing the ticket again. This is the pattern I discovered:

  • A new schema is created
  • The process is launched through the application
  • Performance issue happens
  • I launch the process from SQL Developer
  • No performance issue
  • The process is launched again from the application
  • No performance issue

This could have gone on and on like an infinite loop; however, I did not leave it at that.

Investigation

First, I checked whether the Oracle optimizer version was the same in both cases. It was; why would it be different? After giving it a lot of thought, the most rational idea I had was to check whether the explain plans differed. I monitored the process in the Sessions window of SQL Developer and exported the explain plan of each case. Having run some tests in different scenarios, I discovered that the explain plan was much worse when the procedure was called from the application than when it was called from SQL Developer. However, after executing the procedure from SQL Developer, the process ran at normal speed through the application as well. Monitoring this case, I noticed that even though the process was launched through the application, the explain plan was good: the good explain plan had already been generated, so when the application called the procedure, the existing plan was reused.
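If you want to reproduce this kind of comparison without the Sessions window, the plans can also be pulled straight from the cursor cache. A minimal sketch, assuming the big insert statement can be located by its text (the LIKE filter and the sql_id/child number are placeholders, not the real values from my process):

-- find the sql_id, child cursor and plan hash of the statement
SELECT sql_id, child_number, plan_hash_value
FROM   v$sql
WHERE  sql_text LIKE 'INSERT INTO price_index%';  -- placeholder filter

-- display the actual plan of that cursor
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(sql_id => 'abcd1234efgh', cursor_child_no => 0));

Comparing the plan_hash_value of the application run and the SQL Developer run is a quick way to confirm that the two sessions really got different plans.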

The Root Cause

The conclusion of my investigation was that we get a different explain plan depending on the executor: the application or SQL Developer. However, once the “good” explain plan has been generated, we can call the procedure from any tool and the “correct” explain plan will be used, meaning the performance of the query will be good.

The Solution

Knowing the root cause, I consulted the DBA team, who suggested that we could force the good explain plan. This would definitely have fixed the issue; however, I was so keen to find another solution that I kept thinking. The main reason for the “bad” explain plan was the badly estimated number of rows in a few tables. It occurred to me that I could try gathering the stats before running the query. I had no idea whether it would work, but I gave it a shot. Luckily, it worked! After gathering the stats, the optimizer could estimate properly and create the good explain plan, irrespective of the executor, be it the application, SQL Developer or anything else.
This will probably remain carved into my mind forever, so next time it will be less “fun”.
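For completeness, the plan-forcing route the DBA team suggested would typically mean loading the good plan from the cursor cache into a SQL plan baseline. A rough sketch of that approach (the sql_id and plan_hash_value below are placeholders for the values of the good run, not the real ones):

DECLARE
    l_loaded PLS_INTEGER;
BEGIN
    -- pin the "good" plan as a SQL plan baseline for this statement
    l_loaded := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
        sql_id          => 'abcd1234efgh',  -- placeholder sql_id
        plan_hash_value => 123456789        -- placeholder hash of the good plan
    );
    DBMS_OUTPUT.PUT_LINE('Plans loaded: ' || l_loaded);
END;
/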

You can read more about Oracle stats here and here.
You can gather stats at different levels (schema, table, index, etc.).

An example of gathering the stats for a table:

BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
        ownname => 'SCHEMA',
        tabname => 'TABLE'
        -- partname => 'PARTITION'
        -- you can even specify the partition if you wish
    );
END;
/
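And a schema-level variant, in case you want to refresh everything in the weekly schema at once ('SCHEMA' is again just a placeholder):

BEGIN
    -- gathers statistics for all tables and indexes owned by the schema
    DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'SCHEMA');
END;
/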

Explanation

I was really happy to have found the root cause and the solution; however, there was still one question haunting me: why?
My colleague on the DBA team told me that the behaviour of the Oracle optimizer can sometimes change depending on the tool used to connect to the database. There had been issues that only happened through an ODBC connection, but not in SQL*Plus, for example.
I tried to find a more satisfying answer, but I could not. At least not yet.

Comments

  1. Looks like you might have a different session setting for one of the query optimizer parameters, which influences how Oracle goes about hard parsing and preparing a new query plan (on a query whose plan has aged out of the cursor cache). See what is set for SHOW PARAMETER OPTIMIZER_MODE, for example, in your SQL Developer session versus the session in which your application executes.
    Btw, I got to your article from reading your Oracle to PostgreSQL comparison. I recently went through an ORCL -> PostgreSQL migration on a fairly large application; it was fun and worked out well!

    Replies
    1. Thanks a lot, tomorrow at work I will check out your idea. It sounds familiar, but I am not sure I checked it back then. I remember I was googling a lot. :)
      Great to hear about the migration! We will probably go through one during the second half of the year. Someone suggested that the optimizer in Postgres is not as smart. Did you have any bad experiences with that?

    2. Yes, perhaps the PostgreSQL optimizer is less sophisticated from what I can gather, but honestly this hasn't been an issue on my project, which deals with quite a bit of data (4TB) and lots and lots of distinct queries, mostly OLTP (so well indexed and optimized joins to begin with; perhaps we weren't pushing the optimizer).

      But imo that simplicity might actually be an advantage :) Over the years I found the Oracle query planner, although very powerful with a ton of features and configurations, also quite finicky, with environmental settings (perhaps even your case in this article) or other factors contributing to unpredictable behaviors, such as the high water mark in tables that are now empty, etc. To the point where our dev team really had to take it upon themselves to always gather stats and analyze tables after any significant operation on the tables in the PL/SQL code, or to straight up add query hints specifying the join order, etc., not leaving it up to chance for the plan optimizer to screw things up and take the system down. This was on Oracle versions 10 and 11, so quite a few years back, but I remember many, many occasions where things (unpredictably) went awry. Here, after more than a year of operation on PostgreSQL, not a single such instance :)

    3. Wow, the Oracle part of your comment reads as if I had written it myself. I have worked with Oracle for only a bit over 5 years, but my experience is the same, especially in my current job, where we have a huge amount of data, talking about TBs as well. In many cases Oracle's optimizer is just "unpredictable" and depends on other settings and circumstances. As you rightly said, the same thing is described in this article.
      In our processes we also make sure to gather the stats if we think it is necessary (that wasn't the case here, but there was no sign it was necessary :) ), but more importantly, in almost all DML statements concerning big tables we have to use hints to force good performance. And of course, tables are partitioned and indexed.
      And yes, I also have experience with very poor performance with regard to tables that are empty but expected to hold a lot of data.

      Thanks for your input about the Postgres optimizer, probably there will be no issue then. Oracle has trained us for this. :)

      Thanks again, all the best.

