Tuesday, April 19, 2005
The tables involved looked like this. I have renamed them to illustrate function better (and to hide the product). The trans table holds a register of transactions, the docs table is a table that holds documents about the transactions, the tasks table holds a list of current and completed workflow tasks, and the workflow table itself maps the current state of the document within the workflow. The script task_demo that I have uploaded allows you to create this schema extract with representative sample data in it yourself.
SQL> DESC DOCS
Name Null? Type
----------------------------------------- -------- --------------
DOCID NOT NULL NUMBER
DOC_TRANSNO NUMBER
DOC_LINE# NUMBER
DOC_EXTREF VARCHAR2(255)
SQL> DESC TRANS
Name Null? Type
----------------------------------------- -------- --------------
TRANSID NOT NULL NUMBER
TRANSNO NOT NULL VARCHAR2(12)
LINE# NOT NULL NUMBER
COMPANY NOT NULL VARCHAR2(2)
TREATMENT_CODE NOT NULL NUMBER
TRANSDATE DATE
SQL> DESC TASKS
Name Null? Type
----------------------------------------- -------- ---------------
TK_ID NOT NULL NUMBER
TK_NAME VARCHAR2(30)
TK_STATUS NUMBER
TK_ACTIVITY VARCHAR2(20)
TK_WFID NUMBER
TK_USERID NUMBER
SQL> DESC WORKFLOW
Name Null? Type
----------------------------------------- -------- -----------
WF_ID NOT NULL NUMBER
WF_DOCID NOT NULL NUMBER
WF_STATUS NOT NULL NUMBER
The problem that we were having was that when users of the system who took part in the workflow logged on the the web interface for this system were experiencing a significant delay before the page rendered. Tracking this down in third party code in a connection pooled environment was a challenge on its own, but after a while we determined that the delay was attributable to this statement. sql2.sql holds this code
SELECT
count(*)
FROM
docs d,
tasks tk,
workflow wf,
trans t
WHERE
TK.TK_status IN(2,16)
AND TK.TK_activity = 'APPROVE'
AND TK.tk_wfid = wf.wf_id
AND wf.wf_status IN(2,4)
AND wf.wf_docid = d.docid
AND TO_CHAR(t.transno) = d.doc_transno
AND t.company = 'C1'
AND t.line# = d.doc_line#
AND t.treatment_code = '7'
AND d.doc_extref = 'C1;7;'||d.doc_transno||';'||TO_CHAR(d.doc_line#)
/
Unfortunately on our test system this statement ran fine - even against the same data. After some investigation we determined that in fact the optimizer_index parameters were different, they were still set to their default values on the test system, but to realistic values on the live system. I ran the silver_bullet script with various values of these parameters and obtained the following results. As you can see in the test I have only the o_i_c_a parameter was important, setting this very low flipped the execution plan to a significantly worse one. In the real life situation there was in fact an interaction between the two parameters, but the same general point held true, setting the parameters to commonly accepted values can cause otherwise perfectly performing sql to behave poorly.
Incidentally you might be wondering why the script is called sql2, well there is also a sql1 which illustrates the plan switching towards nested loops from a hash join (this is the classic example found in this presentation by Jonathan Lewis for example. The problem was that although the plans did switch, the difference between the runtime didn't. I did some further research and can find no timed examples showing an improvement from setting these parameters, just examples showing that the same statement can change execution plans, the fact that two different plans can be used is not evidence of which setting is better.