
Tuesday, January 11, 2005

I have been pestered from time to time (thanks for the nag, Tracy) for a simple example of how response-time-based tuning works in practice. The Linux/Windows comparison from last week gives me an opportunity to do just that, and to highlight an unexpected result.

I started with the Windows installation, where the top 3 timed events were as follows:

* db file scattered read 62s
* CPU 44s
* Unaccounted for 38s

Now, as it turns out, the default install of Oracle on Windows allocated far less memory to the buffer cache than the Linux install (24MB instead of 100MB on my machine). So, starting from the fact that over a minute of my 3 minutes of elapsed time was spent on disk access, I increased the memory allocation for the various caches. In particular, I increased the buffer cache size to 100MB and adjusted the other pools to match the values in the Linux installation.
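The triage step above, picking the largest component of response time and attacking that first, can be sketched in a few lines of Python. This is only an illustration using the three figures listed above; the function name `rank_by_time` is my own, and the shares are fractions of the listed time (144s), not of the full elapsed time.

```python
# Top timed events from the initial Windows run, in seconds (figures from the post).
timed_events = {
    "db file scattered read": 62,
    "CPU": 44,
    "Unaccounted for": 38,
}


def rank_by_time(events):
    """Return (event, seconds, share of listed time) tuples, largest first."""
    total = sum(events.values())
    ranked = sorted(events.items(), key=lambda item: item[1], reverse=True)
    return [(name, secs, secs / total) for name, secs in ranked]


for name, secs, share in rank_by_time(timed_events):
    # The first line printed is the component to look at first.
    print(f"{name:<24} {secs:>4}s  {share:6.1%}")
```

The point of the ordering is simply that effort goes where the time goes: disk reads first, then whatever is next on the list.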

I ignored the CPU component in this case, as I didn't see any latching or other readily tunable CPU activity, and moved on to the Unaccounted for time. The machine that I am using for these tests is in fact a Windows workstation, so, as I alluded to previously, it is no real surprise that Oracle is suffering from other processes pre-empting it. I made the following changes (some of them temporary, for the perfectly good reason that I do still want to use this machine for the purpose for which it is intended):

* Disable Windows Firewall, anti-virus and systems management services.
* Change CPU prioritisation to favour background services rather than foreground apps.

At this point the elapsed time came down from 271s to 139s. The top 4 timed events now accounted for almost all the elapsed time and read as follows:

* db file scattered read 42s
* CPU 37s
* log file switch completion 25s
* log buffer space 23s

I then moved on to the redo log writing system. There are two bottlenecks here that need to be addressed. The large waits for log file switch completion can be traced back to inadequate redo log sizes (the general purpose template gives 3 groups of 10MB logs, which is woeful), so I increased the redo log size to 100MB. The wait for log buffer space is also due to a low default parameter, in this case 256KB for LOG_BUFFER, which I increased to 1MB.

At this point the response time had come down to 74s (just 27% of the original run time). The full profile now looks like this:
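For anyone wanting to repeat the redo changes, here is a rough Python sketch that just builds the kind of SQL statements involved. Treat it as an assumption-laden illustration, not a record of what I ran: the directory, the group numbers, the file names and the choice of three new groups are all hypothetical, and `SCOPE = SPFILE` assumes the instance uses an spfile (LOG_BUFFER only takes effect after a restart).

```python
def redo_tuning_statements(log_dir, first_new_group=4, groups=3,
                           size_mb=100, log_buffer_bytes=1024 * 1024):
    """Build SQL for larger redo log groups and a bigger LOG_BUFFER.

    log_dir, first_new_group and groups are hypothetical placeholders;
    only the 100MB log size and 1MB log buffer come from the post.
    """
    statements = []
    for offset in range(groups):
        group = first_new_group + offset
        path = f"{log_dir}/redo{group:02d}.log"  # hypothetical file naming
        statements.append(
            f"ALTER DATABASE ADD LOGFILE GROUP {group} ('{path}') SIZE {size_mb}M;"
        )
    # LOG_BUFFER is a static parameter, so the new value is written to the spfile.
    statements.append(
        f"ALTER SYSTEM SET log_buffer = {log_buffer_bytes} SCOPE = SPFILE;"
    )
    return statements


for statement in redo_tuning_statements("/u01/oradata/demo"):
    print(statement)
```

The old 10MB groups would still need dropping once they are inactive (`ALTER DATABASE DROP LOGFILE GROUP n;`), which the sketch leaves out.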


Event waited on              Count  Max wait (s)  Elapsed (s)  Average (s)
CPU                                                     30.67
db file scattered read         564          0.52        26.12         0.05
log buffer space               278          1.02         8.25         0.03
log file switch completion      14          1.01         5.21         0.37
Unaccounted For                                          2.06
rdbms ipc reply                109          0.29         0.47         0.00
db file parallel read            1          0.29         0.29         0.29
db file sequential read         20          0.05         0.26         0.01
SQL*Net message from client      7          0.04         0.06         0.01
log file sync                    3          0.01         0.01         0.00
SQL*Net message to client        7          0.00         0.00         0.00

Response Time                                           73.40
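As a sanity check on the profile, the elapsed figures should add up to the response time, and that total should be about 27% of the original 271s. A small Python sketch of that arithmetic, using only the numbers in the table (the variable names are mine):

```python
# Elapsed seconds per component, copied from the Windows profile above.
windows_profile = {
    "CPU": 30.67,
    "db file scattered read": 26.12,
    "log buffer space": 8.25,
    "log file switch completion": 5.21,
    "Unaccounted For": 2.06,
    "rdbms ipc reply": 0.47,
    "db file parallel read": 0.29,
    "db file sequential read": 0.26,
    "SQL*Net message from client": 0.06,
    "log file sync": 0.01,
    "SQL*Net message to client": 0.00,
}

ORIGINAL_ELAPSED = 271.0  # seconds, the untuned Windows run

# Response time is just the sum of every component of the profile.
response_time = round(sum(windows_profile.values()), 2)
fraction_of_original = response_time / ORIGINAL_ELAPSED

print(f"Response time: {response_time:.2f}s")
print(f"Share of original run time: {fraction_of_original:.0%}")
```

If the components did not sum to the response time, the profile would be missing something, which is exactly what the Unaccounted For line is there to capture.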


Interestingly, although the Linux box was 'easier' to tune (it did not need various other services/daemons to be stopped, and its profile makes much more sense), I didn't achieve the same effect.

I started with a response time of 189s, nearly all of which was down to the same two issues: LOG_BUFFER size and redo log sizing/number. Making the same adjustments to the Linux installation also resulted in a fairly significant improvement in response time, but left the Linux installation slower than the Windows box.


Event waited on                          Count  Max wait (s)  Elapsed (s)  Average (s)
log buffer space                           414          0.64        66.80         0.16
CPU                                                                 26.74
log file switch (checkpoint incomplete)     18          1.00        16.69         0.93
rdbms ipc reply                             88          2.00        16.35         0.19
Unaccounted For                                                      8.56
free buffer waits                          954          0.01         6.59         0.01
log file switch completion                  14          1.00         3.25         0.23
SQL*Net message from client                  7          0.17         0.22         0.03
log file sync                                1          0.15         0.15         0.15
db file sequential read                     14          0.08         0.10         0.01
SQL*Net message to client                    7          0.00         0.00         0.00
db file scattered read                      14          0.00         0.00         0.00

Response Time                                                      145.45


At this point both systems are significantly faster than the out-of-the-box configuration, but the Linux box is still waiting significantly for log buffer space and so is nearly twice as slow as the Windows box. This may well be due to OS configuration.
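To see where the remaining gap comes from, the two profiles can be compared event by event. The following Python sketch does that with the elapsed figures from the two tables; `gap_by_event` is just an illustrative helper, and events missing from one profile are treated as zero.

```python
# Elapsed seconds per event, copied from the two tuned profiles.
windows = {
    "CPU": 30.67, "db file scattered read": 26.12, "log buffer space": 8.25,
    "log file switch completion": 5.21, "Unaccounted For": 2.06,
    "rdbms ipc reply": 0.47, "db file parallel read": 0.29,
    "db file sequential read": 0.26, "SQL*Net message from client": 0.06,
    "log file sync": 0.01, "SQL*Net message to client": 0.00,
}
linux = {
    "log buffer space": 66.80, "CPU": 26.74,
    "log file switch (checkpoint incomplete)": 16.69, "rdbms ipc reply": 16.35,
    "Unaccounted For": 8.56, "free buffer waits": 6.59,
    "log file switch completion": 3.25, "SQL*Net message from client": 0.22,
    "log file sync": 0.15, "db file sequential read": 0.10,
    "SQL*Net message to client": 0.00, "db file scattered read": 0.00,
}


def gap_by_event(slow, fast):
    """Return (event, slow - fast) pairs, biggest contributor to the gap first."""
    events = set(slow) | set(fast)
    gaps = {e: round(slow.get(e, 0.0) - fast.get(e, 0.0), 2) for e in events}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


gaps = gap_by_event(linux, windows)
total_gap = round(sum(seconds for _, seconds in gaps), 2)

for event, seconds in gaps:
    print(f"{event:<42} {seconds:+8.2f}s")
print(f"{'Total gap':<42} {total_gap:+8.2f}s")
```

On these numbers, log buffer space alone accounts for 58.55s of the 72.05s difference, while Linux actually spends less time than Windows on db file scattered read and CPU.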
