Sunday, January 23, 2005
The table is defined with a rather short message column (80 characters) and a reject limit of 1000 records. As alert log lines can easily be longer than 80 characters this might be a poor choice. Fortunately in the version of Oracle that Don is apparently using 80 character lines can contain the string "ORA-00600: internal error code, arguments: [18095], [0xC0000000210D8BF8], [], [], [], [], []" which is itself rather longer than 80 characters.
Whilst this approach can, as Jared points out, be valuable it would pay to think about datatypes and reject policy if one is going to employ it.
Publishing results that are impossible and rather look as if they have just been made up is, to say the least, unhelpful as well.
Wednesday, January 19, 2005
The January 18th security alert from Oracle came out as advertised yesterday. You can find the alert here. It looks as if Oracle Corp learned well and learned fast from the brouhaha surrounding the Alert 68 patch late last year. We now have a security alert and patch that
One day all security patches will be done this way.
Sunday, January 16, 2005
The large elasped time that I was experiencing for LOG_BUFFER waits are of course caused by LGWR being slower on linux than it was on Windows. lgwr uses directIO if it can (DBWR by the way uses asynch IO.) The default installation of Oracle does not enable either of these. The solution
1. Make sure that you actually have the asynchronous io libraries loaded. on red-hat and fedora rpm -qa|grep libaio will tell you, you should have both libaio and libaio-devel install these as root if need be
2. as the owner of the oracle software relink Oracle to enable asynch io
3. set filesystemio_options init.ora parameter to setall for both to be used.
restart Oracle.
Having followed these steps the good news was that my elapsed time for Linux was now 84s, this is still about 15% slower than windows for this test, but certainly in the right area.
more tests to follow in the coming weeks.
Tuesday, January 11, 2005
I started with the Windows installation, where the top 3 timed events were as follows
* db file scattered read 62s
* CPU 44s
* Unaccounted for 38s
Now as it turns out the default install of oracle on windows allocated far less memory to buffer cache than the Linux install (24mb instead of 100mb on my machine). So starting with the fact that over a minute of my 3 minutes elapsed time was spent on disk access I increased the memory allocation for the various caches. In particular I increased the buffer cache size to 100mb, I also adjusted the other pools to have identical values to those in the Linux installation.
I ignored the CPU component in this case, as I didn't see any great latching or other CPU activity that is readily tunable and moved on to the Unaccounted for time figure. The machine that I am using for these tests is in fact a windows workstation, and so as I alluded to previously there is no real surprise that Oracle is suffering from other processes pre-empting it. I made the following changes (some temporary for the perfectly good reason that I do still want to use this machine for the purpose for which it is intended ).
* disable windows firewall, anti-virus and systems management services.
* change cpu prioritisation for background services and not foreground apps.
At this point the elapsed time came down from 271s to 139s The top 4 timed events now accounted for almost all the elapsed time and read as follows
* db file scattered read 42s
* CPU 37s
* log file switch completion 25s
* log buffer space 23s
I then moved on to the redo log writing system. There are two bottlenecks here that need to be addressed. The large waits for log file switch completion can be traced back to inadequate redo log sizes (the general purpose template gives 3 groups of 10mb logs which is woeful). I increased the redo log size to 100m. The wait for log buffer space is also due to a low default parameter in this case 256k for LOG_BUFFER which was increased to 1m.
At this point the response time had come down to 74s (or just 27% of the original run time). The full profile now looks like this.
Event waited on Count Max Elapsed Average
CPU 30.67
db file scattered read 564 0.52 26.12 0.05
log buffer space 278 1.02 8.25 0.03
log file switch completion 14 1.01 5.21 0.37
Unaccounted For 2.06
rdbms ipc reply 109 0.29 0.47 0.00
db file parallel read 1 0.29 0.29 0.29
db file sequential read 20 0.05 0.26 0.01
SQL*Net message from client 7 0.04 0.06 0.01
log file sync 3 0.01 0.01 0.00
SQL*Net message to client 7 0.00 0.00 0.00
Response Time 73.40
Interestingly although the Linux box was 'easier' to tune, not needing various other services/daemons to be stopped and the profile makes much more sense I didn't achieve the same effect.
I started with a response time of 189s nearly all of which was down to the same two issues about LOG_BUFFER size and redo log sizing/number making the same adjustments to the linux installation also resulted in a fairly significant improvement in response time, but left the Linux installation slower than the Windows box.
Event waited on Times Max. Wait Elapsed Average
log buffer space 414 0.64 66.80 0.16
CPU 26.74
log file switch
(checkpoint incomplete) 18 1.00 16.69 0.93
rdbms ipc reply 88 2.00 16.35 0.19
Unaccounted For 8.56
free buffer waits 954 0.01 6.59 0.01
log file switch completion 14 1.00 3.25 0.23
SQL*Net message from client 7 0.17 0.22 0.03
log file sync 1 0.15 0.15 0.15
db file sequential read 14 0.08 0.10 0.01
SQL*Net message to client 7 0.00 0.00 0.00
db file scattered read 14 0.00 0.00 0.00
Response Time 145.45
At this point both systems are significantly faster than the out of the box configuration, but the linux box is still waiting significantly for log_buffer space and so is nearly twice as slow as the windows box. This may well be due to OS configuration.
Thursday, January 06, 2005
Anyway I have just recently taken the plunge and installed Fedore Core 3 on a couple of machines. One immediate thing that I wanted to do was to get and publish some figures on comparative performance of Oracle on Windows and Linux. Threads like this one where figures such as '4x faster on Linux than Windows' are quoted without saying what the tests were annoy me somewhat. In fact the only paper that I am aware of on the subject is Roby Sherman's - and that is only available at archive.org now.
I'll be writing up a paper shortly on a proper investigation, but thought that the following simple test on identical hardware (the same pc) would make interesting reading.
The hardware is an intel based p4 laptop with 512mb ram. For windows the database is 10.1.0.2, for linux the database is 10.1.0.3. (i.e the databases available for download from otn.) In both cases the database is the supplied 'general purpose' database. In other words this doesn't at this stage attempt to apply any common sense at all to the installation, merely install off the shelf software.
The script run in each case generates a simple table and then traces the execution of 6 inserts into the table using the DBMS_MONITOR PL/SQL package.
create table t1
as select * from all_objects
where rownum < 20001;
exec dbms_monitor.session_trace_enable();
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
exec dbms_monitor.session_trace_disable();
On the windows installation this resulted in the following resource profile.
Event Times Max. Wait Elapsed Average
db file scattered read 1095 0.52 62.39 0.06
CPU 42.44
Unaccounted For 37.90
control file sequential read 1173 0.31 36.01 0.03
log buffer space 634 0.4 29.83 0.05
log file switch completion 104 1.02 25.20 0.24
rdbms ipc reply 109 0.42 22.05 0.20
db file sequential read 309 0.14 6.12 0.02
log file sync 99 1.02 3.83 0.04
db file parallel read 11 0.47 2.36 0.21
control file parallel write 138 0.43 1.33 0.01
enq: CF - contention 8 0.37 1.21 0.15
log file switch
(checkpoint incomplete) 2 0.26 0.38 0.19
db file single write 69 0.1 0.21 0.00
control file single write 69 0.09 0.20 0.00
SQL*Net message from client 7 0 0.01 0.00
SQL*Net message to client 7 0 0.00 0.00
ksfd: async disk IO 131 0 0.00 0.00
latch: redo writing 1 0 0.00 0.00
latch: cache buffers lru chain 1 0 0.00 0.00
Response Time 271.47
On Linux the self same test ran in 70% of the time with the following resource profile.
Event Times Max Elapsed Average
log buffer space 2284 0.22 88.20 0.04
log file switch
(checkpoint incomplete) 92 1 66.80 0.73
CPU 26.40
log file switch completion 113 1 6.34 0.06
db file sequential read 56 0.09 0.78 0.01
db file scattered read 580 0.1 0.61 0.00
Unaccounted For 0.08
db file parallel read 1 0.04 0.04 0.04
SQL*Net message to client 7 0 0.00 0.00
SQL*Net message from client 7 0 0.00 0.00
latch: cache buffers lru chain 1 0 0.00 0.00
Response Time 189.25
One of the interesting things is to note the large value of unaccounted for time on the windows box - this is almost certainly down to process pre-emption caused by unnecessary services running. I expect to put results of tuning this exercise up later next week.