Niall's Oracle Pages

Sunday, January 23, 2005

Oh dear. Whilst browsing Don Burleson's site http://www.dba-oracle.com I came across this article on how to read the alert log from sql. Apart from being a short soundbite on an already well publicised use of external tables (see for example Jared Still's excellent article for dbazine on the same subject) the example published has some interesting features.

The table is defined with a rather short message column (80 characters) and a reject limit of 1000 records. As alert log lines can easily be longer than 80 characters this might be a poor choice. Fortunately in the version of Oracle that Don is apparently using 80 character lines can contain the string "ORA-00600: internal error code, arguments: [18095], [0xC0000000210D8BF8], [], [], [], [], []" which is itself rather longer than 80 characters.

Whilst this approach can, as Jared points out, be valuable it would pay to think about datatypes and reject policy if one is going to employ it.

Publishing results that are impossible and rather look as if they have just been made up is, to say the least, unhelpful as well.

Permalink posted by Niall @ 6:26 PM

1 Comments

Wednesday, January 19, 2005

How to do security alerts.

The January 18th security alert from Oracle came out as advertised yesterday. You can find the alert here. It looks as if Oracle Corp learned well and learned fast from the brouhaha surrounding the Alert 68 patch late last year. We now have a security alert and patch that

Was advertised and released on schedule.

Details what products at what patch levels are affected by which vulnerabilities.

Even fixes unsupported releases.

One day all security patches will be done this way.

Permalink posted by Niall @ 10:26 AM

0 Comments

Sunday, January 16, 2005

Thanks to Mladen Gogola for pointing out that it was indeed a configuration issue, though as it turns out an Oracle one and not an OS level issue.

The large elasped time that I was experiencing for LOG_BUFFER waits are of course caused by LGWR being slower on linux than it was on Windows. lgwr uses directIO if it can (DBWR by the way uses asynch IO.) The default installation of Oracle does not enable either of these. The solution

1. Make sure that you actually have the asynchronous io libraries loaded. on red-hat and fedora rpm -qa|grep libaio will tell you, you should have both libaio and libaio-devel install these as root if need be

2. as the owner of the oracle software relink Oracle to enable asynch io

make -f ins_rdbms.mk asynch_on

make -f ins_rdbms.mk ioracle

3. set filesystemio_options init.ora parameter to setall for both to be used.

restart Oracle.

Having followed these steps the good news was that my elapsed time for Linux was now 84s, this is still about 15% slower than windows for this test, but certainly in the right area.

more tests to follow in the coming weeks.

Permalink posted by Niall @ 1:37 PM

0 Comments

Tuesday, January 11, 2005

I have been pestered from time to time (thanks for the nag Tracy) for a simple example of how response time based tuning works in practice. The Linux/Windows comparison from last week gives me an opportunity to do just that, and to highlight an unexpected result.

I started with the Windows installation, where the top 3 timed events were as follows


* db file scattered read     62s

* CPU                        44s

* Unaccounted for            38s

Now as it turns out the default install of oracle on windows allocated far less memory to buffer cache than the Linux install (24mb instead of 100mb on my machine). So starting with the fact that over a minute of my 3 minutes elapsed time was spent on disk access I increased the memory allocation for the various caches. In particular I increased the buffer cache size to 100mb, I also adjusted the other pools to have identical values to those in the Linux installation.

I ignored the CPU component in this case, as I didn't see any great latching or other CPU activity that is readily tunable and moved on to the Unaccounted for time figure. The machine that I am using for these tests is in fact a windows workstation, and so as I alluded to previously there is no real surprise that Oracle is suffering from other processes pre-empting it. I made the following changes (some temporary for the perfectly good reason that I do still want to use this machine for the purpose for which it is intended ).


* disable windows firewall, anti-virus and systems management services.

* change cpu prioritisation for background services and not foreground apps.

At this point the elapsed time came down from 271s to 139s The top 4 timed events now accounted for almost all the elapsed time and read as follows


* db file scattered read	42s

* CPU			37s	

* log file switch completion	25s

* log buffer space		23s

I then moved on to the redo log writing system. There are two bottlenecks here that need to be addressed. The large waits for log file switch completion can be traced back to inadequate redo log sizes (the general purpose template gives 3 groups of 10mb logs which is woeful). I increased the redo log size to 100m. The wait for log buffer space is also due to a low default parameter in this case 256k for LOG_BUFFER which was increased to 1m.

At this point the response time had come down to 74s (or just 27% of the original run time). The full profile now looks like this.


Event waited on	               Count	Max	Elapsed	Average	   

CPU			                          30.67		   

db file scattered read	         564	0.52	  26.12	   0.05	   

log buffer space	         278	1.02	   8.25	   0.03	   

log file switch completion	  14	1.01	   5.21	   0.37	   

Unaccounted For			                   2.06		   

rdbms ipc reply	                 109	0.29	   0.47	   0.00	   

db file parallel read	           1	0.29	   0.29	   0.29	   

db file sequential read	          20	0.05	   0.26	   0.01	   

SQL*Net message from client	   7	0.04	   0.06	   0.01	   

log file sync	                   3	0.01	   0.01	   0.00	   

SQL*Net message to client	   7	0.00	   0.00	   0.00	   

					   

Response Time			                  73.40

Interestingly although the Linux box was 'easier' to tune, not needing various other services/daemons to be stopped and the profile makes much more sense I didn't achieve the same effect.

I started with a response time of 189s nearly all of which was down to the same two issues about LOG_BUFFER size and redo log sizing/number making the same adjustments to the linux installation also resulted in a fairly significant improvement in response time, but left the Linux installation slower than the Windows box.


Event waited on	                Times	Max. Wait   Elapsed	Average

log buffer space	          414	     0.64     66.80	   0.16

CPU 			                              26.74	

log file switch 

(checkpoint incomplete)	           18	     1.00     16.69	   0.93

rdbms ipc reply	                   88	     2.00     16.35	   0.19

Unaccounted For			                       8.56	

free buffer waits	          954	     0.01      6.59	   0.01

log file switch completion	   14	     1.00      3.25	   0.23

SQL*Net message from client	    7	     0.17      0.22        0.03

log file sync	                    1	     0.15      0.15	   0.15

db file sequential read	           14	     0.08      0.10        0.01

SQL*Net message to client	    7	     0.00      0.00	   0.00

db file scattered read	           14	     0.00      0.00	   0.00



Response Time			                     145.45

At this point both systems are significantly faster than the out of the box configuration, but the linux box is still waiting significantly for log_buffer space and so is nearly twice as slow as the windows box. This may well be due to OS configuration.

Permalink posted by Niall @ 9:56 AM

0 Comments

Thursday, January 06, 2005

Anyone who follows Oracle at all cannot help but have noticed the attention Oracle Corp has been paying to Linux in 2004. I can't see this changing as I believe that they rather suspect that a good chunk of the proprietory Unix market will move to cheap intel boxes running linux over the next 5 years or so. I'll admit that I believe this, largely because I consider this to be correct.

Anyway I have just recently taken the plunge and installed Fedore Core 3 on a couple of machines. One immediate thing that I wanted to do was to get and publish some figures on comparative performance of Oracle on Windows and Linux. Threads like this one where figures such as '4x faster on Linux than Windows' are quoted without saying what the tests were annoy me somewhat. In fact the only paper that I am aware of on the subject is Roby Sherman's - and that is only available at archive.org now.

I'll be writing up a paper shortly on a proper investigation, but thought that the following simple test on identical hardware (the same pc) would make interesting reading.

The hardware is an intel based p4 laptop with 512mb ram. For windows the database is 10.1.0.2, for linux the database is 10.1.0.3. (i.e the databases available for download from otn.) In both cases the database is the supplied 'general purpose' database. In other words this doesn't at this stage attempt to apply any common sense at all to the installation, merely install off the shelf software.

The script run in each case generates a simple table and then traces the execution of 6 inserts into the table using the DBMS_MONITOR PL/SQL package.



create table t1

as select * from all_objects

where rownum < 20001;



exec dbms_monitor.session_trace_enable();



insert into t1 select * from t1;

insert into t1 select * from t1;

insert into t1 select * from t1;

insert into t1 select * from t1;

insert into t1 select * from t1;

insert into t1 select * from t1;



exec dbms_monitor.session_trace_disable();

On the windows installation this resulted in the following resource profile.


Event			       Times	Max. Wait      Elapsed	     Average

db file scattered read		1095	0.52		62.39		0.06

CPU							42.44	

Unaccounted For						37.90	

control file sequential read	1173	0.31		36.01		0.03

log buffer space		634	0.4		29.83		0.05

log file switch completion	104	1.02		25.20		0.24

rdbms ipc reply			109	0.42		22.05		0.20

db file sequential read		309	0.14		6.12		0.02

log file sync			99	1.02		3.83		0.04

db file parallel read		11	0.47		2.36		0.21

control file parallel write	138	0.43		1.33		0.01

enq: CF - contention		8	0.37		1.21		0.15

log file switch 

(checkpoint incomplete)		2	0.26		0.38		0.19

db file single write		69	0.1		0.21		0.00

control file single write	69	0.09		0.20		0.00

SQL*Net message from client	7	0		0.01		0.00

SQL*Net message to client	7	0		0.00		0.00

ksfd: async disk IO		131	0		0.00		0.00

latch: redo writing		1	0		0.00		0.00

latch: cache buffers lru chain	1	0		0.00		0.00



Response Time						271.47

On Linux the self same test ran in 70% of the time with the following resource profile.


Event					Times	Max	Elapsed	Average

log buffer space			2284	0.22	88.20	0.04

log file switch 

(checkpoint incomplete)			92	1	66.80	0.73

CPU							26.40	

log file switch completion		113	1	6.34	0.06

db file sequential read			56	0.09	0.78	0.01

db file scattered read			580	0.1	0.61	0.00

Unaccounted For						0.08	

db file parallel read			1	0.04	0.04	0.04

SQL*Net message to client		7	0	0.00	0.00

SQL*Net message from client		7	0	0.00	0.00

latch: cache buffers lru chain		1	0	0.00	0.00



Response Time						189.25

One of the interesting things is to note the large value of unaccounted for time on the windows box - this is almost certainly down to process pre-emption caused by unnecessary services running. I expect to put results of tuning this exercise up later next week.

Permalink posted by Niall @ 2:51 PM

0 Comments

Niall's Oracle Pages

Sunday, January 23, 2005

Wednesday, January 19, 2005

Sunday, January 16, 2005

Tuesday, January 11, 2005

Thursday, January 06, 2005

About

Links

Scripts

Resources

Presentations

archives