On 10/11/2010 01:14 AM, Mladen Gogala wrote:
> I can provide measurements, but from Oracle RDBMS. Postgres doesn't
> allow tuning of that aspect, so no measurement can be done. Would the
> numbers from Oracle RDBMS be acceptable?

Well, they'd tell me a lot about Oracle's performance as I/O chunk size
scales, but almost nothing about the cost of small I/O operations vs
larger ones in general.
Typically dedicated test programs that simulate the database read
patterns would be used for this sort of thing. I'd be surprised if
nobody on -hackers has already done suitable testing; I was mostly
asking because I was interested in how you were backing your assertions.
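Something like this (a rough, untested sketch in C; the file name and default
chunk size are just placeholders) is the kind of thing I mean - sequentially
read a big file with a given chunk size and time it:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    /* Sequentially read "bigfile.dat" in chunks of the size (in bytes)
     * given on the command line, and report elapsed time and read() count. */
    int main(int argc, char **argv)
    {
        size_t  chunk = argc > 1 ? (size_t) atol(argv[1]) : 8192;
        char   *buf = malloc(chunk);
        int     fd = open("bigfile.dat", O_RDONLY);
        struct timespec t0, t1;
        long    calls = 0;
        ssize_t n;

        if (fd < 0 || buf == NULL) { perror("setup"); return 1; }

        clock_gettime(CLOCK_MONOTONIC, &t0);
        while ((n = read(fd, buf, chunk)) > 0)
            calls++;
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("chunk=%zu bytes, %ld read() calls, %.3f s\n",
               chunk, calls,
               (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
        close(fd);
        free(buf);
        return 0;
    }

Run it with 8192 and again with, say, 1048576, dropping the OS page cache
between runs (echo 3 > /proc/sys/vm/drop_caches on Linux), otherwise the
second run mostly measures the cache rather than the I/O path.
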
PostgreSQL isn't Oracle; their designs are in many ways very different.
Most importantly, Oracle uses a redo log, where PostgreSQL stores old
rows with visibility information directly in the tables. It is possible
that a larger proportion of Oracle's I/O costs are fixed per-block
overheads rather than per-byte costs, so it seeks to batch requests into
larger chunks. Of course, it's also possible that 8k chunk I/O is just
universally expensive and is something Pg should avoid, too, but we
can't know that without
dedicated testing, which I at least haven't done. I don't follow
-hackers closely, and wouldn't have seen discussion about testing done
there. The archives are likely to contain useful discussions.
Then again, IIRC Pg's page size is also its I/O size, so you could
actually get larger I/O chunking by rebuilding Pg with larger pages.
Having never had the need, I haven't examined the effect of page size
changes on I/O performance.
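(If anyone does want to experiment: IIRC the page size is a compile-time
option - --with-blocksize - and you can ask a running server what it was
built with via SHOW block_size. A minimal libpq sketch, assuming a locally
reachable server with a placeholder connection string:)

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Ask a running server what block size it was compiled with.
     * The connection string is a placeholder; adjust to taste. */
    int main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=postgres");
        PGresult *res;

        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }
        res = PQexec(conn, "SHOW block_size");
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("block_size = %s bytes\n", PQgetvalue(res, 0, 0));
        PQclear(res);
        PQfinish(conn);
        return 0;
    }
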
The Linux kernel, at least, does request merging (and splitting, and
merging, and more splitting) along the request path, and I'd
personally expect that most of the cost of 8k requests would be in the
increased number of system calls, buffer copies, etc. required.
Measurements demonstrating or contradicting this would be good to see.
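If you want to take the kernel's readahead and merging out of the picture
and see per-request costs more directly, O_DIRECT is one way to do it. A
Linux-specific, untested sketch (buffer and request size must be suitably
aligned, hence posix_memalign; 4096 is just a guess at the device's
alignment requirement):

    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Read "bigfile.dat" with O_DIRECT so each read() goes to the device,
     * bypassing the page cache and readahead. Compare 8 kB vs larger chunks. */
    int main(int argc, char **argv)
    {
        size_t  chunk = argc > 1 ? (size_t) atol(argv[1]) : 8192;
        void   *buf;
        int     fd;
        long    calls = 0;
        ssize_t n;

        if (posix_memalign(&buf, 4096, chunk) != 0) { perror("memalign"); return 1; }
        fd = open("bigfile.dat", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        while ((n = read(fd, buf, chunk)) > 0)
            calls++;
        printf("chunk=%zu bytes, %ld direct read() calls\n", chunk, calls);
        close(fd);
        free(buf);
        return 0;
    }
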

> Even the cost of hundreds of thousands of context switches is far from
> negligible. What kind of measurements do you expect me to do with a
> database that doesn't support tweaking that aspect of its operation?

Test programs, or references to testing done by others that demonstrate
these costs in isolation. Of course, they still wouldn't show what gain
Pg might obtain (nothing except hacking on Pg's sources really will) but
they'd help measure the costs of doing I/O that way.
I suspect you're right that large I/O chunks would be desirable and a
potential performance improvement. What I'd like to know is *how*
*much*, or at least how much the current approach costs in pure
overheads like context switches and scheduler delays.
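For the context-switch part, getrusage() at least gives per-process counters
you can sample before and after a test loop - again just a sketch; the loop
it brackets would be the read loop from the earlier example, nothing
Pg-specific:

    #include <stdio.h>
    #include <sys/resource.h>

    /* Print the process's context-switch counters; call before and after
     * a test loop and diff the values. */
    static void report_ctx_switches(const char *label)
    {
        struct rusage ru;

        if (getrusage(RUSAGE_SELF, &ru) == 0)
            printf("%s: voluntary=%ld involuntary=%ld\n",
                   label, ru.ru_nvcsw, ru.ru_nivcsw);
    }

    int main(void)
    {
        report_ctx_switches("before");
        /* ... run the read loop from the earlier sketch here ... */
        report_ctx_switches("after");
        return 0;
    }
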

> Does that provide enough evidence and, if not, why not?

It shows that it helps Oracle a lot ;-)
Without isolating how much of that is raw costs of the block I/O and how
much is costs internal to Oracle, it's still hard to have much idea how
much it'd benefit Pg to take a similar approach.
I'm sure the folks on -hackers have been over this and know a whole lot
more about it than I do, though.

> Oracle counts 400 million records in 2 minutes and Postgres 9.01 takes
> 12.8 seconds to count 1.2 million records? Do you see the disparity?

Sure. What I don't know is how much of that is due to block sizes. There
are all sorts of areas where Oracle could be gaining.

> It may be so, but a slow sequential scan is still the largest single
> performance problem of PostgreSQL. The frequency with which that topic
> appears on the mailing lists should serve as good evidence for that.

I'm certainly not disputing that it could use improvement; it's clearly
hurting some users. I just don't know if I/O chunking is the answer - I
suspect that if it were, then it would've become a priority for one or
more people working on Pg much sooner.
It's quite likely that it's one of those things where it makes a huge
difference for Oracle because Oracle has managed to optimize out most of
the other bigger costs. If Pg still has other areas that make I/O more
expensive per-byte (say, visibility checks) and low fixed per-block
costs, then there'd be little point in chunking I/O. My understanding is
that that's pretty much how things stand at the moment, but I'd love
verification from someone who's done the testing.

> If you still claim that it wouldn't make a difference, the onus to
> prove it is on you.

I didn't mean to claim that it would make no difference. If I sounded
like it, sorry.
I just want to know how _much_, or more accurately, how great the overheads
of the current approach in Pg are compared to doing much larger reads.
--
Craig Ringer