Re: Hardware/OS recommendations for large databases (

Luke Lonergan wrote:
> OK - slower this time:
> We've seen between 110MB/s and 120MB/s on a wide variety of fast CPU
> machines with fast I/O subsystems that can sustain 250MB/s+ using dd, but
> which all are capped at 120MB/s when doing sequential scans with different
> versions of Postgres.
Postgresql issues the exact same sequence of read() calls as does dd. So why is dd so much faster?
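One way to check that claim, for what it's worth: attach strace to the backend while the scan runs and compare its trace with dd's. A sketch; the backend pid 12345 is made up:

$ strace -p 12345 -e trace=read,lseek 2>&1 | head -20    # backend during the scan
$ strace -e trace=read dd if=/fidb1/bigfile of=/dev/null bs=8k 2>&1 | head -20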

I'd be careful with the dd read of a 16GB file on an 8GB system. Make sure you umount the file system first, so that all of the file is flushed from memory. Some systems use freebehind on sequential reads to avoid flushing useful data from memory, but without it you'd find that half of your 16GB file is still cached.

The same point holds for the writes: when dd finishes, not all the data is on disk. You need to issue a sync() call to make that happen, and the sync needs to be part of what you time; use lmdd to ensure that the data is actually all written. In other words, I think your dd results are possibly misleading.
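A minimal sketch of the more careful measurement, assuming Linux, that /fidb1 is listed in fstab, and the file from the example below:

$ umount /fidb1 && mount /fidb1                # drop the cached pages first
$ time dd if=/fidb1/bigfile of=/dev/null bs=8k

# For writes, make the sync part of what's timed:
$ time sh -c 'dd if=/dev/zero of=/fidb1/bigfile bs=8k count=800000 && sync'
# or use lmbench's lmdd, which can issue the sync itself:
$ lmdd if=internal of=/fidb1/bigfile bs=8k count=800000 sync=1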

It's trivial to demonstrate:

$ time dd if=/dev/zero of=/fidb1/bigfile bs=8k count=800000
800000+0 records in
800000+0 records out

real    0m13.780s
user    0m0.134s
sys     0m13.510s

Oops. I just wrote 470MB/s to a file system that has a peak write speed of 200MB/s.
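(The arithmetic: 800,000 records x 8KB is about 6.5GB, and 6.5GB in 13.78 seconds is roughly 470MB/s.)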

Now, you might say that you wrote a 16GB file on an 8GB machine, so this isn't an issue. Even so, it makes your dd numbers look fast, as some of the data will still be unwritten when dd exits.


I'd also suggest running dd on the same files postgresql reads. I suspect you'd find that the layout of the postgresql files isn't that good, as they are grown bit by bit, unlike a file created by dd'ing a large file in one pass.
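For example (the data directory and relfilenode here are made up; point dd at whichever files back the table you scanned, including the .1, .2, ... 1GB segments):

$ umount /fidb1 && mount /fidb1
$ time dd if=/fidb1/pgdata/base/16384/16385 of=/dev/null bs=32k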

> Understand my point: It doesn't matter that there is idle or iowait on the
> CPU, the postgres executor is not able to drive the I/O rate for two
> reasons: there is a lot of CPU used for the scan (the 40% you reported) and
> a lack of asynchrony (the iowait time).  That means that by speeding up the
> CPU you only reduce the first part, but you don't fix the second and v.v.
>
> With more aggressive readahead, the second problem (the I/O asynchrony) is
> handled better by the Linux kernel and filesystem.  That's what we're seeing
> with XFS.
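For reference, on Linux the readahead being described can be inspected and raised per device with blockdev; a sketch, assuming the file system sits on /dev/sda:

$ blockdev --getra /dev/sda        # current readahead, in 512-byte sectors
$ blockdev --setra 4096 /dev/sda   # raise it to 2MB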

I think your point doesn't hold up. Every time you make it, I come away posting another result showing it to be incorrect.

The point you're making doesn't match my experience with *any* storage or program I've ever used, including postgresql. Your point suggests that the storage system is idle and that postgresql is broken because it isn't able to use the resources available, even when the cpu is very idle. How can that make sense? The issue here is that the storage system is very active doing reads on the files, which might be somewhat poorly allocated on disk because postgresql grows the tables bit by bit.

I had the same readahead in Reiser and in XFS. The XFS performance was better because XFS does a better job of large file allocation on disk, thus resulting in many fewer seeks (generated by the file system itself) to read the files back in. As an example, some file systems like UFS purposely scatter large files across cylinder groups to avoid forcing large seeks on small files; one can tune this behavior so that large files are more tightly allocated.
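As a sketch of that kind of tuning on a Solaris-style UFS (the device name here is made up): maxbpg caps how many blocks a single file may allocate in one cylinder group before being forced to the next, so raising it keeps large files more contiguous:

$ tunefs -e 2048 /dev/rdsk/c0t0d0s6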



Of course, because this is engineering, I have another obligatory data point: This time it's a 4.2GB table using 137,138 32KB pages with nearly 41 million rows.
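(For reference: 137,138 pages x 32KB/page is about 4.49 billion bytes, which is the 4.2GB figure when counted in binary gigabytes.)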

A "select count(1)" on the table completes in 14.6 seconds, for an average read rate of 320 MB/s. One cpu was idle, the other averaged 32% system time and 68 user time for the 14 second period. This is on a 2.2Ghz Opteron. A faster cpu would show increased performance as I really am cpu bound finally. Postgresql is clearly able to issue the relevant sequential read() system calls and sink the resulting data without a problem if the file system is capable of providing the data. It can do this up to a speed of ~300MB/s on this class of system. Now it should be fairly simple to tweak the few spots where some excess memory copies are being done and up this result substantially. I hope postgresql is always using the libc memcpy as that's going to be a lot faster then some private routine.

-- Alan



