Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 28, 2019 at 12:03 AM Saurabh Nanda <saurabhnanda@xxxxxxxxx> wrote:
All this benchmarking has led me to a philosophical question, why does PG need shared_buffers in the first place?

PostgreSQL cannot let the OS get its hands on a dirty shared buffer until the WAL record "protecting" that buffer has been flushed to disk. If a dirty shared buffer got written to disk, but then a crash happened before the WAL record go flushed to disk, then the data could be corrupted when it comes back up. So shared_buffers effectively serves as cooling pond where dirty buffers wait for their WAL to be flushed naturally so they can be written without instigating a performance-reducing flush just for them.

Also, concurrent clients needs to access the same disk pages at overlapping times without corrupting each other.  Perhaps that could be implemented to have just the buffer headers in shared memory to coordinate the locking, and not having the buffers themselves in shared memory.  But that is not how it is currently implemented.
 
What's wrong with letting the OS do the caching/buffering? 

Nothing, and that is what it does.  Which is why the advice for shared_buffers is often to use a small fraction of RAM, leaving the rest for the OS to do its thing.  But PostgreSQL still needs a way to lock those pages, both against concurrent access by its own clients, and against getting flushed out of order by the OS.  There is no performant way to release the dirty pages immediately to the OS while still constraining the order in which the OS flushes them to disk. 

Finally, while reading a page from the OS cache into shared_buffers is much faster than reading it from disk, it is still much slower than finding it already located in shared_buffers.  So if your entire database fits in RAM, you will get better performance if shared_buffers is large enough for the entire thing to fit in there, as well.  This is an exception to the rule that shared_buffers should be a small fraction of RAM.
 
Isn't it optimised for this kind of stuff?

Maybe.  But you might be surprised at poorly optimized it is.  It depends on your OS and version of it, of course.  If you have a high usage_count buffer which is re-dirtied constantly, it will only get written and flushed to disk once per checkpoint if under PostgreSQL control. But I've seen pages like that get written many times per second under kernel control. Whatever optimization it tried to do, it wasn't very good at.  Also, if many contiguous pages are dirtied in a close time-frame, but not dirtied in their physical order, the kernel should be able to re-order them into long sequential writes, correct?  But empirically, it doesn't, at least back in the late 2.* series kernels when I did the experiments.  I don't know if it didn't even try, or tried but failed.  (Of course back then, PostgreSQL didn't do a good job of it either)

Cheers,

Jeff

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux