Re: Why does splitting $PGDATA and xlog yield a performance benefit?

> On Aug 25, 2015, at 10:45 AM, Bill Moran <wmoran@xxxxxxxxxxxxxxxxx> wrote:
> 
> On Tue, 25 Aug 2015 10:08:48 -0700
> David Kerr <dmk@xxxxxxxxxxxxxx> wrote:
> 
>> Howdy All,
>> 
>> For a very long time I've held the belief that splitting PGDATA and xlog on Linux systems fairly universally gives a decent performance benefit for many common workloads
>> (I've seen up to 20% personally).
>> 
>> I was under the impression that this had to do with regular fsync()s from the WAL
>> interfering with, and over-reaching into, the write-out of the filesystem buffers.
>> 
>> Basically, I think I was conflating fsync() with sync().
>> 
>> So if it's not that, then that just leaves bandwidth (ignoring all of the other best-practice reasons such as reliability, etc.). So, in theory, if you're not swamping your disk I/O then you won't really benefit from relocating your XLOGs.
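
The fsync()/sync() distinction mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file name `wal_segment.demo` and the helpers `write_durably` and `flush_everything` are hypothetical and are not PostgreSQL code. `os.fsync(fd)` asks the kernel to flush one file, while `os.sync()` asks it to write out everything that is dirty.

```python
import os
import tempfile


def write_durably(path, data):
    """Append data to path, then flush only that file with fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        # A single os.write() is enough for this small demo payload;
        # real code would loop on short writes.
        written = os.write(fd, data)
        os.fsync(fd)  # scoped to this one file descriptor
    finally:
        os.close(fd)
    return written


def flush_everything():
    """Ask the kernel to write out all dirty buffers, system-wide."""
    if hasattr(os, "sync"):  # not available on every platform
        os.sync()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a WAL segment.
        wal_like = os.path.join(tmp, "wal_segment.demo")
        print(write_durably(wal_like, b"commit record\n"), "bytes fsync()ed")
        flush_everything()
```

How far an fsync() on one file actually disturbs other pending writes depends on the filesystem and its mount options, so the sketch shows the API difference only, not the performance effect.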
> 
> Disk performance can be a bit more complicated than just "swamping." Even if

Funny, on revision of my question, I left out basically that exact line for simplicity's sake. =)

> you're not maxing out the IO bandwidth, you could be getting enough that some
> writes are waiting on other writes before they can be processed. Consider the
> fact that old-style ethernet was only able to hit ~80% of its theoretical
> capacity in the real world, because the chance of collisions increased with
> the amount of data, and each collision slowed down the overall transfer speed.
> Contrasted with modern ethernet that doesn't do collisions, you can get much
> closer to 100% of the rated bandwidth because the communications are effectively
> partitioned from each other.
> 
> In the worst-case scenario, if two processes (due to horrible luck) _always_
> try to write at the same time, the overall responsiveness will be lousy, even
> if the bandwidth usage is only a small percentage of what is available. Of course,
> that worst case doesn't happen in actual practice, but as the usage goes up,
> the chance of hitting that interference increases, and the effective response
> goes down, even when there's bandwidth still available.
> 
> Separate the competing processes, and the chance of conflict is 0. So your
> responsiveness is pretty much at best-case all the time.
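
The effect described above, latency rising well before bandwidth runs out, can be sketched with the textbook M/M/1 queueing formula, where mean response time is service time divided by (1 - utilization). This is a rough model, not a measurement: the 1 ms service time and the 20%/40% utilization figures are made-up assumptions, and real disks or SANs do not behave like an ideal single-server queue.

```python
def mm1_response_time(service_time_ms, utilization):
    """Mean response time of an M/M/1 queue: S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


SERVICE_MS = 1.0   # assumed time to service one write on an idle device
WAL_UTIL = 0.2     # assumed share of device time used by WAL writes
DATA_UTIL = 0.4    # assumed share used by data-file writes

# One shared device sees the combined load from both kinds of writes.
shared = mm1_response_time(SERVICE_MS, WAL_UTIL + DATA_UTIL)

# With a dedicated device, WAL writes queue only behind other WAL writes.
separate_wal = mm1_response_time(SERVICE_MS, WAL_UTIL)

if __name__ == "__main__":
    print(f"WAL write latency, shared device:   {shared:.2f} ms")
    print(f"WAL write latency, separate device: {separate_wal:.2f} ms")
```

Under these assumptions the shared device is only 60% busy, yet the modeled WAL latency is about twice that of the dedicated device. Note that in this model separation does not make the chance of waiting zero, since WAL writes still queue behind each other; it only removes the interference from data-file writes.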

Understood. Now, in my previous delve into this issue, I saw minimal/no disk queuing, and the SAN showed nothing on its queues and no retries (of course, #NeverTrustTheSANGuy), but I still got a 20% performance increase by splitting the WAL and $PGDATA.

But that's beside the point, and my data on that environment is long gone.

I'm content to leave this at "I/O is complicated." I just wanted to make sure that I wasn't correct but for a slightly wrong reason.

Thanks!

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



