On Thu, Apr 2, 2009 at 4:20 PM, Scott Carey <scott@xxxxxxxxxxxxxxxxx> wrote:
>
> On 4/2/09 10:58 AM, "Merlin Moncure" <mmoncure@xxxxxxxxx> wrote:
>
>> On Wed, Mar 25, 2009 at 12:16 PM, Scott Carey <scott@xxxxxxxxxxxxxxxxx> wrote:
>>> On 3/25/09 1:07 AM, "Greg Smith" <gsmith@xxxxxxxxxxxxx> wrote:
>>>> On Wed, 25 Mar 2009, Mark Kirkwood wrote:
>>>>> I'm thinking that the raid chunksize may well be the issue.
>>>>
>>>> Why? I'm not saying you're wrong, I just don't see why that parameter
>>>> jumped out as a likely cause here.
>>>>
>>>
>>> If postgres is random reading or writing at 8k block size, and the raid
>>> array is set with 4k block size, then every 8k random i/o will create TWO
>>> disk seeks since it gets split to two disks. Effectively, iops will be cut
>>> in half.
>>
>> I disagree. The 4k raid chunks are likely to be grouped together on
>> disk and read sequentially. This will only give two seeks in special
>> cases.
>
> By definition, adjacent raid blocks in a stripe are on different disks.
>
>
>> Now, if the PostgreSQL block size is _smaller_ than the raid
>> chunk size, random writes can get expensive (especially for raid 5)
>> because the raid chunk has to be fully read in and written back out.
>> But this is mainly a theoretical problem I think.
>
> This is false and a RAID-5 myth. New parity can be constructed from the old
> parity + the change in data. Only 2 blocks have to be accessed, not the
> whole stripe.
>
> Plus, this was about RAID 10 or 0 where parity does not apply.
>
>>
>> I'm going to go out on a limb and say that for block sizes that are
>> within one or two 'powers of two' of each other, it doesn't matter a
>> whole lot. SSDs might be different, because of the 'erase' block
>> which might be 128k, but I bet this is dealt with in such a fashion
>> that you wouldn't really notice it when dealing with different block
>> sizes in pg.
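To make the chunk-splitting argument concrete, here is a minimal Python sketch that maps one I/O onto a simplified RAID 0 layout. It assumes chunk k lives on disk k % n_disks (plain round-robin striping, equal-sized members, no data offset) and that PostgreSQL's 8k blocks start at 8k-aligned offsets on the array; the function name disks_touched is hypothetical. Real md or hardware arrays may add offsets or zones that this ignores.

```python
from typing import Dict


def disks_touched(offset: int, length: int, chunk_size: int, n_disks: int) -> Dict[int, int]:
    """Return {disk_index: bytes_read} for one I/O on a simplified RAID 0 array.

    Assumption: chunk k lives on disk k % n_disks (plain round-robin striping,
    equal-sized members, no data offset).
    """
    per_disk: Dict[int, int] = {}
    pos, end = offset, offset + length
    while pos < end:
        chunk = pos // chunk_size                 # which chunk this byte falls in
        chunk_end = (chunk + 1) * chunk_size      # first byte of the next chunk
        n = min(end, chunk_end) - pos             # bytes served by this chunk
        disk = chunk % n_disks                    # round-robin placement
        per_disk[disk] = per_disk.get(disk, 0) + n
        pos += n
    return per_disk


if __name__ == "__main__":
    PG_BLOCK = 8192          # default PostgreSQL block size
    block_no = 1000          # arbitrary block, assumed 8k-aligned on the array
    off = block_no * PG_BLOCK

    # 8MB chunks: the whole 8k block comes from a single disk.
    print(disks_touched(off, PG_BLOCK, 8 * 1024 * 1024, 4))  # {0: 8192}
    # 4k chunks: the same block is split across two disks.
    print(disks_touched(off, PG_BLOCK, 4 * 1024, 4))         # {0: 4096, 1: 4096}
```

Under these assumptions every aligned 8k block hits exactly two disks with 4k chunks, and exactly one disk with any power-of-two chunk of 8k or more.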
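The parity identity behind the RAID-5 point can be checked the same way. This is a hypothetical Python sketch, assuming single XOR parity across the data chunks of a stripe (as in RAID 5); the helper names xor_bytes, full_stripe_parity and rmw_parity are made up for illustration. It shows that new parity computed from old parity, old data and new data alone matches a full-stripe recomputation, so a small write only has to read the target data chunk and the parity chunk.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(data_chunks):
    """Parity computed from every data chunk in the stripe."""
    parity = bytes(len(data_chunks[0]))
    for chunk in data_chunks:
        parity = xor_bytes(parity, chunk)
    return parity


def rmw_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write update: needs only the old data chunk and old parity."""
    return xor_bytes(old_parity, xor_bytes(old_data, new_data))


if __name__ == "__main__":
    CHUNK = 4096
    stripe = [os.urandom(CHUNK) for _ in range(3)]   # three data disks (hypothetical)
    parity = full_stripe_parity(stripe)

    new_data = os.urandom(CHUNK)                     # overwrite the chunk on disk 1
    updated = rmw_parity(parity, stripe[1], new_data)

    stripe[1] = new_data
    # Same parity as a full recomputation, without reading disks 0 and 2.
    assert updated == full_stripe_parity(stripe)
    print("read-modify-write parity matches full recomputation")
```

Because XOR works byte by byte, the same identity holds for any sub-range of a chunk; whether a particular controller or md actually limits the read-modify-write to the bytes being written is implementation-specific.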
>
> Well, raid block size can be significantly larger than postgres or file
> system block size and the performance of random reads / writes won't get
> worse with larger block sizes. This holds only for RAID 0 (or 10); parity
> is the ONLY thing that makes larger block sizes bad since there is a
> read-modify-write type operation on something the size of one block.
>
> Raid block sizes smaller than the postgres block are always bad and
> multiply random i/o.
>
> Read an 8k postgres block in an 8MB md raid 0 block, and you read 8k from
> one disk.
> Read an 8k postgres block on an md raid 0 with 4k blocks, and you read 4k
> from two disks.

yep... that's a good analysis... thinko on my part.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance