On Mon, Aug 30, 2010 at 04:49:58PM -0400, Bill Fink wrote:
> > Thanks for reporting it.  I'm going to have to take a closer look at
> > why this makes a difference.  I'm going to guess though that what's
> > going on is that we're posting writes in such a way that they're no
> > longer aligned or ending at the end of a RAID5 stripe, causing a
> > read-modify-write pass.  That would easily explain the write
> > performance regression.
>
> I'm not sure I understand.  How could calling or not calling
> ext4_num_dirty_pages() (unpatched versus patched 2.6.35 kernel)
> affect the write alignment?

Suppose you have 8 disks with a stripe size of 16k per disk.  Assuming
you're using only one parity disk (i.e., RAID 5) and no spare disks,
the optimal I/O size is 7*16k == 112k.  If we do a write which is
smaller than 112k, or which is not a multiple of 112k, then the RAID
subsystem will need to do a read-modify-write to update the parity
disk.  Furthermore, the write had better be aligned on a 112k byte
boundary.  The block allocator will guarantee that block #0 is aligned
on a 112k boundary, but the writes also have to be the right size in
order to avoid the read-modify-write.

If we end up doing very small writes, it can be quite disastrous for
write performance.

> I was wondering if the locking being done in ext4_num_dirty_pages()
> could somehow be affecting the performance.  I did notice from top
> that in the patched 2.6.35 kernel, the I/O wait time was generally
> in the 60-65% range, while in the unpatched 2.6.35 kernel, it was
> at a higher 75-80% range.  However, I don't know if that's just a
> result of the lower performance, or a possible clue to its cause.

The higher I/O wait time would tend to imply that the RAID controller
is taking longer to complete the write updates, which would tend to
confirm that we're doing more read-modify-write cycles.  If we were
hitting spinlock contention, it would show up as more system CPU time
consumed, not I/O wait.

					- Ted
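
To make the stripe arithmetic above concrete, here is a minimal
user-space C sketch (not from the original mail).  The geometry
constants match Ted's example (8 disks, 16k per-disk chunk, one parity
disk), and the function name write_avoids_rmw() is purely
illustrative, not a kernel or MD driver API.

/* Sketch of the RAID5 full-stripe alignment check described above.
 * A write avoids the read-modify-write pass only if it starts on a
 * full-stripe boundary and covers a whole number of full stripes. */
#include <stdio.h>
#include <stdbool.h>

#define NUM_DISKS    8
#define PARITY_DISKS 1
#define CHUNK_SIZE   (16 * 1024)                   /* 16k per-disk chunk */
#define STRIPE_WIDTH ((NUM_DISKS - PARITY_DISKS) * CHUNK_SIZE)  /* 7*16k = 112k */

static bool write_avoids_rmw(unsigned long long offset, unsigned long long len)
{
        return len != 0 &&
               (offset % STRIPE_WIDTH) == 0 &&
               (len % STRIPE_WIDTH) == 0;
}

int main(void)
{
        printf("full stripe width: %d bytes\n", STRIPE_WIDTH);
        printf("112k write at offset 0:   %s\n",
               write_avoids_rmw(0, 112 * 1024) ? "full stripe, no RMW" : "RMW needed");
        printf("64k write at offset 0:    %s\n",
               write_avoids_rmw(0, 64 * 1024) ? "full stripe, no RMW" : "RMW needed");
        printf("112k write at offset 16k: %s\n",
               write_avoids_rmw(16 * 1024, 112 * 1024) ? "full stripe, no RMW" : "RMW needed");
        return 0;
}

The second and third cases illustrate the two failure modes Ted
mentions: a write that is not a multiple of the 112k stripe width, and
a correctly sized write that is not aligned on a 112k boundary.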