The problem with using page-cache flushing as a write cache here is that
writes to MD don't go through the IO scheduler. That is a very big problem:
when the flush thread decides to write to MD, it's impossible to control the
write speed or to prioritize the writes against reads. Every request is
basically handled FIFO, and when the flush is big, no read can be served.

On Tue, Dec 6, 2011 at 5:01 PM, NeilBrown <neilb@xxxxxxx> wrote:
> On Tue, 6 Dec 2011 15:13:34 -0800 Yucong Sun (叶雨飞) <sunyucong@xxxxxxxxx>
> wrote:
>
>> On Tue, Dec 6, 2011 at 2:26 PM, NeilBrown <neilb@xxxxxxx> wrote:
>> > On Tue, 6 Dec 2011 14:01:14 -0800 Yucong Sun (叶雨飞) <sunyucong@xxxxxxxxx>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> I recently setup raid10 on 4 physical disk and have a iscsi serve it
>> >> as a block device, and have been trying to tweak for performance.
>> >>
>> >> First thing I notice that MD seems to rely on page cache to flush
>> >> changes to disk, is there any way to turn that off so changes are
>> >> flushed to the disk? like O_FSYNC|O_DIRECT does? The reason I want to
>> >> turn it off is to understand the performance difference, I want to be
>> >> sure that page cache is truly acting as a write-back cache, I know one
>> >> can tune the dirty_* to control the cache flush, but I want to make
>> >> sure that it is actually doing what I think it does.
>> >
>> > Why do you think this?
>> >
>> > md/raid10 sends all request straight through to the relevant underlying
>> > device(s).
>> > reads are just passed straight down.
>> > Writes are duplicated (the request structure, not the data) and queued to a
>> > separate thread which does the actual write, but it is fairly direct.
>>
>> So I know there's page caching /flush involved because I watch
>> /proc/meminfo and see Dirty value growing up and After reach the
>> threshold, Write-back kicks in and wrote data.
>> So if as you said md does no page flushing, then it must because of
>> the iscsi software opens the device without O_DIRECT, so it uses page
>> cache which in turn flush data to MD, now it makes more sense.
>>
>> But for the md write, it's not SYNC write? meaning that after write
>> call with O_DIRECT to the md device returns, the data is still
>> possibility on the fly to the disk? how does having a bitmap plays in
>> between? does it work like ext3 jounal? after a power-loss, can we
>> expect a crash consistent data on the disk?
>
> When you want sync writes, you need to use fsync.
>
> When md writes the superblock or a bitmap page it uses SYNC and FLUSH writes
> to ensure they get to the media before the subsequent data write.
>
>
>>
>> Another thing to note is I found IO size on MD device is always 4K,
>> which is the page size, is that normal? just want to making sure this
>> isn't a bad behavior result from the iscsi software.
>
> It is normal in some cases. It depends a bit on the details of the
> underlying device.
>
>
> NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
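
P.S. For anyone who wants to see the two things discussed in the thread side
by side (the Dirty counter in /proc/meminfo, and Neil's advice to use fsync
when a write must be synchronous), here is a minimal sketch in Python. It is
only an illustration under stated assumptions: it writes to an ordinary
temporary file rather than to an md device, it assumes a Linux host for the
/proc/meminfo part, and the helper names durable_write and read_dirty_kb are
made up for this example.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path, returning only after fsync() has completed.

    Hypothetical helper for illustration. A plain write() may leave the
    data in the page cache; fsync() asks the kernel to push it to the device.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


def read_dirty_kb(meminfo_path="/proc/meminfo"):
    """Return the 'Dirty' value in kB, or None if it cannot be read."""
    try:
        with open(meminfo_path) as f:
            for line in f:
                if line.startswith("Dirty:"):
                    return int(line.split()[1])
    except OSError:
        return None
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "sample.bin")
        print("Dirty before (kB):", read_dirty_kb())
        durable_write(target, b"\0" * (4 * 1024 * 1024))
        print("Dirty after  (kB):", read_dirty_kb())
```

The Dirty value is system-wide, so other activity on the machine will move it
as well; treat the two printed numbers as a rough indication, not a
measurement.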