When you write a file, the caching you see is not done by MD. The OS
holds the writes in the page cache as dirty memory before flushing them
to MD. If you want synchronous or direct writes, pass O_SYNC or O_DIRECT
in the flags to the open() call you use to write the file.

On Tue, Dec 6, 2011 at 4:13 PM, Yucong Sun (叶雨飞) <sunyucong@xxxxxxxxx> wrote:
> On Tue, Dec 6, 2011 at 2:26 PM, NeilBrown <neilb@xxxxxxx> wrote:
>> On Tue, 6 Dec 2011 14:01:14 -0800 Yucong Sun (叶雨飞) <sunyucong@xxxxxxxxx>
>> wrote:
>>
>>> Hi,
>>>
>>> I recently set up raid10 on 4 physical disks and have iSCSI serve it
>>> as a block device, and have been trying to tweak it for performance.
>>>
>>> The first thing I noticed is that MD seems to rely on the page cache to
>>> flush changes to disk. Is there any way to turn that off so changes are
>>> flushed to the disk, like O_FSYNC|O_DIRECT does? The reason I want to
>>> turn it off is to understand the performance difference. I want to be
>>> sure that the page cache is truly acting as a write-back cache. I know
>>> one can tune the dirty_* settings to control the cache flush, but I want
>>> to make sure that it is actually doing what I think it does.
>>
>> Why do you think this?
>>
>> md/raid10 sends all requests straight through to the relevant underlying
>> device(s).
>> Reads are just passed straight down.
>> Writes are duplicated (the request structure, not the data) and queued to a
>> separate thread which does the actual write, but it is fairly direct.
>
> I know there's page caching/flushing involved because I watch
> /proc/meminfo and see the Dirty value growing, and after it reaches the
> threshold, write-back kicks in and writes the data.
> So if, as you said, md does no page flushing, then it must be because
> the iSCSI software opens the device without O_DIRECT, so it uses the page
> cache, which in turn flushes data to MD. Now it makes more sense.
>
> But for the md write, is it not a SYNC write? Meaning that after a write
> call with O_DIRECT to the md device returns, the data is possibly still
> in flight to the disk? How does having a bitmap play into this? Does it
> work like the ext3 journal? After a power loss, can we expect
> crash-consistent data on the disk?
>
> Another thing to note is that I found the IO size on the MD device is
> always 4K, which is the page size. Is that normal? I just want to make
> sure this isn't bad behavior resulting from the iSCSI software.
>>
>>>
>>> Then I noticed in the output of free that the number in the Cache column
>>> is very low, but Buffer is very high. My question is, does Buffer here
>>> serve as a read cache? I couldn't find the answer anywhere else.
>>
>> The best place to find the answer is in the source code.
>>
>> Every page in the page cache is associated with some file.
>> If that file is a block device (e.g. /dev/sdX) then it is reported as
>> 'Buffer', otherwise it is reported as 'Cache'.
>>
>> Some filesystems like ext3 use 'Buffer' memory for metadata, but all use
>> 'Cache' memory for files and directories.
>>
>
> Thanks, it is being used as a read cache then. Too bad there's no easy
> way to measure/see the hit rate.
>
>>>
>>> My last question: since MD already seems to be doing the caching, what
>>> effect would it have if I set up a LO device in front of the MD
>>> device? Is there going to be more caching, and how is it different from
>>> just the plain MD device?
>>
>> MD/raid10 does no caching.
>> A loop-back over the md device would not add extra caching.
>>
>> NeilBrown
>>
>>
>>>
>>> Thanks.
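
For reference, here is a minimal sketch of the open() flags mentioned at
the top of this reply. It is in Python only because os.open() is a thin
wrapper around open(2). The helper name write_durably, the 4096-byte
block size and the zero padding are assumptions made for illustration,
not anything taken from MD or from the iSCSI software. O_SYNC makes each
write wait until the data has been sent to the device. O_DIRECT is
Linux-only, is not accepted by every filesystem, and needs an aligned
buffer, which is why the payload is copied into an anonymous mmap.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; a real device may require a different size


def write_durably(path, payload, direct=False):
    """Write payload with O_SYNC, optionally adding O_DIRECT.

    Hypothetical helper for illustration. The payload is zero-padded to a
    whole number of BLOCK-sized blocks because O_DIRECT requires an aligned
    length as well as an aligned buffer address.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    if direct:
        flags |= os.O_DIRECT  # bypass the page cache (Linux-only flag)

    # Round the length up to the next multiple of BLOCK (at least one block).
    size = max(BLOCK, -(-len(payload) // BLOCK) * BLOCK)
    buf = mmap.mmap(-1, size)  # anonymous mapping: page-aligned, zero-filled
    try:
        buf.write(payload)
        fd = os.open(path, flags, 0o600)
        try:
            # A single write() is enough for this small sketch; production
            # code would loop until every byte has been written.
            return os.write(fd, buf)
        finally:
            os.close(fd)
    finally:
        buf.close()
```

Calling write_durably("scratch.bin", b"hello") uses O_SYNC only; passing
direct=True adds O_DIRECT, in which case the Dirty value in /proc/meminfo
should not grow because of this write. Pointing it at the md device itself
would overwrite the start of the array, so try it on a scratch file first.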
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html