Re: Raid10 and page cache

What you see when you write a file is not MD doing caching. The OS
buffers writes as dirty pages in the page cache and flushes them to MD
later. If you want synchronous or direct writes, add O_SYNC or O_DIRECT
to the flags of the open() call for the file you write.
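
As a rough illustration of the two flags, here is a minimal Python sketch
(Linux only). The helper names write_sync and write_direct and the 4096-byte
BLOCK size are assumptions made for this example; the alignment a real
device needs for O_DIRECT may differ.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; a real device may require a different size


def write_sync(path, data):
    """Write with O_SYNC: write() returns after the data is written through."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


def write_direct(path, data):
    """Write one block with O_DIRECT, bypassing the page cache (Linux only)."""
    # O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned.
    buf = mmap.mmap(-1, BLOCK)
    buf.write(data[:BLOCK].ljust(BLOCK, b"\0"))  # pad to a full block
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT | os.O_SYNC, 0o600)
    try:
        return os.write(fd, buf)
    finally:
        os.close(fd)
        buf.close()
```

Pointing write_direct at a block device such as the md array overwrites its
first block, so try it on a scratch file first. Some filesystems reject
O_DIRECT and open() then fails with an OSError.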

On Tue, Dec 6, 2011 at 4:13 PM, Yucong Sun (叶雨飞) <sunyucong@xxxxxxxxx> wrote:
> On Tue, Dec 6, 2011 at 2:26 PM, NeilBrown <neilb@xxxxxxx> wrote:
>> On Tue, 6 Dec 2011 14:01:14 -0800 Yucong Sun (叶雨飞) <sunyucong@xxxxxxxxx>
>> wrote:
>>
>>> Hi,
>>>
>>> I recently set up raid10 on 4 physical disks, with iscsi serving it
>>> as a block device, and have been trying to tune it for performance.
>>>
>>> The first thing I noticed is that MD seems to rely on the page cache to
>>> flush changes to disk. Is there any way to turn that off so that changes
>>> are flushed to the disk directly, like O_FSYNC|O_DIRECT does? The reason
>>> I want to turn it off is to understand the performance difference. I want
>>> to be sure that the page cache is truly acting as a write-back cache. I
>>> know one can tune the dirty_* settings to control the cache flush, but I
>>> want to make sure that it is actually doing what I think it does.
>>
>> Why do you think this?
>>
>> md/raid10 sends all requests straight through to the relevant underlying
>> device(s).
>> Reads are just passed straight down.
>> Writes are duplicated (the request structure, not the data) and queued to a
>> separate thread which does the actual write, but it is fairly direct.
>
> I know there is page caching and flushing involved because I watch
> /proc/meminfo and see the Dirty value grow; after it reaches the
> threshold, write-back kicks in and writes the data out.
> So if, as you said, md does no page flushing, then it must be because
> the iscsi software opens the device without O_DIRECT, so it uses the page
> cache, which in turn flushes data to MD. Now it makes more sense.
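
The observation quoted above can be reproduced by sampling the Dirty and
Writeback fields of /proc/meminfo while the workload runs. A minimal
sketch, assuming Linux; the helper names parse_meminfo and
dirty_and_writeback are made up for this example:

```python
def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (values mostly in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])  # drop the trailing unit, if any
    return fields


def dirty_and_writeback(path="/proc/meminfo"):
    """Return the current (Dirty, Writeback) values as reported by the kernel."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info.get("Dirty", 0), info.get("Writeback", 0)
```

Calling dirty_and_writeback() once a second during a write test should show
Dirty climbing and then Writeback rising as the flush starts. The same
parser also returns the Buffers and Cached fields discussed further down.
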
>
> But is an md write not a SYNC write? That is, after a write
> call with O_DIRECT to the md device returns, might the data still
> be in flight to the disk? How does having a bitmap play into
> this? Does it work like the ext3 journal? After a power loss, can we
> expect crash-consistent data on the disk?
>
> Another thing to note: I found that the IO size on the MD device is
> always 4K, which is the page size. Is that normal? I just want to make
> sure this isn't bad behavior caused by the iscsi software.
>>
>>>
>>> Then I noticed that in the output of free, the number in the Cache column
>>> is very low while the Buffer number is very high. My question is: does
>>> Buffer here serve as a read cache? I couldn't find the answer anywhere else.
>>
>> The best place to find the answer is in the source code.
>>
>> Every page in the page cache is associated with some file.
>> If that file is a block device (e.g. /dev/sdX) then it is reported as
>> 'Buffer' otherwise it is reported as 'Cache'.
>>
>> Some filesystems like ext3 use 'Buffer' memory for metadata but all use
>> 'Cache' memory for files and directories.
>>
>
> Thanks, so it is being used as a read cache then. Too bad there's no easy
> way to measure/see the hit rate.
>
>>>
>>> My last question: since MD already seems to be doing the caching, what
>>> effect would it have if I set up a loop (LO) device in front of the MD
>>> device? Is there going to be more caching? How is that different from
>>> just a plain MD device?
>>
>> MD/raid10 does no caching.
>> A loop-back over the md device would not add extra caching.
>>
>> NeilBrown
>>
>>
>>>
>>> Thanks.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>