Re: Sysfs-Configurable readahead and background bypasses

On 2019/2/16 9:28 PM, Andreas wrote:
> Thank you, I understand the situation a little better now.
> 
> Saving cache space makes sense for cache drives that, as you said, are
> small. But for users like me, who go the extra mile and install a
> generously large cache drive, the behaviour is punishing.
> After upgrading my kernel and swapping out the cache drive, I was having
> trouble getting my new 128GB cache filled from three 8TB hard drives,
> which set me on a journey to figure out why, and which eventually led me
> to write my patch. I also know of people using SSDs as large as 512GB exclusively
> for bcache.
> 
> The symptom that made me curious about there being an odd change in
> bcache behaviour was my MP3 music library, where my file browser reads
> the ID3-tag information from these files. No matter how often I scrolled
> through my library, most of the traffic kept going to the hard drive and
> bcache wasn't adding any new data to the cache drive despite there being
> upwards of 100GB of unused cache space.
> As it turned out, my file explorer first issues a small read to each
> file to determine the size and position of the ID3-tag section. The
> readahead operation attached to this small read would then fetch the
> actual ID3-tag, and the subsequent read for the tag data would not issue
> a separate operation for bcache to consider. This is then done for
> several files simultaneously - a workload an SSD can happily deal with
> but an HDD gets overwhelmed by.
> Bcache only cached that first small read for each file and ignored the
> actual ID3-tag data, because it was fetched via a readahead. This
> behaviour was consistent: even in subsequent iterations of the scenario,
> only that first small read was served from the cache, and the HDD then
> had to slowly seek to the actual ID3-tag data, with bcache never picking
> up on it since it was still being fetched by a readahead.
> So while in theory it might sound fine to let readaheads go to the HDD,
> in practice it is noticeably faster to have everything coming from the
> SSD cache.
> 

Hi Andreas,

Thanks for your patience and explanation. Now I understand your use
case; it is reasonable to have such readahead data on the cache device.

> I believe that one of the core problems with this behaviour is that
> bcache simply doesn't know if data fetched in a readahead is actually
> being used or not. Caching readaheads leads to false positives (data
> cached that isn't being used) and bypassing readaheads leads to false
> negatives (data not cached that is being used) - in my eyes, users who
> want to should be able to decide for themselves which way works better.
> 
> To me, bypassing readahead and background IO only seems like a good idea
> for relatively small caches (I'd say <= 16GB). But users with bigger
> caches are punished by this behaviour, as they could get better
> performance out of their cache (and did, until late 2017).
> 
> Beyond this anecdotal evidence and my thoughts, I cannot provide any
> hard numbers on the issue.
> 

Let me explain why a performance number is desired. Normally most
readahead pages are only accessed once, so it is sufficient to keep them
in memory for just that one access. Keeping readahead data on the cache
device is only worthwhile when the data will be accessed multiple times
(hot). Otherwise bcache just introduces more I/Os on the SSD and does
not help performance much.
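
For reference, the check we are discussing lives in check_should_bypass()
in drivers/md/bcache/request.c. Paraphrased from memory (the exact
condition has changed between kernel versions, so please do not take this
as a verbatim quote), it is along these lines:

	/* check_should_bypass(): readahead/background I/O is bypassed;
	 * whether metadata I/O is exempted here is the point under
	 * discussion in this thread */
	if (bio->bi_opf & (REQ_RAHEAD | REQ_BACKGROUND))
		goto skip;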

If you can give me a suggested script or setup for a performance
benchmark, I am happy to run such a benchmark. As you explained, this is
a useful case.
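
Based on your description, I imagine a reproducer roughly like the sketch
below; the directory layout, read sizes and offsets are only placeholders,
so please adjust it or send something closer to your real workload:

/* tag-scan.c -- rough reproducer for the access pattern described above:
 * a small "header" read per file, followed by a read further into the
 * same file, repeated across many files.
 *
 * Build: cc -O2 -o tag-scan tag-scan.c
 * Run:   ./tag-scan /path/to/music/dir
 */
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char buf[4096];
	struct dirent *de;
	struct timespec t0, t1;
	long files = 0;
	DIR *dir;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <directory>\n", argv[0]);
		return 1;
	}
	dir = opendir(argv[1]);
	if (!dir) {
		perror("opendir");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while ((de = readdir(dir))) {
		char path[4096];
		int fd;

		if (de->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path), "%s/%s", argv[1], de->d_name);
		fd = open(path, O_RDONLY);
		if (fd < 0)
			continue;
		/* small read at the front, like the file browser does ... */
		if (pread(fd, buf, 512, 0) > 0)
			/* ... then a read further in, standing in for the
			 * tag data that readahead would otherwise fetch */
			pread(fd, buf, sizeof(buf), 128 * 1024);
		close(fd);
		files++;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	closedir(dir);

	printf("%ld files in %.3f s\n", files,
	       (t1.tv_sec - t0.tv_sec) +
	       (t1.tv_nsec - t0.tv_nsec) / 1e9);
	return 0;
}

The idea would be to run one pass so bcache sees the reads, drop the page
cache with "echo 3 > /proc/sys/vm/drop_caches", then run a second pass and
compare the times with readahead caching bypassed vs. allowed.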

Thanks.

Coly Li

> On 2019-02-16 at 13:23, Coly Li wrote:
>> On 2019/2/16 7:20 PM, Andreas wrote:
>>> Hello Coly,
>>>
>> Hi Andreas,
>>
>>> I agree with you wholeheartedly, which was the reason for my patch and
>>> email. But you seem to have gotten it the wrong way around.
>>> You see, ever since
>>> https://github.com/torvalds/linux/commit/b41c9b0266e8370033a7799f6806bfc70b7fd75f
>>> was merged into bcache in late 2017, any IO flagged as REQ_RAHEAD or
>>> REQ_BACKGROUND is simply skipped (bypassed) and no longer considered for
>>> caching at all, regardless of IO pattern.
>>>
>> Yes, you are right; for normal readahead or background requests, the
>> bypass is not really about random I/O patterns.
>>
>>> If what you say holds true, it sounds like that patch was wrongfully
>>> merged back then, as it has introduced the behaviour you do not want
>>> now. If you believe it makes an exception for sequential FS metadata, I
>>> would very much like you to review that patch again, as that is not the
>>> case.
>>>
>>> My patch, on the other hand, aims to revert this change by default, so
>>> it is all about IO patterns again, while making it configurable for
>>> users who want the new behaviour.
>>>
>> [snipped]
>>
>> Most such requests are speculative, issued by upper layers, and a lot
>> of them will indeed never be used, so we do not keep them on the cache
>> device unless they are for metadata. Such metadata blocks occupy much
>> less cache device space than normal readahead or background requests,
>> so it is OK for us to keep them.
>>
>> If you find anything I have expressed wrongly, that is from me; and if
>> you find anything reasonable, that is from Eric and bcache's original
>> author Kent :-)
>>
>> I agree with Eric that readahead or background requests should not
>> occupy expensive and limited cache device space. This is why I don't
>> want to change the behavior at this moment.
>>
>> This doesn't mean the patch is rejected. If
>> 1) you can explain for which workloads caching readahead or background
>> requests is good for performance, and
>> 2) better performance numbers can be shared,
>>
>> it would be my pleasure to review this patch. Otherwise I'd like to
>> avoid extra bypass options.
>>
>> Thanks.
>>
>> Coly Li
> 
> 


-- 

Coly Li


