Re: Sysfs-Configurable readahead and background bypasses

Thank you, I understand the situation a little better now.

Saving cache space makes sense for cache drives that, as you said, are
small. But for users like me, who go the extra mile and install a
generously large cache drive, the behaviour is punishing.
After upgrading my kernel and swapping out the cache drive, I had
trouble getting my new 128GB cache filled from three 8TB hard drives,
which set me on the journey to figure out why and eventually led to my
patch. I also know of people using SSDs as large as 512GB exclusively
for bcache.

The symptom that made me curious about an odd change in bcache
behaviour was my MP3 music library, where my file browser reads the
ID3-tag information from these files. No matter how often I scrolled
through my library, most of the traffic kept going to the hard drive,
and bcache wasn't adding any new data to the cache drive despite there
being upwards of 100GB of unused cache space.

As it turned out, my file explorer first issues a small read to each
file to determine the size and position of the ID3-tag section. The
readahead operation attached to this small read then fetches the actual
ID3-tag, so the subsequent read for the tag data does not issue a
separate operation for bcache to consider. This is done for several
files simultaneously - a workload an SSD can happily deal with, but one
that overwhelms an HDD.

Bcache only cached that first small read for each file and ignored the
actual ID3-tag data, because it was fetched by a readahead. This
behaviour was consistent: even in subsequent iterations of the
scenario, only that first small read was served from the cache, and the
HDD then had to slowly seek to the actual ID3-tag data, with bcache
never picking it up as it was still being fetched by a readahead.

So while in theory it might sound fine to rely on readaheads going to
the HDD, in practice it is noticeably faster to have everything come
from the SSD cache.
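
To make the pattern concrete, here is a minimal userspace sketch of the
access pattern as I understand it - the file browser's internals are my
interpretation, and all names and sizes here are illustrative:

	/* Two reads per file, mimicking the file browser. */
	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		unsigned char hdr[10], tag[4096];
		int fd;

		if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
			return 1;

		/* 1st read: the 10-byte ID3v2 header. bcache sees this
		 * small random read and caches it. */
		if (pread(fd, hdr, sizeof(hdr), 0) != (ssize_t)sizeof(hdr))
			return 1;

		/* ID3v2 stores the tag size as four 7-bit "syncsafe"
		 * bytes. */
		uint32_t size = (hdr[6] << 21) | (hdr[7] << 14) |
				(hdr[8] << 7) | hdr[9];
		if (size > sizeof(tag))
			size = sizeof(tag);

		/* 2nd read: the tag itself. The readahead triggered by
		 * the 1st read has typically fetched these pages already
		 * (flagged REQ_RAHEAD), so this read is served from the
		 * page cache and never reaches bcache on its own. */
		printf("read %zd tag bytes\n", pread(fd, tag, size, 10));
		close(fd);
		return 0;
	}

Run against each file in the library, the only request bcache ever gets
to judge is the tiny header read; the tag data always arrives under the
REQ_RAHEAD flag and is bypassed.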

I believe that one of the core problems with this behaviour is that
bcache simply doesn't know whether data fetched by a readahead is
actually being used or not. Caching readaheads leads to false positives
(data cached that isn't being used) and bypassing readaheads leads to
false negatives (data not cached that is being used) - in my eyes it
should be up to the user to decide which way works better for them.
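
As a rough sketch of what such a choice could look like in
check_should_bypass() in drivers/md/bcache/request.c - the knob names
below are made up for illustration and are not necessarily what my
patch uses:

	/* Hypothetical per-device toggles; with both set to 0,
	 * readahead and background IO fall through to the normal
	 * bypass heuristics (sequential cutoff, congestion) again. */
	if (((bio->bi_opf & REQ_RAHEAD) && dc->bypass_readahead) ||
	    ((bio->bi_opf & REQ_BACKGROUND) && dc->bypass_background))
		goto skip;

Exposed via sysfs (say, a hypothetical
/sys/block/bcache0/bcache/bypass_readahead), small-cache users could
keep the current bypass while large-cache users switch it off.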

To me, bypassing readahead and background IO only seems like a good
idea for relatively small caches (I'd say <= 16GB). Users with bigger
caches are punished by this behaviour, as they could be getting better
performance out of their cache (and were, until late 2017).

Beyond this anecdotal evidence and reasoning, I cannot provide any hard
numbers on the issue.

On 2019-02-16 at 13:23, Coly Li wrote:
> On 2019/2/16 7:20 PM, Andreas wrote:
>> Hello Coly,
>>
> Hi Andreas,
>
>> I agree with you wholeheartedly, which was the reason for my patch and
>> email. But you seem to have gotten it the wrong way around.
>> You see, ever since
>> https://github.com/torvalds/linux/commit/b41c9b0266e8370033a7799f6806bfc70b7fd75f
>> was merged into bcache in late 2017 any IO flagged as REQ_RAHEAD or
>> REQ_BACKGROUND is simply skipped (bypassed) and no longer considered for
>> caching at all, regardless of IO pattern.
>>
> Yes, you are right: normal readahead or background requests are not
> entirely about random I/O patterns.
>
>> If what you say holds true, it sounds like that patch was wrongfully
>> merged back then, as it has introduced the behaviour you do not want
>> now. If you believe it makes an exception for sequential FS metadata, I
>> would very much like you to review that patch again, as that is not the
>> case.
>>
>> My patch on the other hand aims to revert this change by default, so
>> it is all about IO patterns again, but makes it configurable for
>> users who want this new behaviour.
>>
> [snipped]
>
> Most such requests are speculative, issued by the upper layers, and a
> lot of them will never actually be used, so we don't want them on the
> cache device unless they are for metadata. Such metadata blocks occupy
> much less cache device space than normal readahead or background
> requests, so it is OK for us to keep them.
>
> If you find anything I have expressed wrongly, that comes from me; and
> if you find anything reasonable, that comes from Eric and bcache's
> original author Kent :-)
>
> I agree with Eric that readahead or background requests should not
> occupy expensive and limited cache device space. This is why I don't
> want to change the behavior at the moment.
>
> This doesn't mean this patch is rejected. If
> 1) you can explain for which workloads caching readahead or background
> requests improves performance, and
> 2) better performance numbers can be shared,
>
> then it would be my pleasure to review this patch. Otherwise I'd like
> to avoid extra bypass options.
>
> Thanks.
>
> Coly Li
