Re: [PATCH] bcache: Fix writeback_thread never writing back incomplete stripes.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Kent, Vojtech, list

Your fix works perfectly, I can finally remove my patch that disables
the partial_stripes_expensive branch and unleash the full bcache
performance on my RAID-6 array. I was aware of this problem for some
time now, but never got to learning the bcache codebase enough to find
out why this happens with partial stripes write-back enabled, so I
just disabled it to avoid CPU spinning. Thanks a lot to Vojtech for
finding the reason and to you for the patch. I have quite a collection
of stability patches for the mainline bcache. Would you please be kind
to review them when convenient and merge upstream the ones you think
are worth merging. It will be 4.4 merge window, I believe.
Please find all the patches attached. All of them are rebased on
mainline 4.2 kernel. If somebody needs the 4.1.x versions, please let
me know.


On Wed, Sep 16, 2015 at 7:32 AM, Kent Overstreet
<kent.overstreet@xxxxxxxxx> wrote:
> On Sat, Sep 05, 2015 at 01:10:12PM +0200, Vojtech Pavlik wrote:
>> Fix writeback_thread never finishing writing back all dirty data in bcache when
>> partial_stripes_expensive is set, and spinning, consuming 100% of CPU instead.
>>
>> Signed-off-by: Vojtech Pavlik <vojtech@xxxxxxxx>
>> ---
>>
>> This is a fix for the current upstream bcache, not the devel branch.
>>
>> If partial_stripes_expensive is set for a cache set, then writeback_thread
>> always attempts to write full stripes out back to the backing device first.
>> However, since it does that based on a bitmap and not a simple linear
>> search, like the rest of the code of refill_dirty(), it changes the
>> last_scanned pointer so that never points to 'end'. refill_dirty() then
>> never tries to scan from 'start', resulting in the writeback_thread
>> looping, consuming 100% of CPU, but never making progress in writing out
>> the incomplete dirty stripes in the cache.
>>
>> Scanning the tree after not finding enough full stripes fixes the issue.
>>
>> Incomplete dirty stripes are written to the backing device, the device
>> eventually reaches a clean state if there is nothing dirtying data and
>> writeback_thread sleeps. This also fixes the problem of the cache device
>> not being possible to detach in the partial_stripes_expensive scenario.
>
> Good catch!
>
>> It may be more efficient to separate the last_scanned field for normal and
>> stripe scans instead.
>
> Actually, I think I like your approach - I think it gives us the behaviour we
> want, although it took some thinking to figure out why.
>
> One of the reasons for last_scanned is just to do writeback in LBA order and
> avoid making the disks seek around - so, if refill_full_stripes() did just queue
> up some IO at last_scanned, we do want to keep scanning from that position for
> non full stripes.
>
> But since (as you noted) last_scanned is never going to be >= end after calling
> refill_full_strips() (if there were any full stripes) - really the correct thing
> to do is loop around to the start if necessary so that we can successfully scan
> everything.
>
> The only inefficiency I see with your approach is that on the second scan, after
> we've looped, we don't need to scan to the end of the disk - we only need to
> scan up to where we initially started.
>
> But, another observation - if we change refill_dirty() so that it always scans
> the entire keyspace if necessary, regardless of where last_scanned was - well,
> that really isn't specific to the refill_full_stripes() case - we can just
> always do that.
>
> Can you give this patch a try?
>
> -- >8 --
> Subject: [PATCH] bcache: Change refill_dirty() to always scan entire disk if necessary
>
> Previously, it would only scan the entire disk if it was starting from the very
> start of the disk - i.e. if the previous scan got to the end.
>
> This was broken by refill_full_stripes(), which updates last_scanned so that
> refill_dirty was never triggering the searched_from_start path.
>
> But if we change refill_dirty() to always scan the entire disk if necessary,
> regardless of what last_scanned was, the code gets cleaner and we fix that bug
> too.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> ---
>  drivers/md/bcache/writeback.c | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index cdde0f32f0..08a52db38b 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -359,11 +359,13 @@ next:
>         }
>  }
>
> +/*
> + * Returns true if we scanned the entire disk
> + */
>  static bool refill_dirty(struct cached_dev *dc)
>  {
>         struct keybuf *buf = &dc->writeback_keys;
> -       struct bkey end = KEY(dc->disk.id, MAX_KEY_OFFSET, 0);
> -       bool searched_from_start = false;
> +       struct bkey start_pos, end = KEY(dc->disk.id, MAX_KEY_OFFSET, 0);
>
>         if (dc->partial_stripes_expensive) {
>                 refill_full_stripes(dc);
> @@ -371,14 +373,20 @@ static bool refill_dirty(struct cached_dev *dc)
>                         return false;
>         }
>
> -       if (bkey_cmp(&buf->last_scanned, &end) >= 0) {
> -               buf->last_scanned = KEY(dc->disk.id, 0, 0);
> -               searched_from_start = true;
> -       }
> -
> +       start_pos = buf->last_scanned;
>         bch_refill_keybuf(dc->disk.c, buf, &end, dirty_pred);
>
> -       return bkey_cmp(&buf->last_scanned, &end) >= 0 && searched_from_start;
> +       if (bkey_cmp(&buf->last_scanned, &end) < 0)
> +               return false;
> +
> +       /*
> +        * If we get to the end start scanning again from the beginning, and
> +        * only scan up to where we initially started scanning from:
> +        */
> +       buf->last_scanned = KEY(dc->disk.id, 0, 0);
> +       bch_refill_keybuf(dc->disk.c, buf, &start_pos, dirty_pred);
> +
> +       return bkey_cmp(&buf->last_scanned, &start_pos) >= 0;
>  }
>
>  static void bch_writeback(struct cached_dev *dc)
> --
> 2.5.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 

Denis

Attachment: bcache_correct_errors_on_register.patch
Description: Binary data

Attachment: bcache-attach-detach-cleanup.patch
Description: Binary data

Attachment: bcache-cond_resched.patch
Description: Binary data

Attachment: bcache-fix_passthrough_shutdown_crash.patch
Description: Binary data

Attachment: bcache-fix-memleak-bch_cached_dev_run.patch
Description: Binary data

Attachment: bcache-rcu-sched-bugfix.patch
Description: Binary data

Attachment: bcache-refill_dirty_to_always_scan_entire_disk.patch
Description: Binary data

Attachment: bcache-remove-bio-splitting.patch
Description: Binary data

Attachment: bcache-unregister-reboot-notifier.patch
Description: Binary data


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux