Re: Undoing an "Auto-Stop" when Cache device has recovered?


On Thu, Mar 25, 2021 at 2:18 AM Nikolaus Rath <nikolaus@xxxxxxxx> wrote:
>
>
> On Thu, 25 Mar 2021, at 05:29, Coly Li wrote:
> > On 3/25/21 4:21 AM, Nikolaus Rath wrote:
> > > Hello,
> > >
> > > My (writeback enabled) bcache cache device had a temporary failure, but seems to have fully recovered (it may have been overheating or a loose cable).
> > >
> > > From the last kernel messages, it seems that bcache tried to flush the dirty data, but failed, and then stopped the cache device.
> > >
> > > After a reboot, the bcacheX device indeed no longer has an associated cache set.
> > >
> > > I think in my case the cache device is in perfect shape again and still has all the data, so I would really like bcache to attach it again so that the dirty cache data is not lost.
> > >
> > > Is there a way to do that?
> > >
> > > (Yes, I will still replace the device afterwards)
> > >
> > > (I am pretty sure that just re-attaching the cacheset will make bcache forget that there was a previous association and will wipe the corresponding metadata).
> > >
> >
> > Hi Nikolaus,
> >
> > Do you have the kernel log? It depends on whether the cache set is clean
> > or not. For a clean cache set, the cache set is detached, and the next
> > reattach will invalidate all existing cached data. If the cache set is
> > dirty and all existing data is wiped, that would be fishy...
>
> Hi Coly,
>
> I'm not sure I understand. I believe there is dirty data on the cacheset (it was effectively disconnected in the middle of operations). Also, if it wasn't dirty then there would be no need to re-attach it (all the important data would be on the backing device).
>
> On the other hand, after a reboot the cache set shows up in /sys/fs/bcache - just not associated with any backing device. So I guess from that point of view it is clean?
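
One way to answer the clean-or-dirty question without mounting anything is to read the backing device's bcache "state" attribute, which the kernel's bcache documentation lists as one of "no cache", "clean", "dirty" or "inconsistent". The sketch below assumes the attribute is at /sys/block/<bcacheN>/bcache/state; the helper names are hypothetical, and it only reads, never writes.

```python
from pathlib import Path

# The four states the bcache documentation lists for a backing device.
KNOWN_STATES = ("no cache", "clean", "dirty", "inconsistent")


def parse_state(raw: str) -> str:
    """Strip the trailing newline from the sysfs text and check it is a known state."""
    state = raw.strip()
    if state not in KNOWN_STATES:
        raise ValueError(f"unexpected bcache state: {state!r}")
    return state


def read_backing_state(bcache_dev: str, sysfs_root: str = "/sys/block") -> str:
    """Read e.g. /sys/block/bcache0/bcache/state (assumed path) for "bcache0"."""
    path = Path(sysfs_root) / bcache_dev / "bcache" / "state"
    return parse_state(path.read_text())


def holds_unflushed_data(state: str) -> bool:
    """True only when bcache reports cached dirty data not yet written back."""
    return state == "dirty"
```

For example, read_backing_state("bcache0") on the affected machine; "no cache" or "inconsistent" there would be worth including in the report alongside the console photos.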

I actually have experienced very similar behavior with a transient
cache device failure (it's not totally dead) and just posted here
recently: https://marc.info/?l=linux-bcache&m=161642940714578&w=1

My thought was to use "panic" in the 'errors' sysfs attribute so the
machine panics instead of detaching the cache device. Otherwise, it
seems the cache device gets detached with dirty data present, and the
backing device is started (yet data is not present).
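A minimal sketch of switching that attribute from a script, assuming the cache set's 'errors' file lives at /sys/fs/bcache/<cset-uuid>/errors and lists the offered actions with the current one in brackets (e.g. "[unregister] panic"); both the path and the bracket format are assumptions here, and the function names are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches a bracketed token such as "[unregister]" (assumed "selected" marker).
_SELECTED = re.compile(r"\[(.+)\]")


def parse_choice_list(raw: str) -> tuple[list[str], str | None]:
    """Split text like "[unregister] panic" into (options, selected)."""
    options = []
    selected = None
    for token in raw.split():
        match = _SELECTED.fullmatch(token)
        name = match.group(1) if match else token
        if match:
            selected = name
        options.append(name)
    return options, selected


def set_error_action(cset_dir: str, action: str) -> None:
    """Write `action` (e.g. "panic") to <cset_dir>/errors if the kernel offers it."""
    errors = Path(cset_dir) / "errors"
    options, current = parse_choice_list(errors.read_text())
    if action not in options:
        raise ValueError(f"{action!r} is not one of the offered actions {options}")
    if action != current:
        errors.write_text(action)
```

Usage (as root, with the real UUID substituted): set_error_action("/sys/fs/bcache/<cset-uuid>", "panic"). The check against the offered options avoids writing a value the running kernel does not support.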

I'll work on reproducing the original case with the "unregister" value
and provide logs, as it sounds like this behavior is unexpected (e.g., a
cache device should only detach if there is NO dirty data present).

--Marc

>
> The kernel logs are on the affected bcache, and I have avoided doing anything with it (including mounting). I took a few pictures of the last visible messages on the console before rebooting, though. For example, here is where the problem starts:
>
> First ATA errors: https://drive.google.com/file/d/1_vr-JBWZjajzbWyXUSmtn4faNH6072ut/view?usp=sharing
> First bcache errors: https://drive.google.com/file/d/1XLCWDi6G2lP1JiVitZTtIqzB4QqxXv2-/view?usp=sharing
>
> Does that help?
>
> Best,
> -Nikolaus
>
> --
> GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
>
>              »Time flies like an arrow, fruit flies like a Banana.«
