On Wed, Jan 13, 2021 at 2:41 AM Sreyan Chakravarty <sreyan32@xxxxxxxxx> wrote:
>
> On Tue, Jan 12, 2021 at 9:16 AM Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:
> >
> > -x has more information that might be relevant including firmware
> > revision and some additional logs for recent drive reported errors
> > which usually are benign. But might be clues.
> >
> > These two attributes I'm not familiar with
> > 187 Reported_Uncorrect  0x0032  100  096  000  Old_age  Always  -  4294967301
> > 188 Command_Timeout     0x0032  100  100  000  Old_age  Always  -  98785820672
> >
> > But the value is well above threshold for both so I'm not worried about it.
>
> Here is the output of:
> # smartctl -Ax /dev/sda
> https://pastebin.com/raw/GrgrQrSf
>
> I have no idea what it means.

9 Power_On_Hours  -O--CK  097  097  000  -  1671

There's a bunch of write errors at 548 hours, and more recently read
errors followed by:

10 -- 41 05 40 00 00 60 d9 6e 08 00 00  Error: IDNF at LBA = 0x60d96e08 = 1624862216

548 hours is not an old drive. It shouldn't have any write errors. As a
drive ages there might be some, but they should be handled transparently
and not affect any other operation. Something is definitely wrong when
write errors are followed by read errors, and then by this IDNF error,
which suggests the drive is having problems reading data written to
sectors reserved for its own use. On some makes/models the firmware
itself lives on the drive, and so does the bad blocks map. If such a
sector isn't reading correctly, *who knows* what weird things it could
trigger. I know who: the person who writes drive firmware. I don't, so I
can only speculate that this looks pretty much like a drive that needs
to be used for some other purpose or disposed of.

And also? Maybe it's getting worse. Maybe it wasn't exhibiting any of
these problems yet when you were using NTFS and ext4, or they were never
detected. The first thing that did detect it was LVM thin's metadata
checksumming, and the most recent is Btrfs.

If it were in warranty I'd say make it the manufacturer's problem. If
it's not, well, you could take a chance with it in a Btrfs raid1 with
some other cheap drive. Whatever problems the two drives have will be
different, and Btrfs will automatically detect and repair them as long
as they don't happen at the same time in the same place (astronomically
unlikely). So that'd be a good use for it. I personally wouldn't put any
other file system on it except maybe ZFS, because with this behavior you
don't want to trust any data to it without full data checksumming, let
alone important data.

> This is the problem with SMART tests, they are so esoteric that it is
> difficult for a common user to make sense of it.

All low-level stuff is like this, yeah.

--
Chris Murphy
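
For reference, a rough sketch of the smartctl invocations that pull out
the pieces discussed above. The device name /dev/sda matches the quoted
command but is only an example; adjust it to the actual drive.

  # full report: identity, attribute table, and the error log quoted above
  smartctl -x /dev/sda

  # just the attribute table (9 Power_On_Hours, 187 Reported_Uncorrect,
  # 188 Command_Timeout, and so on)
  smartctl -A /dev/sda

  # just the drive's own error log, where the write/read/IDNF errors show up
  smartctl -l error /dev/sda

  # start an extended self-test, then read its result once it finishes
  smartctl -t long /dev/sda
  smartctl -l selftest /dev/sda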
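
And a rough sketch of the two-drive Btrfs raid1 idea, assuming the
suspect drive is /dev/sda and a second cheap drive shows up as /dev/sdb
(both device names and the /mnt mount point are placeholders, and
mkfs.btrfs wipes whatever is on both devices).

  # mirror both data and metadata across the two drives
  mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
  mount /dev/sda /mnt

  # scrub reads everything, verifies checksums, and rewrites any copy that
  # fails verification using the good copy from the other drive
  btrfs scrub start -B /mnt
  btrfs scrub status /mnt

  # per-device counters of read, write, and corruption errors seen so far
  btrfs device stats /mnt

The same repair also happens during normal reads: if a checksum fails on
one drive, Btrfs returns the good copy from the other mirror and rewrites
the bad one, which is the automatic detect-and-repair behavior described
above.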