Re: BTRFS partition corrupted after deleting files in /home

Sreyan Chakravarty <sreyan32@xxxxxxxxx> · Mon, 4 Jan 2021 19:12:42 +0530

On Sun, Jan 3, 2021 at 11:06 PM Andrej Podzimek via users
<users@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Are you sure you are opening the right LUKS device in the live environment? Is the LUKS device readable (e.g. just using "cat /dev/mapper/dm_crypt > /dev/null")? (Does its size look right, e.g. in "lsblk -p"?) Do you get any read errors in dmesg (for NVME / SAS / SATA)? If you pipe your direct partition read through "pv -arb" ("pv -arb /dev/mapper/dm_crypt > /dev/null") (or another cat-like tool that shows the data rate), does it look reasonable?

Yes, it is fully readable.

I just took a full ddrescue image and it reported 0 bad sectors, so the
disk itself reads without errors.

This is the ddrescue output:

GNU ddrescue 1.25
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 998575 MB, tried: 0 B, bad-sector: 0 B, bad areas: 0

Current status
    ipos:        0 B, non-trimmed:        0 B,  current rate:       0 B/s
    opos:        0 B, non-scraped:        0 B,  average rate:       0 B/s
non-tried:        0 B,  bad-sector:        0 B,    error rate:       0 B/s
 rescued:  998575 MB,   bad areas:        0,        run time:          0s
pct rescued:  100.00%, read errors:        0,  remaining time:         n/a
                             time since last successful read:         n/a
Finished

As you can see, there are no bad sectors.
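
For anyone who wants to confirm this from the mapfile rather than the
status screen, here is a minimal sketch. It assumes the usual ddrescue
mapfile layout: comment lines starting with '#', one current-status
line, then "pos size status" rows with hex values, where '-' marks a
bad sector and '+' a rescued block. The function name mapfile_bytes and
the file name disk.map are made up for illustration.

```shell
#!/usr/bin/env bash
# Sum the bytes in a ddrescue mapfile that carry a given status character.
# Usage: mapfile_bytes MAPFILE STATUS   ('-' = bad sector, '+' = rescued)
mapfile_bytes() {
    local file=$1 want=$2 total=0 seen_status_line=0 pos size status
    while read -r pos size status _; do
        # Skip blank lines and comments.
        case $pos in ''|'#'*) continue ;; esac
        # The first non-comment line is the current-status line, not a block.
        if [ "$seen_status_line" -eq 0 ]; then
            seen_status_line=1
            continue
        fi
        if [ "$status" = "$want" ]; then
            # Bash arithmetic accepts the 0x... hex sizes directly.
            total=$(( total + size ))
        fi
    done < "$file"
    echo "$total"
}
```

Running `mapfile_bytes disk.map -` on a mapfile with no bad areas
should print 0.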

$ pv -arb /dev/mapper/dm_crypt > /dev/null
452GiB [92.0MiB/s] [97.5MiB/s]

The data rate is also reasonable.

>
> Saving a binary image of your device would be a good first step — if the device is still readable.
>
Yes, I did that; that's why you are getting a late reply.

> What makes you so sure that this is a Btrfs problem, as opposed to a SSD or hard drive failure or a RAM failure causing data corruption?
>   (Were there no other errors before the Btrfs errors in "dmesg"?)

I think it is BTRFS because I recently had to do a lot of snapshot
creation and restoration.

Also, I don't think my RAM is to blame, since I have never had a
problem with it. Even now I have been running the live system for about
14 hours, because I had to get all my work done from there.

>
>
> While data loss of any kind is (understandably) frustrating, claiming that Btrfs is “unstable” is plain wrong and unhelpful and it is unlikely to motivate Btrfs experts to chime in and help.
>   :-/
>
I believe it's better to call this out, rather than worry about
hurting people's feelings.

> A few suggestions:
> 0. Take a binary backup of your Btrfs device, if it’s still readable.

Done.

> 1. Check your RAM. Does the machine have ECC? You may want to give it a few hours of memtest, no matter what.
>

I don't think my RAM is at fault. What is ECC?
I will run memtest regardless and get back to you, but I think
it will be a waste of time.

> 2. Check your SSD / disk whether it’s reading at a reasonable pace and showing nothing suspicious in "smartctl -A" and "dmesg".
>
smartctl output:
https://pastebin.com/raw/B6AdLZXt

I ran the smartctl test a month ago, since I thought there was
something wrong with my HDD, but the people on the smartmontools
mailing list told me I did not have to worry.

https://listi.jpberlin.de/pipermail/smartmontools-support/2020-November/000560.html
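
As a rough way to scan "smartctl -A" output for the attributes that
usually point at failing sectors, something like the sketch below could
be used. It assumes the classic ATA attribute table, where the
attribute name is column 2 and the raw value is column 10; NVMe drives
print a different format. The function name smart_suspects and the file
name smart.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Print NAME=RAW for sector-health attributes whose raw value is non-zero.
# Usage: smartctl -A /dev/sdX > smart.txt; smart_suspects smart.txt
smart_suspects() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ &&
        $10 + 0 > 0 { print $2 "=" $10 }
    ' "$1"
}
```

No output means none of those three counters is above zero; it does
not replace reading the full report.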

> 3. Then there are a few tools (see man btrfs-check, man btrfs-rescue, man btrfs-restore) you might want to try, depending on the situation. Some of them require help from Btrfs experts (at which point you may want to ask on their kernel mailing lists).
>

Yeah, that's the only option I have left.
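
A minimal sketch of a non-destructive first pass with those tools,
assuming the decrypted filesystem from a copy of the image is already
available as a block device (setting up the loop device and opening
LUKS is not shown). The device and output paths, the helper names run
and btrfs_triage, and the DRY_RUN switch are all made up for
illustration; with DRY_RUN=1 the commands are only printed.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: btrfs_triage DEVICE OUTPUT_DIR
btrfs_triage() {
    local dev=$1 out=$2
    # Read-only consistency check; does not modify the filesystem.
    run btrfs check --readonly "$dev"
    # Show what restore would copy out, without writing any files.
    run btrfs restore --dry-run "$dev" "$out"
}
```

For example, `DRY_RUN=1 btrfs_triage /dev/loop0 /mnt/recovered` prints
the two commands without touching anything. Anything beyond read-only
checks (repair options in particular) is better left until the Btrfs
list has looked at the output.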

-- 
Regards,
Sreyan Chakravarty
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx