On 6 Jul 2020 03:36 +0000, from lacedaemonius1@xxxxxxxxxxxxxx (lacedaemonius):
> [  643.631782] print_req_error: critical target error, dev sdi, sector 11721044993
> [  643.631789] Buffer I/O error on dev sdi, logical block 11721044993, async page read

Notice that the errors are occurring on the raw device, not through a dm-* mapping.

That sector address is just past the 6 TB (about 5.46 TiB) mark; does that sound reasonable given the drive size? (It would if the physical drive is _more_ than 6 TB in size, and it might if the drive is advertised as 6 TB.)

Assuming that the problematic drive is still detected as sdi, what are the contents of /sys/block/sdi/size? (That should be _at least_ 11721044993; otherwise, some metadata somewhere has been corrupted.)

If you luksOpen the LUKS container and run "file -Ls" on the corresponding node in /dev/mapper, what is the output? It should indicate an ext4 file system in your case. (Example commands for this and the steps below are collected at the end of this message.)

If that too fails, then I would suggest a pass of ddrescue reading from the raw backing device and writing to /dev/null. (If you do this, make VERY VERY SURE that you get the order right!) That will tell you whether the data on the drive itself can be read without errors. If you have enough storage elsewhere to hold a copy of the whole contents of the drive, strongly consider writing the data there instead of throwing it away; it can't hurt, and it might help. Either way, expect the pass to take the better part of a day to complete. (6 TB at 100 MB/s is 16-17 hours; you haven't specified the drive size, and 100 MB/s is a reasonable sustained average for a 7200 rpm rotational drive.)

That you're seeing delays of several seconds for those reads, and user-visible delays longer than that, suggests to me that this is not just an out-of-bounds read command issued to the drive. Such a command should return more or less immediately with something like "sector not found", which in turn would be propagated to userspace as an I/O error.

Is the LUKS container LUKS 1 or LUKS 2? Is the drive GPT-partitioned, or something else?

> I don't think it's a drive failure because it's only a few months
> old and I haven't got any SMART warnings, so that leaves software.

Unfortunately, drives can fail without reporting failures in SMART data, and they can fail early. The probability of either is _lower_, but it is non-zero. An in-use drive failing certainly can cause issues for the running system. A failing drive that holds neither swap nor a critical file system _shouldn't_ cause the kernel to crash, but I wouldn't completely rule out the possibility.

The fact that the LUKS container was not closed _should_ not cause any issues after a reboot, because closing the container really just removes bookkeeping information and cryptographic keys from kernel memory; it doesn't affect on-disk data. An unclean shutdown isn't ideal for ext4, but it's usually not catastrophic.

> Is it worth making any attempt at trying to recover the drive and if
> so is there any documentation that explains what to do? I don't have
> a backup of the LUKS header, if that's the problem.

Do you have a recent backup of the data on the drive, or does the drive that is giving you problems hold the only copy? Is it data that you care a lot about, or can it easily be restored from other sources? (This basically boils down to: how important is it to rescue the data in place?)
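
To make the checks above concrete, here's roughly what I'd run. These are sketches, not gospel: I'm assuming the drive still shows up as /dev/sdi and that the LUKS container sits directly on the whole disk (if it's on a partition, point these at the partition instead). "recovery" is just an arbitrary mapping name I picked.

    cat /sys/block/sdi/size                  # should print at least 11721044993 (unit is 512-byte sectors)
    cryptsetup luksOpen /dev/sdi recovery    # "cryptsetup open" is the newer spelling of the same thing
    file -Ls /dev/mapper/recovery            # should report something like "ext4 filesystem data"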
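
For the ddrescue pass, something along these lines; again, triple-check the argument order, because ddrescue reads from the first path and writes to the second. The map file (third argument) records progress and failed areas, and lets you stop and resume. I believe --force is needed in the first variant because /dev/null is not a regular file; the image path in the second variant is of course just an example.

    ddrescue --force /dev/sdi /dev/null sdi-read-test.map    # read test only; the data is discarded
    ddrescue /dev/sdi /mnt/other/sdi.img sdi-copy.map        # better, if you have ~6 TB free elsewhere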
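
To answer my questions about the LUKS version and the partitioning, for example:

    cryptsetup luksDump /dev/sdi | head -n 3    # the header dump starts with a "Version:" line (1 or 2)
    lsblk -o NAME,SIZE,TYPE,FSTYPE /dev/sdi     # overview of the drive and everything stacked on it
    parted -s /dev/sdi print                    # reports the partition table type, if there is one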
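
Regarding SMART: it's worth looking at the full attribute and error-log dump rather than just the overall health flag, and possibly kicking off a long self-test, e.g.:

    smartctl -a /dev/sdi        # full attribute, error-log and self-test-log dump
    smartctl -t long /dev/sdi   # start a long self-test; check the result later with smartctl -a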
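
And since you mention not having a LUKS header backup: if the container still opens, making one now costs you nothing. The file name here is just a suggestion; store the backup somewhere _off_ this drive.

    cryptsetup luksHeaderBackup /dev/sdi --header-backup-file sdi-luks-header.img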
-- 
Michael Kjörling • https://michael.kjorling.se • michael@xxxxxxxxxxx
“Remember when, on the Internet, nobody cared that you were a dog?”