On Fri, Nov 22, 2019 at 8:21 PM Pedro Ribeiro <pedrib@xxxxxxxxx> wrote:
>
> Hi,
>
> I have been trying to find out the cause of a bug that's affecting all
> my external hard drive backups.
>
> I have three external drives, in different USB enclosures, with the same
> configuration and the same problem.
>
> Drive A: 2TB HDD, USB3 Seagate self enclosed drive
> Drive B: 4TB HDD, USB3 Toshiba self enclosed drive
> Drive C: 512MB SSD, Crucial MX500 with USB-C third party enclosure
>
> All of the drives have a dm-crypt / LUKS on top, with a XFS partition
> inside. Drive A is a few months old, Drive B is about 3 years old, drive
> C about 1.5 years old. They are seldomly used (they're backup drives) so
> they are all fine mechanically.
>
> The problem is when I attach any of the drives, enter the LUKS password
> and then try to mount, this happens:
> [ 66.039772] XFS (dm-0): Mounting V5 Filesystem
> [ 66.060934] XFS (dm-0): log recovery read I/O error at daddr 0x0 len 8 error -5
> [ 66.060939] XFS (dm-0): empty log check failed
> [ 66.060940] XFS (dm-0): log mount/recovery failed: error -5
> [ 66.061064] XFS (dm-0): log mount failed
>
> No matter what I do, using all the recovery tools, etc, it's impossible
> to mount...
>
> The thing is that is there is NOTHING wrong with these drives. The above
> happens when running my specific, stripped and locked down kernel config.
>
> If I take Debian's 4.19 kernel config, put it on a 5.3.11 tree, run make
> oldconfig and just answer the defaults on all prompts, all of the drives
> above mount fine:
> [ 46.184068] XFS (dm-0): Mounting V5 Filesystem
> [ 46.412566] XFS (dm-0): Ending clean mount
>
> I hit this problem recently when I moved from kernel 4.18.20, which I
> was using for a long time, to 5.3.X. In kernel 4.18.20, I did not have
> any problems with my specific stripped down config.
>
> I have asked for help in IRC at #xfs, and one of the guys there (ailiop)
> was very helpful in trying to track down the problem, but we ultimately
> failed, hence why I'm asking for help here.
>
> I'm attaching the kernel configs and the dmesg outputs. There is nothing
> obvious in the kernel config diff that should make this happen... it's a
> very weird bug.
>
> Regards,
> Pedro

What about checking for differences in kernel messages between the
stripped-down and stock kernels during device discovery? That is:
connect no drives, boot the stripped kernel that has the problem,
connect one of the problem USB devices, and record the kernel messages
that result. Then repeat that with the stock Debian kernel that doesn't
exhibit the bug.

My guess is this is some obscure USB-related bug. There are a ton of
bugs with USB enclosure firmware, controllers, and drivers.

Also, is this USB enclosure directly connected to the computer? Or to a
powered hub? I have inordinate problems with USB enclosures directly
connected to an Intel NUC, but when they are connected to a Dyconn USB
hub with an external power source, the problems all go away. And my
understanding is that the hub doesn't just act like a repeater; it
pretty much rewrites the entire stream. So there's something screwy
going on with either the Intel controller I have or the USB-SATA bridge
chip, and whatever confusion it causes, the hub eliminates.

It may also be that your stripped-down kernel has turned off some
obscure USB-related error checking or mode switching that this
particular setup needs.

--
Chris Murphy
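
P.S. A minimal sketch of how the two captures could be compared, assuming
each one was saved to a text file from dmesg right after plugging in the
drive. The file names and the helper names (strip_timestamps, diff_dmesg)
are made up for illustration. It strips the leading "[ 66.039772]" style
timestamps so that only real message differences show up in the diff:

```python
import difflib
import re
import sys

# Matches the leading "[   66.039772] " timestamp that dmesg prints by default.
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(lines):
    """Return the lines without trailing newlines or leading dmesg timestamps."""
    return [_TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def diff_dmesg(stripped_log, stock_log):
    """Return a unified diff (list of strings) between two dmesg captures.

    Lines starting with "-" appear only with the stripped kernel,
    lines starting with "+" appear only with the stock kernel.
    """
    a = strip_timestamps(stripped_log)
    b = strip_timestamps(stock_log)
    return list(
        difflib.unified_diff(a, b, fromfile="stripped", tofile="stock", lineterm="")
    )


# Hypothetical usage: python3 dmesg_diff.py stripped.txt stock.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for line in diff_dmesg(f1.readlines(), f2.readlines()):
            print(line)
```

Expect some noise from things like device numbering that can differ
between boots. The interesting lines are probably the ones from the USB
and SCSI layers that show up with one kernel but not the other.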