On Mon, Feb 12, 2024 at 04:52:09PM +0100, Martin Steigerwald wrote:
> Kent Overstreet - 11.02.24, 19:51:32 CET:
> > On Sun, Feb 11, 2024 at 06:06:27PM +0100, Martin Steigerwald wrote:
> […]
> > > CC'ing BCacheFS mailing list.
> > >
> > > My original mail is here:
> > >
> > > https://lore.kernel.org/linux-usb/5264d425-fc13-6a77-2dbf-6853479051a0@applied-asynchrony.com/T/#m5ec9ecad1240edfbf41ad63c7aeeb6aa6ea38a5e
> > >
> > > Holger Hoffstätte - 11.02.24, 17:02:29 CET:
> > > > On 2024-02-11 16:42, Martin Steigerwald wrote:
> > > > > Hi!
> > > > > I am trying to put data on an external Transcend XS-2000 4 TB SSD
> > > > > using self-compiled Linux 6.7.4 kernel and encrypted BCacheFS. I do
> > > > > not think BCacheFS has any part in the errors I see, but if you
> > > > > disagree feel free to CC the BCacheFS mailing list as you reply.
> > > >
> > > > This is indeed a known bug with bcachefs on USB-connected devices.
> > > > Apply the following commit:
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/bcachefs?id=3e44f325f6f75078cdcd44cd337f517ba3650d05
> > > >
> > > > This and some other commits are already scheduled for -stable.
> > >
> > > Thanks!
> > >
> > > Oh my. I was aware of some bug fixes coming for stable. I briefly
> > > looked through them, but now I did not make a connection.
> > >
> > > I will wait for 6.7.5 and retry then I bet.
> >
> > That doesn't look related - the device claims to not support flush or
> > fua, and the bug resulted in us not sending flush/fua devices; the main
> > thing people would see without that patch, on 6.8, would be an immediate
> > -EOPNOTSUPP on the first flush journal write.
> >
> > He only got errors after an hour or so, or 10 minutes with UAS disabled;
> > we send flushes once a second. Sounds like a screwy device.
>
> Thanks for that explanation, Kent.
>
> I am the one with that external Transcend XS 2000 4 TB SSD and I
> specifically did not CC bcachefs mailing list at the beginning as after
> seeing things like
>
> [33963.462694] sd 0:0:0:0: [sda] tag#10 uas_zap_pending 0 uas-tag 1 inflight: CMD
> [33963.462708] sd 0:0:0:0: [sda] tag#10 CDB: Write(16) 8a 00 00 00 00 00 82 c1 bc 00 00 00 04 00 00 00
> […]
> [33963.592872] sd 0:0:0:0: [sda] tag#10 FAILED Result: hostbyte=DID_RESET driverbyte=DRIVER_OK cmd_age=182s
>
> I thought some quirks in the device to be at fault.
>
> However while Sandisk Extreme Pro 2 TB claims to support DPO and FUA I see
>
> Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
>
> also with other devices like external Toshiba Canvio 4 TB hard disks. Using
> LUKS encrypted BTRFS on those I never saw any timeout while writing out
> data issue with any of those hard disks. Also with disabled write cache
> any cache flush / FUA request should be a no-op anyway? These hard disks
> have been doing a ton of backup workloads without any issues, but so far
> only with BTRFS.
>
> I may test the Transcend XS2000 with BTRFS to see whether it makes a
> difference, however I really like to use it with BCacheFS and I do not really
> like to use LUKS for external devices. According to the kernel log I still
> don't really think those errors at the block layer were about anything
> filesystem specific, but what do I know?

It's definitely not unheard of for one specific filesystem to be
tickling driver/device bugs and not others.

I wonder what it would take to dump the outstanding requests on device
timeout.
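
As an aside on the quoted log: the CDB bytes can be decoded to check
whether the write that timed out carried FUA or DPO. Below is a minimal
sketch in Python, assuming the standard SBC WRITE(16) layout (opcode 0x8a
in byte 0, DPO in bit 4 and FUA in bit 3 of byte 1, the LBA in bytes 2-9,
the transfer length in bytes 10-13). The function name decode_write16 is
made up for illustration and is not part of any kernel tooling.

```python
def decode_write16(cdb_hex: str) -> dict:
    """Decode a SCSI WRITE(16) CDB given as space-separated hex bytes.

    Assumes the SBC WRITE(16) layout; other opcodes are rejected.
    """
    cdb = bytes.fromhex(cdb_hex)  # fromhex() skips the spaces between bytes
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    return {
        "dpo": bool(cdb[1] & 0x10),                   # byte 1, bit 4
        "fua": bool(cdb[1] & 0x08),                   # byte 1, bit 3
        "lba": int.from_bytes(cdb[2:10], "big"),      # bytes 2-9, big-endian
        "blocks": int.from_bytes(cdb[10:14], "big"),  # bytes 10-13, big-endian
    }


# CDB copied from the kernel log quoted above.
info = decode_write16("8a 00 00 00 00 00 82 c1 bc 00 00 00 04 00 00 00")
print(info)
```

For the CDB in the log, byte 1 is 00, so both FUA and DPO are clear; the
LBA is 0x82c1bc00 and the transfer length is 0x400 (1024) blocks. If that
decoding is right, the command that timed out was a plain write rather
than a FUA write.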