https://bugzilla.kernel.org/show_bug.cgi?id=219300

--- Comment #9 from Theodore Tso (tytso@xxxxxxx) ---

It's not at all surprising that flaky hardware might have issues that are only exposed by some file systems and not others. Different file systems can have very different I/O patterns, both spatially (which blocks get used) and temporally (how many I/O requests are issued in parallel, and how quickly), as well as in the types of I/O requests issued (e.g., how many, if any, CACHE FLUSH requests, and how many, if any, FORCE UNIT ACCESS (FUA) requests).

One quick thing I'd suggest is to experiment with file systems other than ext4 and ntfs. For example, what happens if you use xfs, btrfs, or f2fs with your test programs? If the hardware fails with xfs or btrfs as well, that would very likely put the finger of blame on the hardware being cr*p. (A minimal sketch of how to set up such a test is appended at the end of this comment.)

The other thing you can try is to run tests against the raw block device. For example, something like this [1] writes random data to the disk and then verifies it on read-back. The block device must be able to handle having random data written to it at high speed, and when you read the data back, you must get exactly the data that was written. Unreasonable, I know, but if the storage device fails with random writes without a file system in the mix, it's going to be hopeless once you add a file system. (A sketch of such a fio job is also appended below.)

[1] https://github.com/axboe/fio/blob/master/examples/basic-verify.fio

I will note that large companies that buy millions of dollars of hardware, whether it's for data center use at hyperscale cloud companies like Amazon or Microsoft, or for flash devices used in mobile devices from Samsung, Motorola, Google (Pixel), etc., will spend an awful lot of time qualifying the hardware to make sure it is high quality before they buy it. And they do this using raw tests against the block device, since that eliminates the excuse from the hardware company that "oh, this must be a file system bug". If failures turn up when running storage tests against the raw block device, there is no place for the hardware vendor to hide.....

But in general, as Artem said, if there are any I/O failures at all, that's a huge red flag. That essentially *proves* that the hardware is dodgy. You can have dodgy hardware without I/O errors, but if there are I/O errors reading or writing a valid block/sector number, then by definition the hardware is the problem. And in your case, the errors are "USB disconnect" and "unit is off-line". That should never, ever happen, and if it does, there is a hardware problem. It could be a cabling problem; it could be a problem with the SCSI/SATA/NVMe/USB controller, etc. But the file system folks will tell you that if there are *any* such problems, resolve the hardware problem before asking the file system people to debug the issue.

It's much like asking a civil engineer why a building might have design issues when it's built on top of quicksand. Buildings assume they are built on stable ground. If the ground is not stable, then choose a different building site or fix the ground first.
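As a concrete sketch of the file-system experiment suggested above: the device /dev/sdX1 and mount point /mnt/test are placeholders for your actual setup, and mkfs destroys everything on the partition, so double-check the device name before running anything like this.

    # WARNING: mkfs destroys all data on the target partition.
    # /dev/sdX1 and /mnt/test are placeholders for your setup.
    mkfs.xfs -f /dev/sdX1      # or: mkfs.btrfs -f /dev/sdX1, or mkfs.f2fs -f /dev/sdX1
    mkdir -p /mnt/test
    mount /dev/sdX1 /mnt/test
    # ... run the same test programs against /mnt/test ...
    umount /mnt/test

If the same failures show up regardless of which file system is on the partition, the file system is almost certainly not the variable that matters.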
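And here is a sketch of the raw-device verify test, along the lines of the basic-verify example referenced in [1]. The filename is again a placeholder, and this job overwrites the target, so point it only at a disk whose contents you can afford to lose.

    # write-and-verify.fio -- adapted from fio's basic-verify example [1].
    # Writes the target randomly in 4k blocks, then reads the data back
    # and verifies the contents. WARNING: overwrites the target device.
    [write-and-verify]
    rw=randwrite
    bs=4k
    direct=1
    ioengine=libaio
    iodepth=16
    verify=crc32c
    size=512m
    filename=/dev/sdX    # placeholder: the raw device under test

Run it with "fio write-and-verify.fio" and look for verification failures or I/O errors in the output. Any failure here, with no file system in the picture, leaves the hardware with nowhere to hide.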
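Finally, while any of these tests run, it's worth watching the kernel log for exactly the kinds of errors mentioned above (USB disconnects, offlined units, I/O errors). One way to do that, assuming a reasonably recent util-linux, is:

    # Follow the kernel log and flag storage-related trouble as it happens.
    dmesg --follow | grep -iE 'i/o error|usb disconnect|offline|reset'

A hit here while the test is running points at the hardware, cabling, or controller, not at the file system.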