Hi Damien, I noticed a regression in xfs/747 (an unreleased xfstest for the xfs_scrub media scanning feature) on 5.3-rc3. I'll condense that down to a simpler reproducer: # dmsetup table error-test: 0 209 linear 8:48 0 error-test: 209 1 error error-test: 210 6446894 linear 8:48 210 Basically we have a ~3G /dev/sdd and we set up device mapper to fail IO for sector 209 and to pass the io to the scsi device everywhere else. On 5.3-rc3, performing a directio pread of this range with a < 1M buffer (in other words, a request for fewer than MAX_BIO_PAGES bytes) yields EIO like you'd expect: # strace -e pread64 xfs_io -d -c 'pread -b 1024k 0k 1120k' /dev/mapper/error-test pread64(3, 0x7f880e1c7000, 1048576, 0) = -1 EIO (Input/output error) pread: Input/output error +++ exited with 0 +++ But doing it with a larger buffer succeeds(!): # strace -e pread64 xfs_io -d -c 'pread -b 2048k 0k 1120k' /dev/mapper/error-test pread64(3, "XFSB\0\0\20\0\0\0\0\0\0\fL\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1146880, 0) = 1146880 read 1146880/1146880 bytes at offset 0 1 MiB, 1 ops; 0.0009 sec (1.124 GiB/sec and 1052.6316 ops/sec) +++ exited with 0 +++ (Note that the part of the buffer corresponding to the dm-error area is uninitialized) On 5.3-rc2, both commands would fail with EIO like you'd expect. The only change between rc2 and rc3 is commit 0eb6ddfb865c ("block: Fix __blkdev_direct_IO() for bio fragments"). AFAICT we end up in __blkdev_direct_IO with a 1120K buffer, which gets split into two bios: one for the first BIO_MAX_PAGES worth of data (1MB) and a second one for the 96k after that. I think the problem is that every time we submit a bio, we increase ret by the size of that bio, but at the time we do that we have no idea if the bio is going to succeed or not. At the end of the function we do: if (!ret) ret = blk_status_to_errno(dio->bio.bi_status); Which means that we only pick up the IO error if we haven't already set ret. I suppose that was useful for being able to return a short read, but now that we always increment ret by the size of the bio, we act like the whole buffer was read. I tried a -rc2 kernel and found that 40% of the time I'd get an EIO and the rest of the time I got a short read. Not sure where to go from here, but something's not right... --D