Hi Jens, It seems that there is a regression in direct-io over loop for partial direct-io reads (or perhaps even for other situations). If I run this code (loop over 6M file, dd direct-io read with 4M blocks) IMG=tst.img LOOP=/dev/loop66 truncate -s 6M $IMG losetup $LOOP $IMG dd if=$LOOP of=/dev/null bs=4M iflag=direct losetup -d $LOOP on older kernel (<=5.16) it reads the whole file 6291456 bytes (6.3 MB, 6.0 MiB) copied, 0.201591 s, 31.2 MB/s while on 5.17-rc (tested on today/s Linus' git) it reads only the full blocks: 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.201904 s, 20.8 MB/s No error reported, exit code is 0. I am not sure if the reproducer is intended behavior (there was some discussion, though), but it is a major regression IMO. If you do not want to support partial direct-io, it should return an error then and not EOF (the last read returns 0)! It was detected in our cryptsetup testsuite and we always use direct aligned IO, but for the reported device sector size only (like 4k). But here it breaks dd without any chance to detect error. Bisect ends on your patch (and clean revert indeed fixes the issue): ceaa762527f41a431b552bc000de4b626d2d8cb7 is the first bad commit commit ceaa762527f41a431b552bc000de4b626d2d8cb7 Author: Jens Axboe <axboe@xxxxxxxxx> Date: Thu Oct 28 08:57:09 2021 -0600 block: move direct_IO into our own read_iter handler Don't call into generic_file_read_iter() if we know it's O_DIRECT, just set it up ourselves and call our own handler. This avoids an indirect call for O_DIRECT. Fall back to filemap_read() if we fail. Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> block/fops.c | 37 ++++++++++++++++++++++++++++++++----- 1 file changed, 32 insertions(+), 5 deletions(-) Please could you check what's wrong here? Thanks, Milan