One of the annoying things in the iomap code is how we handle block-misaligned I/Os. Consider a write to a file on a 4KiB block size filesystem (on a 4KiB page size kernel) which starts at byte offset 5000 and is 4133 bytes long.

Today, we allocate page 1 and read bytes 4096-8191 of the file into it, synchronously. Then we allocate page 2 and read bytes 8192-12287 into it, again synchronously. Then we copy the user's data into the pagecache and mark it dirty. This is a fairly significant delay for the user, who normally sees only the latency of a memcpy() but now has to wait for two non-overlapping reads to complete.

What I'd love to be able to do is allocate pages 1 & 2, copy the user data into them and submit one read which targets (offsets relative to the start of the read, at file byte 4096):

    0-903:     page 1, offset 0, length 904
    904-5036:  bitbucket, length 4133
    5037-8191: page 2, offset 941, length 3155

That way, we don't even need to wait for the read to complete.

I envisage the block layer allocating a bitbucket page to support devices which don't have native support for bitbucket descriptors. We'd also need a fallback path for devices which don't support whatever alignment the I/O is happening at ... but the block layer already has support for bounce-buffering, right?

Anyway, I don't have time to take on this work, but I thought I'd throw it out in case anyone's looking for a project. Or if it's a stupid idea, someone can point out why.
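
To make the arithmetic of the three-way split described above concrete, here is a minimal userspace sketch of how the head, bitbucket and tail pieces of the single read could be computed. Everything in it (struct read_seg, enum seg_target, split_misaligned_write()) is a hypothetical name made up for illustration, not iomap or block layer API. It assumes the block size equals the page size, as in the example, and it mirrors the one-read layout above rather than trying to skip fully overwritten blocks in the middle of a large write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: where one piece of the single read should land. */
enum seg_target { SEG_PAGE, SEG_BITBUCKET };

/* Hypothetical descriptor for one piece of the read. */
struct read_seg {
	enum seg_target target;
	uint64_t file_off;	/* absolute file offset of the first byte */
	uint64_t page_index;	/* pagecache index, SEG_PAGE only */
	uint32_t page_off;	/* offset within that page, SEG_PAGE only */
	uint64_t len;		/* length in bytes */
};

/*
 * Split the block-aligned range covering a write of @len bytes at @pos
 * into up to three pieces: the head the write does not touch, the range
 * the user data already covers (discarded), and the untouched tail.
 * Assumes block size == page size.  Returns the number of pieces filled
 * in, which is 0 when the write is block-aligned and no read is needed.
 */
static int split_misaligned_write(uint64_t pos, uint64_t len,
				  uint32_t blksz, struct read_seg segs[3])
{
	uint64_t mask, end, aligned_start, aligned_end;
	int n = 0;

	/* The masking below only works for a power-of-two block size. */
	assert(blksz != 0 && (blksz & (blksz - 1)) == 0);
	assert(len > 0);

	mask = (uint64_t)blksz - 1;
	end = pos + len;			/* first byte after the write */
	aligned_start = pos & ~mask;		/* round down */
	aligned_end = (end + mask) & ~mask;	/* round up */

	if (pos == aligned_start && end == aligned_end)
		return 0;

	if (pos > aligned_start) {
		/* Head: start of the first block up to the write. */
		segs[n].target = SEG_PAGE;
		segs[n].file_off = aligned_start;
		segs[n].page_index = aligned_start / blksz;
		segs[n].page_off = 0;
		segs[n].len = pos - aligned_start;
		n++;
	}

	/* Middle: bytes the user data replaces, so the read discards them. */
	segs[n].target = SEG_BITBUCKET;
	segs[n].file_off = pos;
	segs[n].page_index = 0;
	segs[n].page_off = 0;
	segs[n].len = len;
	n++;

	if (aligned_end > end) {
		/* Tail: end of the write up to the end of the last block. */
		segs[n].target = SEG_PAGE;
		segs[n].file_off = end;
		segs[n].page_index = end / blksz;
		segs[n].page_off = (uint32_t)(end & mask);
		segs[n].len = aligned_end - end;
		n++;
	}

	return n;
}
```

Calling split_misaligned_write(5000, 4133, 4096, segs) fills in three pieces: 904 bytes into page 1 at offset 0, 4133 bytes to the bitbucket, and 3155 bytes into page 2 at offset 941. A real implementation would presumably build bio segments from these rather than a private array; that part is left out here because it depends on how the bitbucket ends up being represented.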