On Fri, Jul 01, 2022 at 11:58:35AM +0800, Ming Lei wrote: > Why do we need to pay the cost(4 bytes added to bio and the updating > in absolutely fast path) if rewind isn't used? So far, only dm requeue > needs it, and it is one very unusual event. > > Assuming fixed bio end sector should cover most of cases, especially > if bio_rewind is only for dm or driver. > > I'd suggest to not take this way until turning out bio_rewind() is not > enough for new requirement or usages. Well, you're proposing add this to the block layer, not just dm, so we should be looking at potential users and not just this one use case. Check out block/blk-crypto-fallback.c -> blk_crypto_fallback_decrypt_bio: it's using __bio_for_each_segment, which is how I found it. It's also using a bio_crypt_ctx they've stashed in their per-bio bio_fallback_crypt_ctx, in blk_crypto_fallback_bio_prep, in addition to saving a copy of bio->bi_iter. bio_crypt_ctx is 40 bytes, struct bvec_iter is currently 20 bytes, so we'd be replacing 60 bytes of per-bio state with 8 bytes, if we went with bio_pos. And this code path is used by block/blk-crypto.c as well, not just blk-crypto-fallback.c. drivers/md/dm-integrity.c: Again, we're saving a bvec_iter (using dm_bio_record()) so that after the bio completes we can walk through the bio as it was when we submitted it. Here, we're also interested in bi_integrity, but what dm_bio_record() is doing looks highly suspect - we're only saving a copy of the bio->bi_integrity pointer, but bio_integrity_payload contains another bvec_iter and that's probably what should be getting saved. So for this code, switching bio_(get|set)_pos would likely either be eliminating the need for a tricky workaround I haven't spotted yet, or perhaps fixing an actual bug - and it'd be saving 20 bytes of state in dm_bio_details. Side note: according to the comment in dm_bio_details, what that code is trying to do is actually something rather interesting - fully restore the state of a bio so it can be resubmitted to another device after an error. This is probably something that's worth promoting to block/bio.c, so it can be kept in sync with e.g. bio_init() and bio_reset, and because we have other code in the kernel that's doing similar stuff and might want to make use of it if it was standard (btrfs, bcachefs, md-multipath, perhaps aoe). drivers/block/aoe/aoecmd.c: here we're also saving bvec_iter, but this code is doing a different trick that it's able to do because it's not submitting the bio to other block drivers, the terminal handling is in its own code. It's not modifying bio->bi_iter at all, which is neat because it makes resubmitting after an error trivial. This code is smart, and I would consider it a vote in favor of the bio_iter approach (but see below, I'm not actually advocating we do that). I do something similar in bcachefs, my read retry path: the original bio may have been fragmented, and a retry may be for only a fragment of an original bio. Having a separate bvec_iter means that the retry path can submit a retry for only a fragment of the original bio, without having to mutate it or race with other threads that are doing things with other parts of the original bio. It's not strictly necessary functionality - I could've also solved this by not freeing the fragment and retrying that - but it kept things simpler in other ways (the retry may itself end up being fragmented if the extents we're reading from changed, which means in the normal read path fragment bch_read_bios only ever point to toplevel bch_read_bios, but retrying fragment bch_read_bios would mean when doing retries we'd have fragments pointing to fragments pointing to fragments, and memory allocations bounded only by the maximum number of retries we could do). So the struct bio_iter approach - segregating all the things we mutate when iterating over a bio into a sub-type, i.e. what I was originally doing with bvec_iter - has some nice and useful properties, but OTOH with integrity and now bio_crypt_ctx it doesn't look super practical anymore - bio_iter would always have to contain whatever it is we need for iterating over crypt and integrity if those features are #ifdef'd in, whereas now an individual bio only pays the memory overhead for those features if they're in use. OTOH, looking at actual code, bio_(get|set)_pos gets us 90% of the elegance of a separate iterator type, and it works for what all the code I've looked at is trying to do (except for bcachefs, but I have no interest in blk crypt or integrity so I'm fine with the current situation). That's just what I found with a bit of perusing - there's other things I'd want to look at if I wanted to be thorough (anything with multipath in the name for sure, I haven't looked at all at the nvme multipath code yet).