On Thu, 18 Aug 2016, Eric Wheeler wrote: > > On Wed, Jun 01 2016 at 9:44am -0400, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote: > > > > > > > be dm-crypt.c. Maybe you've identified some indirect use of > > > > > BIO_MAX_SIZE? > > > > > > > > I mean the recently introduced BIO_MAX_SIZE in -next tree: > > > > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/md/dm-crypt.c?id=4ed89c97b0706477b822ea2182827640c0cec486 > > > > > > The crazy bcache bios striking back once again. I really think it's > > > harmful having a _MAX value and then having a minor driver > > > reinterpreting it and sending larger ones. Until we can lift the > > > maximum limit in general nad have common code exercise it we really need > > > to stop bcache from sending these instead of littering the tree with > > > workarounds. > > > > The bio_kmalloc function allocates bios with up to 1024 vector entries (as > > opposed to bio_alloc and bio_alloc_bioset that has a limit of 256 vector > > entries). > > > > Device mapper is using bio_alloc_bioset with a bio set, so it is limited > > to 256 vector entries, but other kernel users may use bio_kmalloc and > > create larger bios. > > > > So, if you don't want bios with more than 256 vector entries to exist, you > > should impose this limit in bio_kmalloc (and fix all the callers that use > > it). > > FYI, Kent Overstreet notes this about bcache from the other thread here: > https://lkml.org/lkml/2016/8/15/620 > > [paste] > >> bcache originally had workaround code to split too-large bios when it > >> first went upstream - that was dropped only after the patches to make > >> generic_make_request() handle arbitrary size bios went in. So to do what > >> you're suggesting would mean reverting that bcache patch and bringing that > >> code back, which from my perspective would be a step in the wrong > >> direction. I just want to get this over and done with. > >> > >> re: interactions with other drivers - bio_clone() has already been changed > >> to only clone biovecs that are live for current bi_iter, so there > >> shouldn't be any safety issues. A driver would have to be intentionally > >> doing its own open coded bio cloning that clones all of bi_io_vec, not > >> just the active ones - but if they're doing that, they're already broken > >> because a driver isn't allowed to look at bi_vcnt if it isn't a bio that > >> it owns - bi_vcnt is 0 on bios that don't own their biovec (i.e. that were > >> created by bio_clone_fast). > >> > >> And the cloning and bi_vcnt usage stuff I audited very thoroughly back > >> when I was working on immutable biovecs and such back in the day, and I > >> had to do a fair amount of cleanup/refactoring before that stuff could go > >> in. > [/paste] > > They are making progress in the patch-v3 thread, so perhaps this can be > fixed for now in generic_make_request(). > > -- > Eric Wheeler Device mapper can't split the bio in generic_make_request - it frees the md->queue->bio_split bioset, to save one kernel thread per device. Device mapper uses its own splitting mechanism. So what is the final decision? - should device mapper split the big bio or should bcache not submit big bios? I think splitting big bios in the device mapper is better - simply because it is much less code than reworking bcache to split bios internally. BTW. In the device mapper, we have a layer dm-io, that was created to work around bio size limitations - it accepts unlimited I/O request and splits it to several bios. When bio size limitations are gone, we could simplify dm-io too. Mikulas -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html