On Tue, Jul 09, 2019 at 05:23:30PM +0200, Christoph Hellwig wrote:
> On Tue, Jul 09, 2019 at 08:15:08AM +1000, Dave Chinner wrote:
> > That fixes the problem I saw, but I think bio_chain() needs some
> > more checks to prevent this happening in future. It's trivially
> > easy to chain the bios in the wrong order, very difficult to spot
> > in review, and difficult to trigger in testing as it requires
> > chain nesting and adverse IO timing to expose....
>
> Not sure how we can better check it. At best we can set a flag for a
> bio that is a chain "child" and complain if someone is calling
> submit_bio_wait, but that would only really cover the wait case.

I think submit_bio_wait ought to at least WARN_ON_ONCE if it was fed a
bio with bi_end_io already set, which at least would have made it more
obvious that we'd screwed something up in this case, even if the
detection was after we'd already done bio_chain in the wrong order.

Granted IIRC Dave sent a fix for a zeroout integer overflow a while ago
and Jens committed the patch with the debugging assertions removed, so
... yay?

Maybe we just need CONFIG_BLK_DEBUG for these kinds of assertions so
that ignorant clods like me have another line of defense against bugs
and the growing crowd of people who care about performance above
correctness can crash faster.

<grumble>

> But one thing I planned to do is to lift xfs_chain_bio to the block
> layer so that people can use it for any kind of continuation bio
> instead of duplicating the logic.

That'll help, I suspect. :)

--D
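
For illustration only, the WARN_ON_ONCE check suggested above for submit_bio_wait can be sketched as a small userspace mock. This is not kernel code and not a proposed patch: struct bio is cut down to the single field in question, and mock_warn_on_once(), mock_submit_bio_wait(), mock_wait_endio() and mock_chain_endio() are hypothetical names invented for the sketch. The only behaviour borrowed from the real function is that it installs its own completion handler, which is why a bio arriving with bi_end_io already set is suspicious.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct bio. */
struct bio {
	void (*bi_end_io)(struct bio *bio);
};

/* Mock completion handlers; the names are illustrative only. */
static void mock_wait_endio(struct bio *bio) { (void)bio; }
static void mock_chain_endio(struct bio *bio) { (void)bio; }

/* Counts how many warnings were actually printed. */
static int warnings_printed;

/*
 * Simplified stand-in for WARN_ON_ONCE(): prints at most one warning
 * for the lifetime of the program and returns the condition. The real
 * macro tracks "once" per call site; this sketch has one call site.
 */
static int mock_warn_on_once(int cond, const char *what)
{
	static int warned;

	if (cond && !warned) {
		warned = 1;
		warnings_printed++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * Mock of submit_bio_wait(): returns 1 if the bio arrived with a
 * completion handler already set (the suspicious case), else 0.
 */
static int mock_submit_bio_wait(struct bio *bio)
{
	int suspicious = mock_warn_on_once(bio->bi_end_io != NULL,
			"submit_bio_wait: bi_end_io already set");

	/* Like the real function, install our own completion handler. */
	bio->bi_end_io = mock_wait_endio;
	return suspicious;
}
```

Passing a bio whose bi_end_io is NULL returns 0 silently. Passing one that already carries mock_chain_endio (as a wrongly ordered chain would) returns 1 and prints the warning the first time only. As noted above, this detects the mistake after the chaining has already happened, so it is a diagnostic aid, not a prevention.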