On 12/19/18 9:06 AM, Ming Lei wrote: > On Wed, Dec 19, 2018 at 08:50:09AM -0700, Jens Axboe wrote: >> DM currently has a statically allocated bio that it uses to issue empty >> flushes. It doesn't submit this bio, it just uses it for maintaining >> state while setting up clones. Multiple users can access this bio at the >> same time. This wasn't previously an issue, even if it was a bit iffy, >> but with the blkg associations it can become one. >> >> We setup the blkg association, then clone bio's and submit, then remove >> the blkg assocation again. But since we can have multiple tasks doing >> this at the same time, against multiple blkg's, then we can either lose >> references to a blkg, or put it twice. The latter causes complaints on >> the percpu ref being <= 0 when released, and can cause use-after-free as >> well. Ming reports that xfstest generic/475 triggers this: >> >> ------------[ cut here ]------------ >> percpu ref (blkg_release) <= 0 (0) after switching to atomic >> WARNING: CPU: 13 PID: 0 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x2c9/0x4a0 >> >> Switch to just using an on-stack bio for this, and get rid of the >> embedded bio. >> >> Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device") >> Reported-by: Ming Lei <ming.lei@xxxxxxxxxx> >> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> >> >> --- >> >> Only other use case like this I could find is raid5-cache, which >> serializes access to its embedded bio (and actually submits it, too). >> That one looks fine. > > With this patch, xfstests(generic/475) won't trigger the warning of > 'percpu ref (blkg_release) <= 0 (0) after switching to atomic' any more. > > Tested-by: Ming Lei <ming.lei@xxxxxxxxxx> Great, thanks Ming. I think this was somewhat of a ticking time bomb, so good to have it fixed even if we could have done without a bug. And thanks for all the testing! -- Jens Axboe