Re: [PATCH 0/4 RFC] BDI lifetime fix

Omar Sandoval <osandov@xxxxxxxxxxx> · Fri, 27 Jan 2017 11:49:19 -0800

On Thu, Jan 26, 2017 at 10:15:06PM -0800, Dan Williams wrote:
> On Thu, Jan 26, 2017 at 9:45 AM, Jan Kara <jack@xxxxxxx> wrote:
> > Hello,
> >
> > this patch series attempts to solve the problems with the lifetime of a
> > backing_dev_info structure. Currently it lives inside request_queue structure
> > and thus it gets destroyed as soon as request queue goes away. However
> > the block device inode still stays around and thus inode_to_bdi() call on
> > that inode (e.g. from flusher worker) may happen after request queue has been
> > destroyed resulting in oops.
> >
> > This patch set tries to solve these problems by making backing_dev_info
> > independent structure referenced from block device inode. That makes sure
> > inode_to_bdi() cannot ever oops. The patches are lightly tested for now
> > (they boot, basic tests with adding & removing loop devices seem to do what
> > I'd expect them to do ;). If someone is able to reproduce crashes on bdi
> > when device goes away, please test these patches.
> 
> This survives several runs of the libnvdimm unit tests which stress
> del_gendisk() and blk_cleanup_queue(). I'll keep testing since the
> failure was intermittent, but this is looking good.
> 
> > I'd also appreciate if people had a look whether the approach I took looks
> > sensible.
> 
> Looks sensible, just the kref comment.
> 
> I also don't see a need to try to tag on the bdi device name reuse
> into this series. I'm wondering if we can handle that separately with
> device_rename(bdi->dev, ...) when we know scsi is done with the old
> bdi but it has not finished being deleted.

What's the status of the device name issue? We're hitting it a lot here.
It's really easy to reproduce with scsi_debug, script attached. I'd be
happy to test out any patches.

Attachment: stress_test_scsi_debug.sh
Description: Bourne shell script