AW: AW: [PATCH] md/bcache: Fix a deadlock while calculating writeback rate

Phillipp Röll <phillipp.roell@xxxxxxxxxxxxxx> · Thu, 18 May 2017 18:52:26 +0000

> -----Ursprüngliche Nachricht-----
> Von: Eric Wheeler [mailto:bcache@xxxxxxxxxxxxxxxxxx]
> Gesendet: Donnerstag, 18. Mai 2017 19:05
> An: Phillipp Röll
> Cc: linux-bcache@xxxxxxxxxxxxxxx
> Betreff: Re: AW: [PATCH] md/bcache: Fix a deadlock while calculating
> writeback rate
> 
> On Fri, 12 May 2017, Eric Wheeler wrote:
> 
> > On Wed, 10 May 2017, Phillipp Röll wrote:
> >
> > > Hi Eric,
> > >
> > > > Thank you for tracking this down!  Did you have an actual deadlock
> > > > case that this fixed?
> > >
> > > Yes. We had at least dozens, if not hundreds, of "kernel hung task"
> > > events followed by complete IO lockups on our production systems,
> > > with traces like
> > > this: https://paste42.de/059f78f7f9f27e9d88cd65fc02811e92/11749/
> > >
> > > Reproduction is actually pretty easy:
> > >
> > > - Setup bcache with SSD + HDD and LVM on top of the resulting bcache
> > > device
> > > - Create three LVs in the new cached VG
> > > - Start dd'ing from one of these LVs to another LV
> > > - Start fio with randrw on the third LV
> > > - Wait 5-10 minutes for the message to pop up and the system to lock
> > > up
> > >
> > > > I wonder: in the event of a race, can the updated P-factor be
> > > > invalid (ie, partially populated)? I don't believe 64-bit
> > > > assignments are atomic on all architectures (see
> > > > __update_writeback_rate) so you might get the upper or lower 32
> > > > bits of the P-factor or of other values assigned therein.  Could
> > > > the rate be calculated in such a race by the writeback thread such that
> no progress is made (eg, effectively rate==0)?
> > >
> > > That seems to be right, yes. I think I can add that spinlock. Would
> > > you say the spinlock should be used for the sysfs-info as well?
> >
> > Yes, anywhere it is used.  It might be a good idea to write a short
> > accessor function if there are lots of places that need to grab the lock.
> > Some kind of future-proofing would be a good idea to make it easy if
> > anyone else grabs the value and doesn't realize it needs to be locked.
> 
> Hi Phillipp,
> 
> Just checking in:
> 
> Will you have a spinlock version of the patch available soon?
> 
> -Eric
> 
> ps, be sure to Cc: linux-bcache@xxxxxxxxxxxxxxx on your replies :)
> 

Hi Eric,
honestly I don't know. I haven't touched C in a decade and would have
to read up on documentation first, setup the dev ecosystem with
build and testing and I have no clue when I can spare a day or two
for that. Removing a lock is one thing, adding one is a different beast.

Plus, for our arch the problem is solved, it's just not upstream,
so the priority has gone from "critical" to "none". I wouldn't be
surprised if I couldn't work on this until after the summer, sorry.

- Phillipp

> >
> > --
> > Eric Wheeler
> >
> >
> >
> > >
> > > We're using amd64 only and had the patch tested with IO stress tests
> > > for
> > > 10 to 14 days and running in prod for 3 months now without issue.
> > > But that doesn't mean anything for the other architectures...
> > >
> > > Phillipp
> > >
> > > > > Signed-off-by: Phillipp Röll <phillipp.roell@xxxxxxxxxxxxxx>
> > > > > ---
> > > > >  drivers/md/bcache/writeback.c | 4 ----
> > > > >  1 file changed, 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/md/bcache/writeback.c
> > > > b/drivers/md/bcache/writeback.c
> > > > > index 6ac2e48..90da3bd 100644
> > > > > --- a/drivers/md/bcache/writeback.c
> > > > > +++ b/drivers/md/bcache/writeback.c
> > > > > @@ -75,14 +75,10 @@ static void update_writeback_rate(struct
> > > > work_struct *work)
> > > > >  					     struct cached_dev,
> > > > >  					     writeback_rate_update);
> > > > >
> > > > > -	down_read(&dc->writeback_lock);
> > > > > -
> > > > >  	if (atomic_read(&dc->has_dirty) &&
> > > > >  	    dc->writeback_percent)
> > > > >  		__update_writeback_rate(dc);
> > > > >
> > > > > -	up_read(&dc->writeback_lock);
> > > > > -
> > > > >  	schedule_delayed_work(&dc->writeback_rate_update,
> > > > >  			      dc->writeback_rate_update_seconds *
> HZ);  }
> > > > > --
> > > > > 2.10.0
> > > > >
> > > > >
> > > > >
> > >
--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html