Re: [PATCH V4 0/2] blk-mq: fix race between completion and BLK_EH_RESET_TIMER

Ming Lei - 16.04.18, 02:45:
> On Sun, Apr 15, 2018 at 06:31:44PM +0200, Martin Steigerwald wrote:
> > Hi Ming.
> > 
> > Ming Lei - 15.04.18, 17:43:
> > > Hi Jens,
> > > 
> > > These two patches fix the recently discussed race between
> > > completion and BLK_EH_RESET_TIMER.
> > > 
> > > Israel & Martin, this is a simpler fix for the issue, and it should
> > > also cover the potential hang of a request in the
> > > MQ_RQ_COMPLETE_IN_TIMEOUT state. Could you test V4 and see whether
> > > it fixes your issue?
> > 
> > As a replacement for all three of the other patches I applied?
> > 
> > - '[PATCH] blk-mq_Directly schedule q->timeout_work when aborting a
> > request.mbox'
> > 
> > - '[PATCH v2] block: Change a rcu_read_{lock,unlock}_sched() pair
> > into rcu_read_{lock,unlock}().mbox'
> > 
> > - '[PATCH v4] blk-mq_Fix race conditions in request timeout
> > handling.mbox'
> 
> You only need to replace the last one above, '[PATCH v4] blk-mq_Fix race
> conditions in request timeout', with the V4 series in this thread.

Ming, a 4.16.2 kernel with the patches:

'[PATCH] blk-mq_Directly schedule q->timeout_work when aborting a 
request.mbox'
'[PATCH v2] block: Change a rcu_read_{lock,unlock}_sched() pair into 
rcu_read_{lock,unlock}().mbox'
'[PATCH V4 1_2] blk-mq_set RQF_MQ_TIMEOUT_EXPIRED when the rq's 
timeout isn't handled.mbox'
'[PATCH V4 2_2] blk-mq_fix race between complete and 
BLK_EH_RESET_TIMER.mbox'

hung on boot 3 out of 4 times.

See

[Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime 
and boot failures with blk_mq_terminate_expired in backtrace
https://bugzilla.kernel.org/show_bug.cgi?id=199077#c13

I tried to add your email address to the Cc list of the bug report, but
Bugzilla did not recognize it.

Fortunately, it booted on the fourth attempt, which was lucky, as I had
forgotten my GRUB password.

I am reverting to the previous 4.16.1 kernel with Bart's patches.

> > So far these patches have worked reliably for both the hang on boot
> > and the error reading SMART data.
> 
> And you may see the reason in the following thread:
> 
> https://marc.info/?l=linux-block&m=152366441625786&w=2

So requests could never be completed?

> > I think I'd compile a kernel tomorrow or Tuesday.

Thanks,
-- 
Martin