Re: [PATCH V2 0/4] nvme: fix two kinds of IO hang from removing NSs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 21, 2023 at 09:27:31PM +0800, Ming Lei wrote:
> On Wed, Jun 21, 2023 at 01:13:05PM +0300, Sagi Grimberg wrote:
> > 
> > > > > > > Hello,
> > > > > > > 
> > > > > > > The 1st three patch fixes io hang when controller removal interrupts error
> > > > > > > recovery, then queue is left as frozen.
> > > > > > > 
> > > > > > > The 4th patch fixes io hang when controller is left as unquiesce.
> > > > > > 
> > > > > > Ming, what happened to nvme-tcp/rdma move of freeze/unfreeze to the
> > > > > > connect patches?
> > > > > 
> > > > > I'd suggest to handle all drivers(include nvme-pci) in same logic for avoiding
> > > > > extra maintain burden wrt. error handling, but looks Keith worries about the
> > > > > delay freezing may cause too many requests queued during error handling, and
> > > > > that might cause user report.
> > > > 
> > > > For nvme-tcp/rdma your patch also addresses IO not failing over because
> > > > they block on queue enter. So I definitely want this for fabrics.
> > > 
> > > The patch in the following link should fix these issues too:
> > > 
> > > https://lore.kernel.org/linux-block/ZJGmW7lEaipT6saa@xxxxxxxxxxxxxxxxxxxxxxxxx/T/#u
> > > 
> > > I guess you still want the paired freeze patch because it makes freeze &
> > > unfreeze more reliable in error handling. If yes, I can make one fabric
> > > only change for you.
> > 
> > Not sure exactly what reliability is referred here.
> 
> freeze and unfreeze have to be called strictly in pair, but nvme calls
> the two from different contexts, so unfreeze may easily be missed, and
> causes mismatched freeze count. There has many such reports so far.
> 
> > I agree that there
> > is an issue with controller delete during error recovery. The patch
> > was a way to side-step it, great. But it addressed I/O blocked on enter
> > and not failing over.
> > 
> > So yes, for fabrics we should have it. I would argue that it would be
> > the right thing to do for pci as well. But I won't argue if Keith feels
> > otherwise.
> 
> Keith, can you update with us if you are fine with moving
> nvme_start_freeze() into nvme_reset_work() for nvme pci driver?

The point was to contain requests from entering while the hctx's are
being reconfigured. If you're going to pair up the freezes as you've
suggested, we might as well just not call freeze at all.



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux