On Wed, Jul 18, 2018 at 09:56:50PM +0200, hch@xxxxxx wrote:
> On Fri, Jul 13, 2018 at 05:58:08PM -0600, Keith Busch wrote:
> > Of the two you mentioned, yours is preferable IMO. While I appreciate
> > Jianchao's detailed analysis, it's hard to take a proposal seriously
> > that so colourfully calls everyone else "dangerous" while advocating
> > for silently losing requests on purpose.
> >
> > But where's the option that fixes scsi to handle hardware completions
> > concurrently with arbitrary timeout software? Propping up that house of
> > cards can't be the only recourse.
>
> The important bit is that we need to fix this issue quickly. We are
> past -rc5 so I'm rather concerned about anything too complicated.
>
> I'm not even sure SCSI has a problem with multiple completions happening
> at the same time, but it certainly has a problem with bypassing
> blk_mq_complete_request from the EH path.
>
> I think we can solve this properly, but I also think we are way too late
> in the 4.18 cycle to fix it properly. For now I fear we'll just have
> to revert the changes and try again for 4.19 or even 4.20 if we don't
> act quickly enough.

So here is a quick attempt at the revert while also trying to keep nvme
working. Keith, Bart, Jianchao - does this look reasonable as a 4.18
band-aid?

http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-eh-revert