> -----Original Message-----
> From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> Sent: Wednesday, September 23, 2020 10:13 PM
> To: Keith Busch <kbusch@xxxxxxxxxx>
> Cc: Meng Wang <meng@xxxxxxxxxxxxx>; linux-ext4@xxxxxxxxxxxxxxx;
> linux-nvme@xxxxxxxxxxxxxxxxxxx
> Subject: Re: kernel panics when hot removing U.2 nvme disk
>
> On Fri, Sep 18, 2020 at 06:44:01PM -0700, Keith Busch wrote:
> > On Fri, Sep 18, 2020 at 11:47:27PM +0000, Meng Wang wrote:
> > > Hi,
> > > We found kernel panics today while testing hot removal of a U.2 nvme
> > > disk. After hot removing the nvme disk (formatted as ext4), the system
> > > freezes and all services are stuck. A lot of kernel messages flooded the
> > > syslog, including CPU soft lockups, an ext4 NULL pointer dereference,
> > > and ib nic transmission timeouts. The kernel panics and configuration
> > > are shown below. The kernel used is 5.4.0-050400-generic and the OS is
> > > Ubuntu 16.04. Not sure whether it's a known bug or a configuration
> > > error. Any advice is welcome.
> >
> > [cc'ing ext4 mailing list]
> >
> > The NULL dereference occurred before the soft lockup, so I'm guessing the
> > Oops'ed process is holding the same lock the removal task wants.
> >
> > Your kernel is a bit older, so it may be worth verifying if your
> > observation still occurs on the current stable or current mainline, but
> > the ext4 developers may have a better idea as this doesn't at least
> > initially appear specific to nvme.
>
> The problem is the crazy __invalidate_device stuff that calls into
> file system eviction from all kinds of super critical block paths.
> While I haven't debugged the root cause this kind of thing just causes
> problems without really helping anyone. I have a half-finished series
> that kills this crap and instead allows the file system (or other
> block device user) to pass shutdown and resize callbacks when they
> exclusively open a block device. That way the file system driver
> can just mark the file system shutdown to prevent any further damage
> without all this mess.

Thanks for the info. Is this a problem only for the ext4 + nvme combination?
If we change the file system or use a SATA drive, will that work around the
problem?
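
For anyone following the thread, the callback idea described in the quoted mail can be pictured with a small user-space sketch. This is not kernel code and not the series mentioned above; every name here (`holder_ops`, `toy_bdev`, `toy_fs`, and the `toy_*` functions) is hypothetical and only illustrates the pattern: the holder registers shutdown and resize callbacks when it opens the device exclusively, and on removal the device side just notifies the holder instead of calling into file system eviction.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callbacks a holder passes in at exclusive open time. */
struct holder_ops {
    void (*shutdown)(void *holder);                        /* device went away */
    void (*resize)(void *holder, unsigned long long size); /* capacity changed */
};

/* Toy stand-in for a block device; NOT the kernel's struct block_device. */
struct toy_bdev {
    unsigned long long size;
    void *holder;                 /* current exclusive holder, or NULL */
    const struct holder_ops *ops; /* callbacks registered by that holder */
};

/* Toy file system state. */
struct toy_fs {
    bool shut_down;
    unsigned long long dev_size;
};

/* Exclusive open: only one holder at a time, and it registers its callbacks. */
static int toy_bdev_open_excl(struct toy_bdev *bdev, void *holder,
                              const struct holder_ops *ops)
{
    assert(bdev != NULL && holder != NULL);
    if (bdev->holder != NULL)
        return -EBUSY;
    bdev->holder = holder;
    bdev->ops = ops;
    return 0;
}

/* Hot removal: notify the holder; no inode eviction from the block path. */
static void toy_bdev_remove(struct toy_bdev *bdev)
{
    if (bdev->holder && bdev->ops && bdev->ops->shutdown)
        bdev->ops->shutdown(bdev->holder);
    bdev->size = 0;
}

/* Capacity change: update the device, then tell the holder. */
static void toy_bdev_set_size(struct toy_bdev *bdev, unsigned long long size)
{
    bdev->size = size;
    if (bdev->holder && bdev->ops && bdev->ops->resize)
        bdev->ops->resize(bdev->holder, size);
}

/* File system side: shutdown only flips a flag. */
static void toy_fs_shutdown(void *holder)
{
    ((struct toy_fs *)holder)->shut_down = true;
}

static void toy_fs_resize(void *holder, unsigned long long size)
{
    ((struct toy_fs *)holder)->dev_size = size;
}

static const struct holder_ops toy_fs_ops = {
    .shutdown = toy_fs_shutdown,
    .resize   = toy_fs_resize,
};

/* Any later I/O checks the flag and fails cleanly with -EIO. */
static int toy_fs_write(struct toy_fs *fs)
{
    return fs->shut_down ? -EIO : 0;
}
```

In this sketch, a caller would open the device with `toy_bdev_open_excl(&bdev, &fs, &toy_fs_ops)`; after `toy_bdev_remove(&bdev)`, `toy_fs_write(&fs)` returns `-EIO` rather than touching state that is gone.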