On Jun 29, 2018, at 1:36 PM, Jon Derrick <jonathan.derrick@xxxxxxxxx> wrote: > > This patch attempts to close a hole leading to a BUG seen with hot > removals during writes [1]. > > A block device (NVME namespace in this test case) is formatted to EXT4 > without partitions. It's mounted and write I/O is run to a file, then > the device is hot removed from the slot. The superblock attempts to be > written to the drive which is no longer present. > > The typical chain of events leading to the BUG: > ext4_commit_super() > __sync_dirty_buffer() > submit_bh() > submit_bh_wbc() > BUG_ON(!buffer_mapped(bh)); > > This fix checks for the superblock's buffer head being mapped prior to > syncing. > > [1] https://www.spinics.net/lists/linux-ext4/msg56527.html > > Signed-off-by: Jon Derrick <jonathan.derrick@xxxxxxxxx> > --- > fs/ext4/super.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 0c4c220..ee33233 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -4736,6 +4736,14 @@ static int ext4_commit_super(struct super_block *sb, int sync) > > if (!sbh || block_device_ejected(sb)) > return error; > + > + /* > + * The superblock bh should be mapped, but it might not be if the > + * device was hot-removed. Not much we can do but fail the I/O. > + */ > + if (!buffer_mapped(sbh)) > + return error; This still looks a bit racy, based on the stack trace you posted. There is already a "block_device_ejected()" check a line above, which makes me think that the PCI device removal should be handled like an ejected device, so that it is also handled elsewhere. Even so, the check here in ext4_commit_super() can pass, and the PCI card can be removed on the next instruction and still trigger the BUG_ON(). That said, this is probably still an improvement on the existing situation. Cheers, Andreas
Attachment:
signature.asc
Description: Message signed with OpenPGP