On Mon, Mar 09, 2020 at 11:04:52AM -0700, Eric Biggers wrote: > On Mon, Mar 09, 2020 at 09:24:39AM -0700, Darrick J. Wong wrote: > > On Sun, Mar 08, 2020 at 09:34:30PM -0700, Eric Biggers wrote: > > > From: Eric Biggers <ebiggers@xxxxxxxxxx> > > > > > > Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set > > > during do_exit(). That can confuse things. For example, if BSD process > > > accounting is enabled and the accounting file has FS_SYNC_FL set and is > > > located on an ext4 filesystem without a journal, then do_exit() ends up > > > calling ext4_write_inode(). That triggers the > > > WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes > > > (appropriately) that inodes aren't written when allocating memory. > > > > > > Fix this in xfsaild() by using the helper functions to save and restore > > > PF_MEMALLOC. > > > > > > This can be reproduced as follows in the kvm-xfstests test appliance > > > modified to add the 'acct' Debian package, and with kvm-xfstests's > > > recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y: > > > > > > mkfs.ext2 -F /dev/vdb > > > mount /vdb -t ext4 > > > touch /vdb/file > > > chattr +S /vdb/file > > > > Does this trip if the process accounting file is also on an xfs > > filesystem? > > > > > accton /vdb/file > > > mkfs.xfs -f /dev/vdc > > > mount /vdc > > > umount /vdc > > > > ...and if so, can this be turned into an fstests case, please? > > I wasn't expecting it, but it turns out it does actually trip a similar warning > in iomap_do_writepage(): > > mkfs.xfs -f /dev/vdb > mount /vdb > touch /vdb/file > chattr +S /vdb/file > accton /vdb/file > mkfs.xfs -f /dev/vdc > mount /vdc > umount /vdc > > causes... > > WARNING: CPU: 1 PID: 336 at fs/iomap/buffered-io.c:1534 > CPU: 1 PID: 336 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #3 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014 > RIP: 0010:iomap_do_writepage+0x16b/0x1f0 fs/iomap/buffered-io.c:1534 > [...] > Call Trace: > write_cache_pages+0x189/0x4d0 mm/page-writeback.c:2238 > iomap_writepages+0x1c/0x33 fs/iomap/buffered-io.c:1642 > xfs_vm_writepages+0x65/0x90 fs/xfs/xfs_aops.c:578 > do_writepages+0x41/0xe0 mm/page-writeback.c:2344 > __filemap_fdatawrite_range+0xd2/0x120 mm/filemap.c:421 > file_write_and_wait_range+0x71/0xc0 mm/filemap.c:760 > xfs_file_fsync+0x7a/0x2b0 fs/xfs/xfs_file.c:114 > generic_write_sync include/linux/fs.h:2867 [inline] > xfs_file_buffered_aio_write+0x379/0x3b0 fs/xfs/xfs_file.c:691 > call_write_iter include/linux/fs.h:1901 [inline] > new_sync_write+0x130/0x1d0 fs/read_write.c:483 > __kernel_write+0x54/0xe0 fs/read_write.c:515 > do_acct_process+0x122/0x170 kernel/acct.c:522 > slow_acct_process kernel/acct.c:581 [inline] > acct_process+0x1d4/0x27c kernel/acct.c:607 > do_exit+0x83d/0xbc0 kernel/exit.c:791 > kthread+0xf1/0x140 kernel/kthread.c:257 > ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352 > > So sure, since it's not necessarily a multi-filesystem thing, I can try to turn > it into an xfstest. There's currently no way to enable BSD process accounting > in xfstests though, so we'll either need to make the test depend on the 'acct' > program or add a helper test program. > > Also, do you want me to update the commit message again, to mention the above > case? I think it's worth mentioning that this is a general problem that applies any time the process accounting file has that sync flag set, since this problem isn't specific to ext4 + xfs. (and now I wonder how many other places in the kernel suffer from these kinds of file write surprises...) --D > - Eric