Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 24, 2022 at 11:39 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Thu, Feb 24, 2022 at 10:41 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
> >
> > Hi Amir-
> >
> > > On Feb 24, 2022, at 11:17 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > >
> > > The nfsd file cache table can be pretty large and its allocation
> > > may require as many as 80 contigious pages.
> > >
> > > Employ the same fix that was employed for similar issue that was
> > > reported for the reply cache hash table allocation several years ago
> > > by commit 8f97514b423a ("nfsd: more robust allocation failure handling
> > > in nfsd_reply_cache_init").
> > >
> > > Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
> > > Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@xxxxxxxxxx/
> > > Suggested-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > > ---
> > >
> > > Since v1:
> > > - Use kvcalloc()
> > > - Use kvfree()
> > >
> > > fs/nfsd/filecache.c | 6 +++---
> > > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > v2 passes some simple testing, so I've applied it to NFSD for-next.
> > It should get 0-day and merge testing and is available for others
> > to try out.
> >
> > I don't have anything that exercises low memory scenarios, though.
> > Do you have anything like this to try?
>
> Well, it is not low memory really it's fragmented memory.
> I would try setting:
>
> CONFIG_FAIL_PAGE_ALLOC=y
>
> echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
> echo 100 > /sys/kernel/debug/fail_page_alloc/probability
>
> and starting (or restarting) nfsd.
> hoping that other large page allocations won't get in the way.
>
> I gave it a shot, but couldn't figure out why nfsd4_files slab
> is still there after stopping nfs-server service, meaning that
> nfsd_file_cache_shutdown() was not called - I must be missing
> something. I may play with this some more tomorrow.
>

Ok, I was missing some parameters.
This configuration reproduces and failure and verified that the
kvcalloc() fix solves the issue:

$ systemctl stop nfs-server
$ echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
$ echo 100 > /sys/kernel/debug/fail_page_alloc/probability
$ echo 1 > /sys/kernel/debug/fail_page_alloc/times
$ echo N > /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait
$ systemctl start nfs-server

[   24.410560] FAULT_INJECTION: forcing a failure.
[   24.410560] name fail_page_alloc, interval 1, probability 100,
space 0, times 1
[   24.413887] CPU: 1 PID: 1218 Comm: rpc.nfsd Not tainted
5.17.0-rc2-xfstests #5927
[   24.415625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.13.0-1ubuntu1.1 04/01/2014
[   24.417098] Call Trace:
[   24.417098]  <TASK>
[   24.417098]  dump_stack_lvl+0x45/0x59
[   24.418999]  should_fail+0x11a/0x13d
[   24.418999]  prepare_alloc_pages.isra.0+0x97/0xc5
[   24.418999]  __alloc_pages+0x76/0x1c7
[   24.418999]  kmalloc_order+0x35/0xa7
[   24.418999]  kmalloc_order_trace+0x1b/0xf3
[   24.418999]  nfsd_file_cache_init+0x5b/0x2d8
[   24.418999]  nfsd_svc+0xcd/0x2b2
[   24.427086]  write_threads+0x6d/0xb5
[   24.427086]  ? get_int+0x70/0x70
[   24.429020]  nfsctl_transaction_write+0x4f/0x67
[   24.429020]  vfs_write+0xe3/0x14b
[   24.429020]  ksys_write+0x7f/0xcb
[   24.429020]  do_syscall_64+0x6d/0x80
[   24.429020]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   24.429020] RIP: 0033:0x7f29d80d6504
[   24.429020] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f
1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89
f5 53
[   24.439028] RSP: 002b:00007ffe867a47f8 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[   24.439028] RAX: ffffffffffffffda RBX: 00007f29d8219560 RCX: 00007f29d80d6504
[   24.442325] RDX: 0000000000000002 RSI: 00007f29d8219560 RDI: 0000000000000003
[   24.442325] RBP: 0000000000000003 R08: 0000000000000000 R09: 00007ffe867a4557
[   24.445644] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[   24.445644] R13: 0000000000000008 R14: 0000000000000000 R15: 00007f29d83572a0
[   24.449026]  </TASK>
[   24.450496] nfsd: unable to allocate nfsd_file_hashtbl
Job for nfs-server.service canceled.

With the fix patch, nfsd starts despite the injected failure.

Thanks,
Amir.



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux