Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Feb 26, 2022, at 1:37 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> 
> On Thu, Feb 24, 2022 at 11:39 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>> 
>> On Thu, Feb 24, 2022 at 10:41 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>> 
>>> Hi Amir-
>>> 
>>>> On Feb 24, 2022, at 11:17 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>>> 
>>>> The nfsd file cache table can be pretty large and its allocation
>>>> may require as many as 80 contigious pages.
>>>> 
>>>> Employ the same fix that was employed for similar issue that was
>>>> reported for the reply cache hash table allocation several years ago
>>>> by commit 8f97514b423a ("nfsd: more robust allocation failure handling
>>>> in nfsd_reply_cache_init").
>>>> 
>>>> Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
>>>> Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@xxxxxxxxxx/
>>>> Suggested-by: Jeff Layton <jlayton@xxxxxxxxxx>
>>>> Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
>>>> ---
>>>> 
>>>> Since v1:
>>>> - Use kvcalloc()
>>>> - Use kvfree()
>>>> 
>>>> fs/nfsd/filecache.c | 6 +++---
>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>> 
>>> v2 passes some simple testing, so I've applied it to NFSD for-next.
>>> It should get 0-day and merge testing and is available for others
>>> to try out.
>>> 
>>> I don't have anything that exercises low memory scenarios, though.
>>> Do you have anything like this to try?
>> 
>> Well, it is not low memory really it's fragmented memory.
>> I would try setting:
>> 
>> CONFIG_FAIL_PAGE_ALLOC=y
>> 
>> echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
>> echo 100 > /sys/kernel/debug/fail_page_alloc/probability
>> 
>> and starting (or restarting) nfsd.
>> hoping that other large page allocations won't get in the way.
>> 
>> I gave it a shot, but couldn't figure out why nfsd4_files slab
>> is still there after stopping nfs-server service, meaning that
>> nfsd_file_cache_shutdown() was not called - I must be missing
>> something. I may play with this some more tomorrow.
>> 
> 
> Ok, I was missing some parameters.
> This configuration reproduces and failure and verified that the
> kvcalloc() fix solves the issue:
> 
> $ systemctl stop nfs-server
> $ echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
> $ echo 100 > /sys/kernel/debug/fail_page_alloc/probability
> $ echo 1 > /sys/kernel/debug/fail_page_alloc/times
> $ echo N > /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait
> $ systemctl start nfs-server
> 
> [   24.410560] FAULT_INJECTION: forcing a failure.
> [   24.410560] name fail_page_alloc, interval 1, probability 100,
> space 0, times 1
> [   24.413887] CPU: 1 PID: 1218 Comm: rpc.nfsd Not tainted
> 5.17.0-rc2-xfstests #5927
> [   24.415625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.13.0-1ubuntu1.1 04/01/2014
> [   24.417098] Call Trace:
> [   24.417098]  <TASK>
> [   24.417098]  dump_stack_lvl+0x45/0x59
> [   24.418999]  should_fail+0x11a/0x13d
> [   24.418999]  prepare_alloc_pages.isra.0+0x97/0xc5
> [   24.418999]  __alloc_pages+0x76/0x1c7
> [   24.418999]  kmalloc_order+0x35/0xa7
> [   24.418999]  kmalloc_order_trace+0x1b/0xf3
> [   24.418999]  nfsd_file_cache_init+0x5b/0x2d8
> [   24.418999]  nfsd_svc+0xcd/0x2b2
> [   24.427086]  write_threads+0x6d/0xb5
> [   24.427086]  ? get_int+0x70/0x70
> [   24.429020]  nfsctl_transaction_write+0x4f/0x67
> [   24.429020]  vfs_write+0xe3/0x14b
> [   24.429020]  ksys_write+0x7f/0xcb
> [   24.429020]  do_syscall_64+0x6d/0x80
> [   24.429020]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   24.429020] RIP: 0033:0x7f29d80d6504
> [   24.429020] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f
> 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89
> f5 53
> [   24.439028] RSP: 002b:00007ffe867a47f8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000001
> [   24.439028] RAX: ffffffffffffffda RBX: 00007f29d8219560 RCX: 00007f29d80d6504
> [   24.442325] RDX: 0000000000000002 RSI: 00007f29d8219560 RDI: 0000000000000003
> [   24.442325] RBP: 0000000000000003 R08: 0000000000000000 R09: 00007ffe867a4557
> [   24.445644] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [   24.445644] R13: 0000000000000008 R14: 0000000000000000 R15: 00007f29d83572a0
> [   24.449026]  </TASK>
> [   24.450496] nfsd: unable to allocate nfsd_file_hashtbl
> Job for nfs-server.service canceled.
> 
> With the fix patch, nfsd starts despite the injected failure.

Thanks. I will add your Tested-by: .


--
Chuck Lever







[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux