Re: [PATCH] virtiofs: limit the length of ITER_KVEC dio by max_nopage_rw

On 2/23/2024 5:42 PM, Miklos Szeredi wrote:
> On Wed, 3 Jan 2024 at 11:58, Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> From: Hou Tao <houtao1@xxxxxxxxxx>
>> When trying to insert a 10MB kernel module stored on a virtiofs mount with
>> caching disabled, the following warning was reported:
>>   ------------[ cut here ]------------
>>   WARNING: CPU: 2 PID: 439 at mm/page_alloc.c:4544 ......
>>   Modules linked in:
>>   CPU: 2 PID: 439 Comm: insmod Not tainted 6.7.0-rc7+ #33
>>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ......
>>   RIP: 0010:__alloc_pages+0x2c4/0x360
>>   ......
>>   Call Trace:
>>    <TASK>
>>    ? __warn+0x8f/0x150
>>    ? __alloc_pages+0x2c4/0x360
>>    __kmalloc_large_node+0x86/0x160
>>    __kmalloc+0xcd/0x140
>>    virtio_fs_enqueue_req+0x240/0x6d0
>>    virtio_fs_wake_pending_and_unlock+0x7f/0x190
>>    queue_request_and_unlock+0x58/0x70
>>    fuse_simple_request+0x18b/0x2e0
>>    fuse_direct_io+0x58a/0x850
>>    fuse_file_read_iter+0xdb/0x130
>>    __kernel_read+0xf3/0x260
>>    kernel_read+0x45/0x60
>>    kernel_read_file+0x1ad/0x2b0
>>    init_module_from_file+0x6a/0xe0
>>    idempotent_init_module+0x179/0x230
>>    __x64_sys_finit_module+0x5d/0xb0
>>    do_syscall_64+0x36/0xb0
>>    entry_SYSCALL_64_after_hwframe+0x6e/0x76
>>    ......
>>    </TASK>
>>   ---[ end trace 0000000000000000 ]---
>> The warning happened as follows. In copy_args_to_argbuf(), virtiofs uses
>> kmalloc-ed memory as a bounce buffer for fuse args, but
> So this seems to be the special case in fuse_get_user_pages() when the
> read/write requests get a piece of kernel memory.
> I don't really understand the comment in virtio_fs_enqueue_req():
> /* Use a bounce buffer since stack args cannot be mapped */
> Stefan, can you explain?  What's special about the arg being on the stack?
> What if the arg is not on the stack (as is probably the case for big
> args like this)? Do we need the bounce buffer in that case?

I will try to answer these two questions; please correct me if I am wrong.
The main reason for the bounce buffer is that virtiofs eventually passes a
scatterlist to virtiofsd through virtio, so it needs the page (namely the
struct page) backing each arg. If an arg is placed on the stack, there is
no way to get that page. For the ITER_KVEC dio mentioned in the patch, the
data buffer is allocated through vmalloc() rather than on the stack, but
the scatterlist helpers (sg_set_buf()/sg_init_one()) derive the page with
virt_to_page(), which is only valid for linearly mapped memory such as
kmalloc() buffers. So the bounce buffer is still necessary.

> Thanks,
> Miklos

