Re: [PATCH 1/2] Fix userfaultfd_api to return EINVAL as expected

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 08, 2024 at 10:11:28AM -0400, Peter Xu wrote:
> On Wed, May 08, 2024 at 08:58:42AM -0400, Audra Mitchell wrote:
> > On Wed, May 08, 2024 at 09:39:10AM +0200, David Hildenbrand wrote:
> > > On 07.05.24 21:55, Audra Mitchell wrote:
> > > > Currently if we request a feature that is not set in the Kernel
> > > > config we fail silently and return the available features. However, the
> > > > documentation indicates we should return an EINVAL.
> > > 
> > > I assume you are referencing
> > > 
> > > "EINVAL The API version requested in the api field is not supported by this
> > > kernel, or  the  features  field passed to the kernel includes feature bits
> > > that are not supported by the current kernel version."
> > > 
> > > and
> > > 
> > > "To  enable  userfaultfd features the application should set a bit
> > > corresponding to each feature it wants to enable in the features field. If
> > > the kernel supports all the requested features it will enable them.
> > > Otherwise it will zero out the returned uffdio_api structure and return
> > > EINVAL.
> > > "
> > > 
> > > in which case I agree.
> > 
> > Yep! I'm referencing the man page.
> > 
> > > 
> > > > 
> > > > We need to fix this issue since we can end up with a Kernel warning
> > > > should a program request the feature UFFD_FEATURE_WP_UNPOPULATED on
> > > > a kernel with the config not set with this feature.
> > > 
> > > Can you mention which exact one? Is it a WARN* or a pr_warn() ?
> > 
> > Here is the kernel warning I get:
> > 
> > [  200.803094] unrecognized swap entry 0x7c00000000000001
> > [  200.808270] ------------[ cut here ]------------
> > [  200.812896] WARNING: CPU: 91 PID: 13634 at mm/memory.c:1660 zap_pte_range+0x43d/0x660
> > [  200.820738] Modules linked in: qrtr bridge stp llc rfkill sunrpc amd_atl intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm ipmi_ssif acpi_ipmi i2c_piix4 ipmi_si wmi_bmof dcdbas dell_smbios dell_wmi_descriptor ptdma ipmi_devintf rapl ipmi_msghandler acpi_power_meter pcspkr k10temp xfs libcrc32c sd_mod t10_pi mgag200 sg drm_kms_helper crct10dif_pclmul i2c_algo_bit ahci crc32_pclmul drm_shmem_helper libahci crc32c_intel drm i40e libata ghash_clmulni_intel tg3 ccp megaraid_sas sp5100_tco wmi dm_mirror dm_region_hash dm_log dm_mod fuse
> > [  200.869387] CPU: 91 PID: 13634 Comm: userfaultfd Kdump: loaded Not tainted 6.9.0-rc5+ #8
> > [  200.877477] Hardware name: Dell Inc. PowerEdge R6525/0N7YGH, BIOS 2.7.3 03/30/2022
> > [  200.885052] RIP: 0010:zap_pte_range+0x43d/0x660
> > [  200.889595] Code: 83 fa 02 0f 86 44 01 00 00 83 f9 17 0f 84 e1 00 00 00 83 f9 1f 0f 84 d0 00 00 00 48 89 c6 48 c7 c7 00 e4 dd bb e8 73 a2 de ff <0f> 0b e9 44 fd ff ff 45 0f b6 44 24 20 41 f6 c0 f4 75 27 4c 89 ee
> > [  200.908348] RSP: 0018:ffffa18d2e6c37c8 EFLAGS: 00010246
> > [  200.913584] RAX: 000000000000002a RBX: 00007f26d3600000 RCX: 0000000000000000
> > [  200.920730] RDX: 0000000000000000 RSI: ffff93503f9a0bc0 RDI: ffff93503f9a0bc0
> > [  200.927867] RBP: 00007f26d35cc000 R08: 0000000000000000 R09: ffffa18d2e6c3688
> > [  200.935009] R10: ffffa18d2e6c3680 R11: ffffffffbc9de448 R12: ffffa18d2e6c39e8
> > [  200.942149] R13: ffff92d1ebc15b50 R14: ffff93114e0cde60 R15: ffffa18d2e6c3928
> > [  200.949291] FS:  0000000000000000(0000) GS:ffff93503f980000(0000) knlGS:0000000000000000
> > [  200.957384] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  200.963140] CR2: 00007f26b1600658 CR3: 00000040905ba000 CR4: 0000000000350ef0
> > [  200.970283] Call Trace:
> > [  200.972745]  <TASK>
> > [  200.974862]  ? __warn+0x7f/0x130
> > [  200.978108]  ? zap_pte_range+0x43d/0x660
> > [  200.982044]  ? report_bug+0x18a/0x1a0
> > [  200.985720]  ? handle_bug+0x3c/0x70
> > [  200.989219]  ? exc_invalid_op+0x14/0x70
> > [  200.993068]  ? asm_exc_invalid_op+0x16/0x20
> > [  200.997265]  ? zap_pte_range+0x43d/0x660
> > [  201.001199]  ? zap_pte_range+0x43d/0x660
> > [  201.005134]  zap_pmd_range.isra.0+0xf9/0x230
> > [  201.009416]  unmap_page_range+0x2d4/0x4a0
> > [  201.013436]  unmap_vmas+0xa8/0x180
> > [  201.016854]  exit_mmap+0xea/0x3b0
> > [  201.020191]  __mmput+0x43/0x120
> > [  201.023342]  exit_mm+0xb1/0x110
> > [  201.026496]  do_exit+0x270/0x4f0
> > [  201.029739]  do_group_exit+0x2c/0x80
> > [  201.033326]  get_signal+0x886/0x8b0
> > [  201.036828]  ? srso_return_thunk+0x5/0x5f
> > [  201.040848]  arch_do_signal_or_restart+0x25/0x100
> > [  201.045563]  ? srso_return_thunk+0x5/0x5f
> > [  201.049583]  ? vma_set_page_prot+0x5e/0xc0
> > [  201.053692]  ? srso_return_thunk+0x5/0x5f
> > [  201.057713]  ? syscall_exit_work+0xff/0x130
> > [  201.061908]  syscall_exit_to_user_mode+0x1b3/0x200
> > [  201.066712]  do_syscall_64+0x87/0x160
> > [  201.070387]  ? srso_return_thunk+0x5/0x5f
> > [  201.074405]  ? do_mmap+0x416/0x5f0
> > [  201.077821]  ? srso_return_thunk+0x5/0x5f
> > [  201.081840]  ? rseq_get_rseq_cs+0x1d/0x240
> > [  201.085950]  ? srso_return_thunk+0x5/0x5f
> > [  201.089970]  ? rseq_ip_fixup+0x6d/0x1d0
> > [  201.093823]  ? vm_mmap_pgoff+0x117/0x1a0
> > [  201.097755]  ? srso_return_thunk+0x5/0x5f
> > [  201.101776]  ? srso_return_thunk+0x5/0x5f
> > [  201.105795]  ? syscall_exit_to_user_mode+0x78/0x200
> > [  201.110685]  ? srso_return_thunk+0x5/0x5f
> > [  201.114706]  ? do_syscall_64+0x87/0x160
> > [  201.118557]  ? srso_return_thunk+0x5/0x5f
> > [  201.122575]  ? __count_memcg_events+0x49/0xb0
> > [  201.126944]  ? srso_return_thunk+0x5/0x5f
> > [  201.130967]  ? srso_return_thunk+0x5/0x5f
> > [  201.134986]  ? syscall_exit_work+0xff/0x130
> > [  201.139184]  ? srso_return_thunk+0x5/0x5f
> > [  201.143205]  ? syscall_exit_to_user_mode+0x78/0x200
> > [  201.148093]  ? srso_return_thunk+0x5/0x5f
> > [  201.152114]  ? do_syscall_64+0x87/0x160
> > [  201.155960]  ? srso_return_thunk+0x5/0x5f
> > [  201.159984]  ? sched_clock_cpu+0xb/0x190
> > [  201.163916]  ? srso_return_thunk+0x5/0x5f
> > [  201.167939]  ? irqtime_account_irq+0x40/0xc0
> > [  201.172220]  ? srso_return_thunk+0x5/0x5f
> > [  201.176243]  ? srso_return_thunk+0x5/0x5f
> > [  201.180263]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [  201.185326] RIP: 0033:0x7f26dfd0735b
> > [  201.188939] Code: Unable to access opcode bytes at 0x7f26dfd07331.
> > [  201.195128] RSP: 002b:00007fffce176868 EFLAGS: 00000206 ORIG_RAX: 000000000000000a
> > [  201.202700] RAX: fffffffffffffffc RBX: 00007f26dfe60000 RCX: 00007f26dfd0735b
> > [  201.209841] RDX: 0000000000000003 RSI: 0000000001000000 RDI: 00007f26af401000
> > [  201.216983] RBP: 00007f26b0400640 R08: 00000000ffffffff R09: 0000000000000000
> > [  201.224127] R10: ffffffffffffffc0 R11: 0000000000000206 R12: 0000000000000000
> > [  201.231267] R13: 000000000040d320 R14: 0000000000000000 R15: 0000000000000000
> > [  201.238413]  </TASK>
> > [  201.240610] ---[ end trace 0000000000000000 ]---
> > [  201.245250] unrecognized swap entry 0x7c00000000000001
> > 
> > 
> > 
> > > 
> > > Likely we want "Fixes:" here.
> > 
> > This could be seen as a continuation of the problem 
> > 2ff559f31a5d Revert "userfaultfd: don't fail on unrecognized features" 
> > was trying to solve. However, this patch only checks to make sure we didnt 
> > ask for a feature outside the possible range of features. We are still missing
> > a check to confirm the requested features are also configured on. So I guess 
> > the "Fixes" tag would be for this patch?
> > 914eedcb9ba0 userfaultfd: don't fail on unrecognized features
> 
> Even though 914eedcb9ba0 was problematic but it might be innocent in this
> regard, as we miss such check even before that commit.
> 
> Perhaps the right Fixes should be when there will be flags that will
> only be supported conditionally, which should be the wp support.  In that
> case, it could be:
> 
> Fixes: e06f1e1dd499 ("userfaultfd: wp: enabled write protection in userfaultfd API")
> 
> But it's still complicated though even if we want to backport, as this
> specific check for userfault was added later... in:
> 
> 00b151f21f39 ("mm/userfaultfd: fail uffd-wp registration if not supported")
> 
> In summary: I think we can stick with Fixes on e06f1e1dd499, but we don't
> copy stable.  The major reason we don't copy stable here is not only about
> complexity of such backport, but also that there can be apps trying to pass
> in unsupported bits (even if the kernel didn't support it) but keep using
> MISSING mode only, then we shouldn't fail them easily after a stable
> upgrade.  Just smells dangerous to backport.
> 
> When at it and repost, would you mind consolidate the two feature list
> check together?  We had the previous check here:
> 
> 	if (uffdio_api.api != UFFD_API || (features & ~UFFD_API_FEATURES))
> 		goto err_out;
> 
> If with this patch IIUC this previous check can be removed (still need to
> check UFFD_API).
> 
> Thanks,
> 
> > 
> > Happy to get your input here!
> > 
> > Thanks in advance!
> > 
> > 
> > > 
> > > > 
> > > > Signed-off-by: Audra Mitchell <audra@xxxxxxxxxx>
> > > > ---
> > > >   fs/userfaultfd.c | 5 +++++
> > > >   1 file changed, 5 insertions(+)
> > > > 
> > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > > index 60dcfafdc11a..17210558de79 100644
> > > > --- a/fs/userfaultfd.c
> > > > +++ b/fs/userfaultfd.c
> > > > @@ -2073,6 +2073,11 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
> > > >   	uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
> > > >   	uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
> > > >   #endif
> > > > +
> > > > +	ret = -EINVAL;
> > > > +	if (features & ~uffdio_api.features)
> > > > +		goto err_out;
> > > > +
> > > >   	uffdio_api.ioctls = UFFD_API_IOCTLS;
> > > >   	ret = -EFAULT;
> > > >   	if (copy_to_user(buf, &uffdio_api, sizeof(uffdio_api)))
> > > 
> > > CCing Peter.
> > > 
> > > -- 
> > > Cheers,
> > > 
> > > David / dhildenb
> > > 
> > 
> 
> -- 
> Peter Xu

I didn't forget about this, but I did find another test case that is going to
break with this change. Some other things landed on my plate/schedule so I'll
be returning to this in a week or two. I'll make sure I take your feedback
when I submit v2. Thanks again!

> 





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux