Hi Sakari, On 14/03/2023 11:59, Sakari Ailus wrote: > Hi Hans, > > On Tue, Mar 14, 2023 at 09:58:43AM +0100, Hans Verkuil wrote: >> On 14/03/2023 09:43, Sakari Ailus wrote: >>> Hi Hans, >>> >>> On Tue, Mar 14, 2023 at 09:30:52AM +0100, Hans Verkuil wrote: >>>> On 13/03/2023 17:53, Sakari Ailus wrote: >>>>> Hi Hans, >>>>> >>>>> On Mon, Mar 13, 2023 at 03:39:25PM +0100, Hans Verkuil wrote: >>>>>> On 13/03/2023 15:02, Sakari Ailus wrote: >>>>>>> Hi Hans, >>>>>>> >>>>>>> On Mon, Mar 13, 2023 at 02:46:27PM +0100, Hans Verkuil wrote: >>>>>>>> On 03/03/2023 12:08, Sakari Ailus wrote: >>>>>>>>> Hi Hans, >>>>>>>>> >>>>>>>>> On Fri, Mar 03, 2023 at 09:54:40AM +0100, Hans Verkuil wrote: >>>>>>>>>> On 03/03/2023 09:39, Hans Verkuil wrote: >>>>>>>>>>> On 01/02/2023 22:45, Sakari Ailus wrote: >>>>>>>>>>>> Add a new helper data structure media_devnode_compat_ref, which is used to >>>>>>>>>>>> prevent user space from calling IOCTLs or other system calls to the media >>>>>>>>>>>> device that has been already unregistered. >>>>>>>>>>>> >>>>>>>>>>>> The media device's memory may of course still be released during the call >>>>>>>>>>>> but there is only so much that can be done to this without the driver >>>>>>>>>>>> managing the lifetime of the resources it needs somehow. >>>>>>>>>>>> >>>>>>>>>>>> This patch should be reverted once all drivers have been converted to manage >>>>>>>>>>>> their resources' lifetime. >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Sakari Ailus <sakari.ailus@xxxxxxxxxxxxxxx> >>>>>>>>>>>> --- >>>>>>>>>>>> drivers/media/mc/mc-device.c | 60 ++++++++++++++++++++++++++++++----- >>>>>>>>>>>> drivers/media/mc/mc-devnode.c | 21 ++++++++---- >>>>>>>>>>>> include/media/media-devnode.h | 29 +++++++++++++++++ >>>>>>>>>>>> 3 files changed, 96 insertions(+), 14 deletions(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/drivers/media/mc/mc-device.c b/drivers/media/mc/mc-device.c >>>>>>>>>>>> index 3a1db5fdbba7..22fdaa6370ea 100644 >>>>>>>>>>>> --- a/drivers/media/mc/mc-device.c >>>>>>>>>>>> +++ b/drivers/media/mc/mc-device.c >>>>>>>>>>>> @@ -45,18 +45,34 @@ static inline void __user *media_get_uptr(__u64 arg) >>>>>>>>>>>> return (void __user *)(uintptr_t)arg; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> +static void compat_ref_release(struct kref *kref) >>>>>>>>>>>> +{ >>>>>>>>>>>> + struct media_devnode_compat_ref *ref = >>>>>>>>>>>> + container_of_const(kref, struct media_devnode_compat_ref, kref); >>>>>>>>>>>> + >>>>>>>>>>>> + kfree(ref); >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> static int media_device_open(struct media_devnode *devnode, struct file *filp) >>>>>>>>>>>> { >>>>>>>>>>>> struct media_device *mdev = to_media_device(devnode); >>>>>>>>>>>> struct media_device_fh *fh; >>>>>>>>>>>> unsigned long flags; >>>>>>>>>>>> >>>>>>>>>>>> + if (devnode->ref && (!atomic_read(&devnode->ref->registered) || >>>>>>>>>>>> + !kref_get_unless_zero(&devnode->ref->kref))) >>>>>>>>>>>> + return -ENXIO; >>>>>>>>>>>> + >>>>>>>>>>> >>>>>>>>>>> This seems pointless: if the media device is unregistered, then the device >>>>>>>>>>> node disappears and it can't be opened anymore. >>>>>>>>>>> >>>>>>>>>>> I'm confused by this patch in general: when media_device_unregister() is called, >>>>>>>>>>> it is no longer possible to call ioctls and basically do anything except close >>>>>>>>>>> the open fh. >>>>>>>>>>> >>>>>>>>>>> So what am I missing here? It all looks odd. >>>>>>>>>> >>>>>>>>>> I read up on this a bit more, and I think this patch is bogus: drivers not >>>>>>>>>> converted to the release() callback will indeed just crash, but that's no >>>>>>>>>> different than many existing drivers, media or otherwise, when you forcibly >>>>>>>>>> unbind them. It's broken today, and since you have to be root to unbind, I >>>>>>>>>> would say that we can just leave it as-is rather than introducing a rather >>>>>>>>>> ugly workaround. I don't think it will help anyway, since most likely >>>>>>>>>> such drivers will also fails if the application has a video device open >>>>>>>>>> when the device is unbound. >>>>>>>>> >>>>>>>>> The main difference is whether accessing such a file handle will access >>>>>>>>> released memory always or whether that is possible only during a very brief >>>>>>>>> amount of time. >>>>>>>>> >>>>>>>> >>>>>>>> I still don't like this. It was broken before, and it is broken now (perhaps a >>>>>>>> bit less broken, but still...). >>>>>>>> >>>>>>>> There is a right fix now, and drivers that are likely to be removed forcibly >>>>>>>> should be converted. This patch just makes it more likely that such drivers >>>>>>>> won't be converted since it is less likely to hit this, so people will just >>>>>>>> think that this is 'good enough'. And it makes the code a lot uglier. >>>>>>> >>>>>>> I agree, although converting the drivers is easier said than done. Note >>>>>>> that also DVB is affected by this, not just V4L2. There are quite a few DVB >>>>>>> USB devices. >>>>>>> >>>>>>> The behaviour before this set (since ~ 2017) is restored by the last few >>>>>>> patches, without these we're on pre-2017 behaviour which basically means >>>>>>> that all media IOCTLs on file handles the device of which is gone, will >>>>>>> always access released memory. That was quite bad, too. >>>>>> >>>>>> Why? >>>>>> >>>>>> I have a filehandle open on /dev/mediaX. Now I unbind the device. That will >>>>>> unregister the media device, which will cause all file ops on the filehandle >>>>>> to return immediately, except for close(). >>>>>> >>>>>> And close() just frees the devnode from what I can see. >>>>>> >>>>>> There is a race if the device is unbound while in an ioctl, then all bets are >>>>>> off without proper life-time management. >>>>>> >>>>>> If it crashes after an unbind in the close() call, then something else is >>>>>> wrong, it shouldn't do that. >>>>>> >>>>>> What happens if you do 'sleep 20 </dev/mediaX', then unbind the device? >>>>>> >>>>>> I feel that I am missing something here. >>>>> >>>>> Actually these seems to be a bug in the 25th patch --- testing the devnode >>>>> registration needs to take place before checking for fops. >>>>> >>>>> After fixing that, the difference of issuing read(2) after unregistering >>>>> the device is between (probably) crashing and graciously failing. The >>>>> underlying problem is that without the 25th patch there pretty much is a >>>>> guarantee devnode has been released by the time it is accessed. >>>> >>>> Ah, it's a result of patch 06/26. Now devnode is embedded in struct media_device, >>>> which in turn is embedded in a device's state structure. And when that is freed, >>>> the devnode is also freed. >>>> >>>> This is wrong IMHO: either struct media_devnode or struct media_device has to >>>> be dynamically allocated. Embedding devices in a larger state structure causes >>>> exactly these types of problems. >>>> >>>> In this case I would just keep dynamically allocating the devnode. What is the reason >>>> for reverting that patch? The commit log doesn't say. >>> >>> I'll add that to the cover page of the next version. >>> >>> The struct media_device and media_devnode are different structs as it was >>> thought that V4L2 and other kernel subsystems would start using MC for what >>> did not materialise in the end, and therefore infrastructure was added to >>> enable that. We do not need that today, as we did not need it six years >>> ago. Thus there is no longer a reason to keep media_device and >>> media_devnode separate. >>> >>> By separately allocating these, as was done back in 2017, you can reduce >>> the window of possible access of released memory but you cannot eliminate >>> it. For that you need refcounting so that an open file handle will maintain >>> the in-memory data structures to carry out its IOCTL-related functions. >>> >>> So this set does not yet merge struct media_device and struct media_devnode >>> but makes it much easier to do as they are allocated together. It's just >>> about moving a little bit code around. >>> >>> The end goal (and partially the result already) is a cleaner codebase >>> without object lifetime issues. >>> >> >> So you revert a patch to make it cleaner, then have to add a really ugly >> patch back to fix what you broke? > > It's all broken to begin with. > > The alternative to this is either fix all drivers (a lot of work, largely > untestable) or considering MC terminally broken. I'd prefer the former as > we don't have an alternative for MC at the moment. > > If you really dislike the new hack (I don't think it's worse than the old > hack, it's much more localised at easier to understand it's broken), we > could keep it a few kernel releases, move the unconverted drivers to > staging and remove them with the hack, again after a few releases. > >> >> It's not just 'moving code around', you break functionality (imperfect as >> it is), and then have to fix it in another way. > > That is the intention. Repairing the state before this set without > reverting patches would make the intermediate patches very, very ugly and > difficult to review. Of course you could compare the result with the end > result of this patchset. > >> >> I would just drop this revert patch. And when all drivers are converted >> to 'do the right thing', then you can revert it. >> >> Unless there is another reason for reverting it? > > The reverts enable fixing the root problem, they are a pre-condition for > doing that. I'd prefer to enable writing drivers that are not broken (well, > at least this way). The 2017 fixes always were a dead end and we knew that > perfectly well. > I won't block this patch, but I think it is really ugly. And I am afraid that we might be stuck with this hack for a long time. How many drivers would need to be converted for this hack to go away? Note that this series needs to be rebased, I had a compile error in omap3isp, visl and in a mediatek driver. However, I also tested this series with a kernel that has CONFIG_DEBUG_KOBJECT_RELEASE on, and if I run the test-media script in v4l-utils: cd contrib/test sudo ./test-media mc then this crashes at the 'unbind vivid' stage: [ 292.900982] CPU: 2 PID: 108 Comm: kworker/2:3 Tainted: G B D W 6.3.0-rc2-debug #29 [ 292.900986] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 [ 292.900990] Workqueue: events kobject_delayed_cleanup [ 292.900999] RIP: 0010:kobject_put+0x16/0x90 [ 292.901004] Code: 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 85 ff 74 7c 55 53 48 89 fb 48 8d bf e8 00 00 00 e8 7a c2 88 fe <f6> 83 e8 00 00 00 01 74 2b 48 8d 6b 38 be 04 00 00 00 48 89 ef e8 [ 292.901008] RSP: 0018:ffffc900016cfcf0 EFLAGS: 00010286 [ 292.901012] RAX: 0000000000000000 RBX: ffff8881b1e200d0 RCX: ffffffff82cd91c6 [ 292.901015] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881b1e201b8 [ 292.901018] RBP: ffff88814f576400 R08: 0000000000000000 R09: ffffffff84879fd7 [ 292.901020] R10: fffffbfff090f3fa R11: 0000000000000006 R12: ffff8881b1e22e98 [ 292.901023] R13: ffff8881445853e8 R14: 0000000000000000 R15: ffff8881f6338200 [ 292.901026] FS: 0000000000000000(0000) GS:ffff8881f6300000(0000) knlGS:0000000000000000 [ 292.901037] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 292.901040] CR2: ffff8881b1e201b8 CR3: 0000000149c63000 CR4: 0000000000750ee0 [ 292.901048] PKRU: 55555554 [ 292.901051] Call Trace: [ 292.901054] <TASK> [ 292.901056] device_release+0x5a/0x100 [ 292.901063] kobject_delayed_cleanup+0x5e/0xa0 [ 292.901066] process_one_work+0x535/0xa50 [ 292.901073] ? __pfx_process_one_work+0x10/0x10 [ 292.901077] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 292.901082] ? __list_add_valid+0x33/0xd0 [ 292.901087] worker_thread+0x8a/0x620 [ 292.901091] ? __kthread_parkme+0xd3/0xf0 [ 292.901095] ? __pfx_worker_thread+0x10/0x10 [ 292.901099] kthread+0x173/0x1b0 [ 292.901102] ? __pfx_kthread+0x10/0x10 [ 292.901105] ret_from_fork+0x2c/0x50 [ 292.901110] </TASK> [ 292.901112] Modules linked in: vivid v4l2_tpg videobuf2_dma_contig v4l2_dv_timings videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc [ 292.901126] CR2: ffff8881b1e201b8 [ 292.901130] ---[ end trace 0000000000000000 ]--- [ 292.901133] RIP: 0010:__mutex_lock+0xdb/0xcc0 [ 292.901138] Code: c0 e8 f9 4f 57 fe 2e 2e 2e 31 c0 48 c7 c7 60 d2 31 86 e8 98 07 84 fe 8b 05 d2 83 5f 03 85 c0 75 13 48 8d 7b 60 e8 b5 08 84 fe <48> 3b 5b 60 0f 85 e9 05 00 00 bf 01 00 00 00 e8 41 60 57 fe 48 8d [ 292.901141] RSP: 0018:ffffc90002727a40 EFLAGS: 00010282 [ 292.901145] RAX: 0000000000000000 RBX: 0493011a00000f42 RCX: ffffffff82d24e9b [ 292.901148] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0493011a00000fa2 [ 292.901151] RBP: ffffc90002727b90 R08: ffffffff81e74db3 R09: ffffffff84879fd7 [ 292.901154] R10: ffffc90002727ba8 R11: 0000000000000000 R12: 0000000000000000 [ 292.901157] R13: 0000000000000000 R14: ffff8881582b3078 R15: 1ffff920004e4f54 [ 292.901160] FS: 0000000000000000(0000) GS:ffff8881f6300000(0000) knlGS:0000000000000000 [ 292.901169] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 292.901172] CR2: ffff8881b1e201b8 CR3: 0000000149c63000 CR4: 0000000000750ee0 [ 292.901185] PKRU: 55555554 The test-media regression test is really good at testing unbind scenarios for the virtual drivers that we have. Together with the CONFIG_DEBUG_KOBJECT_RELEASE option it is a good test to run. Regards, Hans