Re: [PATCH 25/26] media: Implement best effort media device removal safety sans refcounting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Hans,

On Tue, Mar 14, 2023 at 09:58:43AM +0100, Hans Verkuil wrote:
> On 14/03/2023 09:43, Sakari Ailus wrote:
> > Hi Hans,
> > 
> > On Tue, Mar 14, 2023 at 09:30:52AM +0100, Hans Verkuil wrote:
> >> On 13/03/2023 17:53, Sakari Ailus wrote:
> >>> Hi Hans,
> >>>
> >>> On Mon, Mar 13, 2023 at 03:39:25PM +0100, Hans Verkuil wrote:
> >>>> On 13/03/2023 15:02, Sakari Ailus wrote:
> >>>>> Hi Hans,
> >>>>>
> >>>>> On Mon, Mar 13, 2023 at 02:46:27PM +0100, Hans Verkuil wrote:
> >>>>>> On 03/03/2023 12:08, Sakari Ailus wrote:
> >>>>>>> Hi Hans,
> >>>>>>>
> >>>>>>> On Fri, Mar 03, 2023 at 09:54:40AM +0100, Hans Verkuil wrote:
> >>>>>>>> On 03/03/2023 09:39, Hans Verkuil wrote:
> >>>>>>>>> On 01/02/2023 22:45, Sakari Ailus wrote:
> >>>>>>>>>> Add a new helper data structure media_devnode_compat_ref, which is used to
> >>>>>>>>>> prevent user space from calling IOCTLs or other system calls to the media
> >>>>>>>>>> device that has been already unregistered.
> >>>>>>>>>>
> >>>>>>>>>> The media device's memory may of course still be released during the call
> >>>>>>>>>> but there is only so much that can be done to this without the driver
> >>>>>>>>>> managing the lifetime of the resources it needs somehow.
> >>>>>>>>>>
> >>>>>>>>>> This patch should be reverted once all drivers have been converted to manage
> >>>>>>>>>> their resources' lifetime.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Sakari Ailus <sakari.ailus@xxxxxxxxxxxxxxx>
> >>>>>>>>>> ---
> >>>>>>>>>>  drivers/media/mc/mc-device.c  | 60 ++++++++++++++++++++++++++++++-----
> >>>>>>>>>>  drivers/media/mc/mc-devnode.c | 21 ++++++++----
> >>>>>>>>>>  include/media/media-devnode.h | 29 +++++++++++++++++
> >>>>>>>>>>  3 files changed, 96 insertions(+), 14 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/drivers/media/mc/mc-device.c b/drivers/media/mc/mc-device.c
> >>>>>>>>>> index 3a1db5fdbba7..22fdaa6370ea 100644
> >>>>>>>>>> --- a/drivers/media/mc/mc-device.c
> >>>>>>>>>> +++ b/drivers/media/mc/mc-device.c
> >>>>>>>>>> @@ -45,18 +45,34 @@ static inline void __user *media_get_uptr(__u64 arg)
> >>>>>>>>>>  	return (void __user *)(uintptr_t)arg;
> >>>>>>>>>>  }
> >>>>>>>>>>  
> >>>>>>>>>> +static void compat_ref_release(struct kref *kref)
> >>>>>>>>>> +{
> >>>>>>>>>> +	struct media_devnode_compat_ref *ref =
> >>>>>>>>>> +		container_of_const(kref, struct media_devnode_compat_ref, kref);
> >>>>>>>>>> +
> >>>>>>>>>> +	kfree(ref);
> >>>>>>>>>> +}
> >>>>>>>>>> +
> >>>>>>>>>>  static int media_device_open(struct media_devnode *devnode, struct file *filp)
> >>>>>>>>>>  {
> >>>>>>>>>>  	struct media_device *mdev = to_media_device(devnode);
> >>>>>>>>>>  	struct media_device_fh *fh;
> >>>>>>>>>>  	unsigned long flags;
> >>>>>>>>>>  
> >>>>>>>>>> +	if (devnode->ref && (!atomic_read(&devnode->ref->registered) ||
> >>>>>>>>>> +			     !kref_get_unless_zero(&devnode->ref->kref)))
> >>>>>>>>>> +		return -ENXIO;
> >>>>>>>>>> +
> >>>>>>>>>
> >>>>>>>>> This seems pointless: if the media device is unregistered, then the device
> >>>>>>>>> node disappears and it can't be opened anymore.
> >>>>>>>>>
> >>>>>>>>> I'm confused by this patch in general: when media_device_unregister() is called,
> >>>>>>>>> it is no longer possible to call ioctls and basically do anything except close
> >>>>>>>>> the open fh.
> >>>>>>>>>
> >>>>>>>>> So what am I missing here? It all looks odd.
> >>>>>>>>
> >>>>>>>> I read up on this a bit more, and I think this patch is bogus: drivers not
> >>>>>>>> converted to the release() callback will indeed just crash, but that's no
> >>>>>>>> different than many existing drivers, media or otherwise, when you forcibly
> >>>>>>>> unbind them. It's broken today, and since you have to be root to unbind, I
> >>>>>>>> would say that we can just leave it as-is rather than introducing a rather
> >>>>>>>> ugly workaround. I don't think it will help anyway, since most likely
> >>>>>>>> such drivers will also fails if the application has a video device open
> >>>>>>>> when the device is unbound.
> >>>>>>>
> >>>>>>> The main difference is whether accessing such a file handle will access
> >>>>>>> released memory always or whether that is possible only during a very brief
> >>>>>>> amount of time.
> >>>>>>>
> >>>>>>
> >>>>>> I still don't like this. It was broken before, and it is broken now (perhaps a
> >>>>>> bit less broken, but still...).
> >>>>>>
> >>>>>> There is a right fix now, and drivers that are likely to be removed forcibly
> >>>>>> should be converted. This patch just makes it more likely that such drivers
> >>>>>> won't be converted since it is less likely to hit this, so people will just
> >>>>>> think that this is 'good enough'. And it makes the code a lot uglier.
> >>>>>
> >>>>> I agree, although converting the drivers is easier said than done. Note
> >>>>> that also DVB is affected by this, not just V4L2. There are quite a few DVB
> >>>>> USB devices.
> >>>>>
> >>>>> The behaviour before this set (since ~ 2017) is restored by the last few
> >>>>> patches, without these we're on pre-2017 behaviour which basically means
> >>>>> that all media IOCTLs on file handles the device of which is gone, will
> >>>>> always access released memory. That was quite bad, too.
> >>>>
> >>>> Why?
> >>>>
> >>>> I have a filehandle open on /dev/mediaX. Now I unbind the device. That will
> >>>> unregister the media device, which will cause all file ops on the filehandle
> >>>> to return immediately, except for close().
> >>>>
> >>>> And close() just frees the devnode from what I can see.
> >>>>
> >>>> There is a race if the device is unbound while in an ioctl, then all bets are
> >>>> off without proper life-time management.
> >>>>
> >>>> If it crashes after an unbind in the close() call, then something else is
> >>>> wrong, it shouldn't do that.
> >>>>
> >>>> What happens if you do 'sleep 20 </dev/mediaX', then unbind the device?
> >>>>
> >>>> I feel that I am missing something here.
> >>>
> >>> Actually these seems to be a bug in the 25th patch --- testing the devnode
> >>> registration needs to take place before checking for fops.
> >>>
> >>> After fixing that, the difference of issuing read(2) after unregistering
> >>> the device is between (probably) crashing and graciously failing. The
> >>> underlying problem is that without the 25th patch there pretty much is a
> >>> guarantee devnode has been released by the time it is accessed.
> >>
> >> Ah, it's a result of patch 06/26. Now devnode is embedded in struct media_device,
> >> which in turn is embedded in a device's state structure. And when that is freed,
> >> the devnode is also freed.
> >>
> >> This is wrong IMHO: either struct media_devnode or struct media_device has to
> >> be dynamically allocated. Embedding devices in a larger state structure causes
> >> exactly these types of problems.
> >>
> >> In this case I would just keep dynamically allocating the devnode. What is the reason
> >> for reverting that patch? The commit log doesn't say.
> > 
> > I'll add that to the cover page of the next version.
> > 
> > The struct media_device and media_devnode are different structs as it was
> > thought that V4L2 and other kernel subsystems would start using MC for what
> > did not materialise in the end, and therefore infrastructure was added to
> > enable that. We do not need that today, as we did not need it six years
> > ago. Thus there is no longer a reason to keep media_device and
> > media_devnode separate.
> > 
> > By separately allocating these, as was done back in 2017, you can reduce
> > the window of possible access of released memory but you cannot eliminate
> > it. For that you need refcounting so that an open file handle will maintain
> > the in-memory data structures to carry out its IOCTL-related functions.
> > 
> > So this set does not yet merge struct media_device and struct media_devnode
> > but makes it much easier to do as they are allocated together. It's just
> > about moving a little bit code around.
> > 
> > The end goal (and partially the result already) is a cleaner codebase
> > without object lifetime issues.
> > 
> 
> So you revert a patch to make it cleaner, then have to add a really ugly
> patch back to fix what you broke?

It's all broken to begin with.

The alternative to this is either fix all drivers (a lot of work, largely
untestable) or considering MC terminally broken. I'd prefer the former as
we don't have an alternative for MC at the moment.

If you really dislike the new hack (I don't think it's worse than the old
hack, it's much more localised at easier to understand it's broken), we
could keep it a few kernel releases, move the unconverted drivers to
staging and remove them with the hack, again after a few releases.

> 
> It's not just 'moving code around', you break functionality (imperfect as
> it is), and then have to fix it in another way.

That is the intention. Repairing the state before this set without
reverting patches would make the intermediate patches very, very ugly and
difficult to review. Of course you could compare the result with the end
result of this patchset.

> 
> I would just drop this revert patch. And when all drivers are converted
> to 'do the right thing', then you can revert it.
> 
> Unless there is another reason for reverting it?

The reverts enable fixing the root problem, they are a pre-condition for
doing that. I'd prefer to enable writing drivers that are not broken (well,
at least this way). The 2017 fixes always were a dead end and we knew that
perfectly well.

-- 
Kind regards,

Sakari Ailus



[Index of Archives]     [Linux Input]     [Video for Linux]     [Gstreamer Embedded]     [Mplayer Users]     [Linux USB Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]

  Powered by Linux