Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Thu, 20 Oct 2022 10:25:01 -0400

On Thu, Oct 20, 2022 at 09:57:24AM +0900, Ricardo Ribalda wrote:
> Hi Alan
> 
> On Thu, 20 Oct 2022 at 00:08, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, Oct 19, 2022 at 01:22:48PM +0900, Ricardo Ribalda wrote:
> > > Hi Laurent
> > >
> > > On Wed, 19 Oct 2022 at 10:45, Laurent Pinchart
> > > <laurent.pinchart@xxxxxxxxxxxxxxxx> wrote:
> > > > And I would like to avoid having to roll out manual changes to all
> > > > drivers when the problem can be fixed in the core, just because nobody
> > > > can be bothered to spend time to implement a good fix. We don't have to
> > > > aim for a solution at the cdev level if that takes too long, an
> > > > implementation in V4L2 would be enough to start with.
> > >
> > > Do we know what a "good fix" would look like?. This is a race
> > > condition between cdev, v4l2, and usb_driver. The only entity that
> > > knows about the three of them is the driver.
> > >
> > > If we "fix" v4l2 to provide a callback to notify the framework about a
> > > "bus disconnect". It can prevent new syscalls, but it cannot interrupt
> > > the current ones.
> >
> > It doesn't need to interrupt current syscalls.  It merely needs to wait
> > until the current ones complete (and help them to complete early by
> > making them aware of the disconnection) and to prevent new ones from
> > starting.
> >
> 
> The USB subsystem is not aware of the current syscalls running for that device,
> it just triggers the callback disconnect() to notify the driver that
> they are not allowed
> anymore to access the hardware.

Right.  The question is: At what level in the various video code paths 
should the check for "device gone" then be made?

One possibility is to do all these checks at the USB driver level.  This 
has the advantage of not requiring any changes to the V4L2 core or 
elsewhere in the framework.  But it has the disadvantage of spreading 
the checks all over the place, making it that much easier to forget one.  
Furthermore it would help only the USB driver, not any of the other 
video drivers.

Another possibility is to do the checks at the framework level, or at 
least, higher up than the driver level.  For example, upon syscall 
entry, and for long-running commands, at suitable intermediate points.  
This can be implemented by having the driver call a core function from 
within its disconnect callback; that function would let the framework 
know the device is gone and wouldn't return until all existing syscall 
threads were aware of this fact and had stopped accessing the device.
This approach has advantages and disadvantages complementary to the 
first approach.

I can't tell you which approach is better -- that's up to Laurent :-) -- 
I just want to make sure you are aware of the possibilities and their 
tradeoffs.

> Even when/if we fix this for real, a "basic test" checking if the
> device is disconnected is a
> nice thing to have. I think of it as a protective programming :)
> 
> Something like:
> 
> if WARN_ON(is_connected)
>    return -EIO;

Sure, it wouldn't hurt to sprinkle things like that here and there.  But 
obviously they won't fix the problem by themselves.

Alan Stern

> > I have no idea what facility (if any) the framework uses for this
> > already.  However, if it turns out that proper synchronization needs a
> > new approach, I suggest trying SRCU.  It can be viewed in some respects
> > as a kind of read-write mutex that is highly optimized for rapid
> > read-locks and -unlocks at the cost of very slow write-locks --
> > appropriate here since every syscall would need a read-lock whereas
> > write-locking would be needed only when a disconnect occurs.
> 
> 
> Thanks for the pointer :)
> 
> >
> > Alan Stern
> 
> 
> 
> -- 
> Ricardo Ribalda