On Wed, May 04, 2022 at 04:35:15PM +0200, Greg KH wrote:
> On Wed, May 04, 2022 at 03:28:02PM +0800, Albert Wang wrote:
> > There are still race conditions to hit the null pointer dereference
> > with my previous commit. So I re-write the code to dereference the
> > pointer right after checking it is not null.
>
> What race conditions?
>
> And just moving it is not going to solve a race condition, you need a
> lock.

Hmm, dwc->lock should already be held when entering this function:

dwc3_thread_interrupt()
  spin_lock(&dwc->lock);
  -> dwc3_process_event_buf()
     -> dwc3_process_event_entry()
        -> dwc3_endpoint_interrupt()
           -> dwc3_gadget_endpoint_transfer_complete()
              -> dwc3_gadget_endpoint_trbs_complete() [this function]

> > Fixes: 26288448120b ("usb: dwc3: gadget: Fix null pointer exception")
> >
> > Signed-off-by: Albert Wang <albertccwang@xxxxxxxxxx>
> > ---
> >  drivers/usb/dwc3/gadget.c | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index 19477f4bbf54..f2792968afd9 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -3366,15 +3366,14 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
> >  	struct dwc3 *dwc = dep->dwc;
> >  	bool no_started_trb = true;
> >
> > -	if (!dep->endpoint.desc)
> > -		return no_started_trb;
> > -
> >  	dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);

OK, I see: this function eventually leads to dwc3_gadget_giveback() being
called, which unlocks dwc->lock before calling each request's completion
callback and reacquires it afterwards. That gives usb_ep_disable() an
opportunity to come in and clear the descriptor. You should add an inline
comment to make it clear that this is what's happening here.

> >  	if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
> >  		goto out;
> >
> > -	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > +	if (!dep->endpoint.desc)
> > +		return no_started_trb;
> > +	else if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&

Drop the 'else'; it isn't needed because of the return in the preceding
check.

> There is no locking here, so why would this change do anything but
> reduce the window?

After inspecting further, we do see that the locking is implicit. The main
gotcha is the unlock/re-lock that happens behind the scenes, which is what
actually opens a window for the race. This change moves the NULL check so it
sits right next to where the descriptor is used and, more importantly, after
the window is "closed" (since we hold the lock again).

Additional inline comments and a more descriptive commit message should help
make this clearer.

Thanks,
Jack