On 23-08-24 01:27 pm, Michael Nazzareno Trimarchi wrote: > Hi Prashanth > > On Fri, Aug 23, 2024 at 9:34 AM Prashanth K <quic_prashk@xxxxxxxxxxx> wrote: >> >> >> >> On 23-08-24 12:28 pm, Michael Nazzareno Trimarchi wrote: >>> Hi >>> >>> On Fri, Aug 23, 2024 at 8:40 AM 胡连勤 <hulianqin@xxxxxxxx> wrote: >>>> >>>> Hello linux community expert: >>>> >>>>>> Fixes: c1dca562be8a ("usb gadget: split out serial core") >>>>>> Cc: stable@xxxxxxxxxxxxxxx >>>>>> Signed-off-by: Lianqin Hu <hulianqin@xxxxxxxx> >>>>>> --- >>>>>> v6: >>>>>> - Update the commit text >>>>>> - Add the Fixes tag >>>>>> - CC stable kernel >>>>>> - Add serial_port_lock protection when checking port pointer >>>>>> - Optimize code comments >>>>>> - Delete log printing >>>> >>>>> You need to list ALL of the versions here, I seem to have missed v4 and >>>>> v5 somewhere so I don't know what changed there. >>>> >> >> [...] >>>>> nested spinlocks, why? Did you run this with lockdep enabled to verify you aren't hitting a different bug now? >>>> Because there is a competition relationship between this function and the gserial_disconnect function, >>>> the gserial_disconnect function first obtains serial_port_lock and then obtains port->port_lock. >>>> The purpose of nesting is to ensure that when gs_read_complete is executed, it can be successfully executed after obtaining serial_port_lock. >>>> gserial_disconnect(..) >>>> { >>>> struct gs_port *port = gser->ioport; >>>> ... >>>> spin_lock_irqsave(&serial_port_lock, flags); >>>> spin_lock(&port->port_lock); >>>> ... >>>> gser->ioport = NULL; ---> port = NULL; >>>> ... >>>> spin_unlock(&port->port_lock); >>>> spin_unlock_irqrestore(&serial_port_lock, flags); >>>> } >>>> >>>> After enabling the lockdep function (CONFIG_DEBUG_LOCK_ALLOC=y), there is no lockdep-related warning information. >>>> >>>>> And why is one irqsave and one not? That feels odd, it might be right, but you need to document here why the difference. >>>> After the gs_read_complete function is executed, spin_unlock_irqrestore is used to restore the previous state, >>> >>> 胡连勤 this is not a common locking pattern that is the reason that >>> should be properly described. >> This pattern was already used on gser_suspend/resume callbacks, this was >> done because the lock was stored under port (and port itself was >> becoming null), hence we added a static spinlock to mitigate it. >>> > I see now, but why don't disable the endpoint before disconnecting? > > /* disable endpoints, aborting down any active I/O */ > usb_ep_disable(gser->out); > usb_ep_disable(gser->in); > > Michael > Not sure about this case, I think generally we need stop IO before disabling EP, otherwise TX/RX functions may queue requests while EP is getting disabled, thats why i think port is removed before ep_disable. Moreover these callbacks (complete/suspend/resume etc) comes from UDC and can be async, so better to use locks to prevent these kind of races. Regards, Prashanth K