2021년 8월 19일 (목) 오전 1:17, Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>님이 작성: > > On 2021-08-17 18:53:13 [+0900], Jeaho Hwang wrote: > > hw_ep_prime sometimes fails if irq occurs while it rus on RT kernel. > > How/ why does it fail? Which IRQ occurs? Does it also occur without RT > and with threadirqs enabled? > We experienced priming failure while trying cable connection to a Windows RNDIS host. And we found that a twd interrupt occurs during execution of hw_ep_prime for failure cases. According to the manual, checking ENDPTSETUPSTAT before/after priming should be done immediately since the host could send a setup packet which makes the priming invalidated. So we think that the twd (or any) interrupt could make a timing issue between udc_irq and the RNDIS host. Without RT, udc_irq runs as a forced threaded irq handler, so it runs without any interruption or preemption. NO similar case is found on non-RT. > > local_irq_save/restore is added inside the function to gurantee atomicity. > > only effective for preempt_rt since hw_ep_prime is called inside top half > > or spin_lock_irqsave. No effect is expected for standard linux. > > How is that helping? > #1 > udc_irq() -> isr_tr_complete_handler() -> isr_tr_complete_low -> > _hardware_dequeue() -> reprime_dtd() -> hw_ep_prime() > > udc_irq() acquires ci->lock. > > #2 > ep_queue -> _ep_queue() ->_hardware_enqueue() -> hw_ep_prime() > > ep_queue acquires hwep->lock. Which is actually ci->lock. > > So if I read this right then hw_ep_prime() may not be interrupted in the > middle of its operation (but preempted) because each path is protected > by the lock. > > isr_tr_complete_low() drops hwep->lock and acquires it again so it that > phase another thread may acquire it. > > > Signed-off-by: Jeaho Hwang <jhhwang@xxxxxxxxxx> > > > > diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c > > index 8834ca613721..a624eddb3e22 100644 > > --- a/drivers/usb/chipidea/udc.c > > +++ b/drivers/usb/chipidea/udc.c > > @@ -191,22 +191,31 @@ static int hw_ep_get_halt(struct ci_hdrc *ci, int num, int dir) > > static int hw_ep_prime(struct ci_hdrc *ci, int num, int dir, int is_ctrl) > > { > > int n = hw_ep_bit(num, dir); > > + unsigned long flags; > > + int ret = 0; > > > > /* Synchronize before ep prime */ > > wmb(); > > > > - if (is_ctrl && dir == RX && hw_read(ci, OP_ENDPTSETUPSTAT, BIT(num))) > > + /* irq affects this routine so irq should be disabled on RT. > > + * on standard kernel, irq is already disabled by callers. > > The important part is _how_ it is affected. If locking works then > nothing should read/ write the HW register. If the lock is briefly > dropped then another thread _may_ read/ write the registers but not > within this function. > > If this function here is sensitive to timing (say the cpu_relax() loop > gets interrupt for 1ms) then it has to be documented as such. The controller sets ENDPTSETUPSTAT register if the host sent a setup packet. yes it is a timing problem. I will document that and resubmit again if you agree that local_irq_save could help from the timing problem. Thanks for the advice. > > > + */ > > + local_irq_save(flags); > > + if (is_ctrl && dir == RX && hw_read(ci, OP_ENDPTSETUPSTAT, BIT(num))) { > > + local_irq_restore(flags); > > return -EAGAIN; > > + } > > > > hw_write(ci, OP_ENDPTPRIME, ~0, BIT(n)); > > > > while (hw_read(ci, OP_ENDPTPRIME, BIT(n))) > > cpu_relax(); > > if (is_ctrl && dir == RX && hw_read(ci, OP_ENDPTSETUPSTAT, BIT(num))) > > - return -EAGAIN; > > + ret = -EAGAIN; > > > > + local_irq_restore(flags); > > /* status shoult be tested according with manual but it doesn't work */ > > - return 0; > > + return ret; > > } > > > > /** > > Sebastian -- 황재호, Jay Hwang, linux team manager of RTst 010-7242-1593