Hi,

On Wed, Jan 14, 2015 at 11:04:27PM +0000, Paul Zimmerman wrote:
> > > > > > > > > > This is really, really odd. Register accesses are atomic, so the lock
> > > > > > > > > > isn't really doing anything. Besides, you're calling
> > > > > > > > > > dwc2_is_controller_alive() from within the IRQ handler, so IRQs are
> > > > > > > > > > already disabled.
> > > > > > > > > >
> > > > > > > > > Spinlocks sometimes do more than you think. For instance, here the
> > > > > > > > > lock prevents the register access from happening while some other CPU
> > > > > > > > > is holding the lock. If a silicon quirk causes the register access to
> > > > > > > > > interfere with other activities, this could be important.
> > > > > > > > >
> > > > > > > > readl() (which is used by dwc2_is_controller_alive()) adds a memory
> > > > > > > > barrier to the register accesses, that should force all register
> > > > > > > > accesses the be correctly ordered.
> > > > > > > >
> > > > > > > Memory barriers will order accesses that are all made on the same CPU
> > > > > > > with respect to each other. They do not order these accesses against
> > > > > > > accesses made from another CPU -- that's why we have spinlocks. :-)
> > > > > >
> > > > > > a fair point :-) The register is still read-only, so that shouldn't
> > > > > > matter either :-)
> > > > > >
> > > > > > > > I fail to see how a silicon quirk
> > > > > > > > could cause this and if, indeed, it does, I'd be more comfortable with a
> > > > > > > > proper STARS tickect number from synopsys :-s
> > > > > > >
> > > > > > > Maybe accessing this register somehow resets something else. I don't
> > > > > > > know. It seems unlikely, but at least it explains how adding a
> > > > > > > spinlock could fix the problem.
> > > > > >
> > > > > > I would really need Paul (or someone at Synopsys) to confirm this
> > > > > > somehow. Maybe it has something to do with how the register is
> > > > > > implemented, dunno.
> > > > > >
> > > > > > Paul, do you have any idea what could cause this ? Could the HW into
> > > > > > some weird state if we read GSNPSID at random locations or when data is
> > > > > > being transferred, or anything like that ?
> > > > >
> > > > > Only thing I can think of is that there is some silicon bug in Robert's
> > > > > platform. But I am not aware of any STARs that mention accesses to the
> > > > > GSNPSID register as being problematic.
> > > > >
> > > > > Funny thing is, this code has been basically the same since at least
> > > > > November 2013. So I think some other recent change must have modified
> > > > > the timing of the register accesses, or something like that. But that's
> > > > > just handwaving, really.
> > > >
> > > > Alright, I'll apply this patch but for 3.20 with a stable tag as I have
> > > > already sent my last pull request to Greg. Unless someone has a really
> > > > big complaint about doing things as such.
> > >
> > > It should go to 3.19-rc shouldn't it? It's a fix, and Robert's platform
> > > is broken without it, IIUC.
> >
> > It can also be categorized as "has-never-worked-before" before the code
> > has been like this forever. Since we don't really have a git bisect
> > result pointing to a commit that went in v3.19 merge window, I'm not
> > sure how I can convince myself that this absolutely needs to be in
> > v3.19.
> >
> > At a minimum, I need a proper bisection with a proper commit being
> > blamed (even if it's a commit from months ago). From my point of view,
> > debugging of this "regression" has not been finalized and we're just
> > "assuming" it's caused by GSNPSID because moving that inside the
> > spin_lock seems to fix the problem.
>
> On further investigation, I was wrong about "this code has been
> basically the same since at least November 2013".
> Prior to commit db8178c33db "usb: dwc2: Update common interrupt handler
> to call gadget interrupt handler" from November 2014, the gadget
> interrupt handler did not read from the GSNPSID register.

Right, but the common IRQ always did. So if Robert's SoC has only ever
been used in peripheral mode, then I agree with you that the behavior
did, in fact, change.

> So likely the bug in Robert's hardware has been there all along, and
> that commit just caused it to manifest itself.

Robert, out of curiosity, which SoC are you using ? Is it UP or SMP ? I
guess we need a mention in the commit log that at least SoC XYZ is known
to break unless the register access is done with locks held.

--
balbi