Hi, On Sat, Nov 15, 2014 at 08:42:03AM +0300, Alexander Kochetkov wrote: > Hello again. > > > (please, never top-post) > Sorry. > > Sorry for the inaccurate presentation of ideas. I am not a native > English speaker. neither am I ;-) > First about patches: > [PATCH 1/2] and [PATCH 2/2] - intended to solve two independent problems. > They were sent as series, In the future, try not to do so, In order > not to mislead. sending several fixes as a series is not a problem, a problem would be to make fixes depend on new features or cleanups. > > How could I ever call omap_i2c_complete_cmd() with 'err' set as 0 if I > > had either a NACK or Arbitration Lost ? > > [PATCH 1/2] - fix AL, NACK handling and does not apply to [PATCH 2/2]. > It not cause of problem solved of [PATCH 2/2]. still, how could we ever have that situation ? We break out of the loop as soon as an error is encountered. > > right, this is the same as it was before. If count reaches 100 we will > > omap_i2c_complete_cmd(). > *No* > > During refactoring submitted a series of patches was made the mistake. > This error alters the behavior of the interrupt handler, if it ends > with the message "Too much work in one IRQ". > > Error in the commit 66b9298878742f08cb6e79b7c7d5632d782fd1e1. > All subsequent commits were correct and translate this error further. > > In the parent commit 3312d25e1abdc41be8a75a1b2c3ccaa39a14ed99 > (the commit before 66b9298878742f08cb6e79b7c7d5632d782fd1e1) in case > count reaches 100, loop breaks *without* calling omap_i2c_complete_cmd(). > > In commit 66b9298878742f08cb6e79b7c7d5632d782fd1e1, in case > count reaches 100, loop breaks and omap_i2c_complete_cmd() *get called*. aaa, now I see what you're talking about. I'll review that code on Monday. Let me see if that was intentional or I missed something. > To see that, you need to open two versions of a file i2c-omap.c, side > by side. > > i2c-omap.c version corresponding to parent commit: > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/busses/i2c-omap.c?id=3312d25e1abdc41be8a75a1b2c3ccaa39a14ed99#n823 > > i2c-omap.c version corresponding next commit: > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/busses/i2c-omap.c?id=66b9298878742f08cb6e79b7c7d5632d782fd1e1#n823 > > According to wikipedia: "Code refactoring is the process of > restructuring existing computer code – changing the factoring – > without changing its external behavior.' > > And commit's (66b9298878742f08cb6e79b7c7d5632d782fd1e1) message states what: > i2c: omap: switch over to do {} while loop > this will make sure that we execute at least once. > No functional changes otherwise. > > But functional change present! > It's call of omap_i2c_complete_cmd()! > > The next question: What affects this change? > If count reaches 100 and loop breaks and no error occupy during loop processing, > and no error would be set in err_cmd, omap_i2c_complete_cmd will set 0 as > result of transfer and wakeup omap_i2c_xfer. In other words, current i2c transfer > will be aborted with incorrect (success) status set. But, does transfer completed in real? > Do buf data was sent to i2c slave device or received from it in real? > Upper layer code would thing, that data was sent successfully. > > How to see that bug in real live? Just add extra delayes into isr thread. > I did it unintentionally inserting dev_warn into isr thread. right, apparently it is a bug, but it's very difficult to reproduce considering we break out of IRQ handler as soon as something gets transferred. The possibility of that count reaching 100 is very minor. A bug nevertheless. > BTW. I made more testing and provide more logs to demonstrate affected > changes. > > > > which deadlock are you talking about ? How do you trigger it ? Where are > > the lockup debug traces ? > > > > Well, I tried to debug I2C ISR hang issue (thats another question I > > want to discuss later) using output to console. I places few dev_warn > > > > if you found a bug with the ISR, why discuss it later ? How can I > > understand if you found a real bug without proper logs ? > > > I put in order my thoughts and describe. It's next problem, not > related to patch1/patch2. > > In general, the problem (the 3rd one) is that linux either hang or > segfault in the i2c-omap driver if another master on the i2c bus > (multimaster environment), multimaster is not supported by this driver, so you can't call that a bug. It's a missing feature, it needs to be implemented. Nobody has ever had access to a multimaster environment where to develop it, so it was never done. > submit write request to General Call Address. In that case ISR could > not correctly handle RDR (or XRDY, ARDY, or that ever). Thats becase > i2c-omap driver doesn't correctly handle i2c-controller's slave mode. right, Linux doesn't support being the slave. That's also a missing feature, not a bug. > But controller enter slave mode after each master transfer. Yes, AAS > and GC interrupts masked, but i2c-controller still sends RDR (or that > ever events) when it detect slave access from another master on the > bus. > > I have two safe solutions of the (3rd) problem in the mind: > - keep interrupts disabled between i2c-master access (I think about implementing > i2c_omap_interrupt_enable_clr/i2c_omap_interrupt_enable_set helpers and putting > them in the right places) > - keep controller disabled between i2c-master access (keep EN-bit of CON register 0) send a patch and we'll see, but keep in mind if you want to support multimaster, you need to implement it as per documentation. > That solutions also help with races between isr and xfer_msg. what races ? If you found races, that's another problem which should be fixed separately from implementing a new feature. > What do you think about that? > > How to reproduce 3-rd problem: > In order to ease reproduce, you should disable i2c controller from > entering suspend mode: > echo on > /sys/devices/platform/omap/omap_i2c.2/power/control > > And then, using another i2c-master, connected to the same (i2c.2) bus, > initiate I2C write transfer to Generall Call Address, than linux ether > hang or isr thread segfault :) right, it has never been implemented, what would you expect ? ;-) -- balbi
Attachment:
signature.asc
Description: Digital signature