Re: [PATCH 2/2] i2c: omap: fix "Too much work in one IRQ" irq handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Sat, Nov 15, 2014 at 08:42:03AM +0300, Alexander Kochetkov wrote:
> Hello again.
> 
> > (please, never top-post)
> Sorry.
> 
> Sorry for the inaccurate presentation of ideas. I am not a native
> English speaker.

neither am I ;-)

> First about patches:
> [PATCH 1/2] and [PATCH 2/2] - intended to solve two independent problems.
> They were sent as series, In the future, try not to do so, In order
> not to mislead.

sending several fixes as a series is not a problem, a problem would be
to make fixes depend on new features or cleanups.

> > How could I ever call omap_i2c_complete_cmd() with 'err' set as 0 if I
> > had either a NACK or Arbitration Lost ?
> 
> [PATCH 1/2] - fix AL, NACK handling and does not apply to [PATCH 2/2].
> It not cause of problem solved of [PATCH 2/2].

still, how could we ever have that situation ? We break out of the loop
as soon as an error is encountered.

> > right, this is the same as it was before. If count reaches 100 we will
> > omap_i2c_complete_cmd().
> *No*
> 
> During refactoring submitted a series of patches was made the mistake.
> This error alters the behavior of the interrupt handler, if it ends
> with the message "Too much work in one IRQ".
>
> Error in the commit 66b9298878742f08cb6e79b7c7d5632d782fd1e1. 
> All subsequent commits were correct and translate this error further.
> 
> In the parent commit 3312d25e1abdc41be8a75a1b2c3ccaa39a14ed99
> (the commit before 66b9298878742f08cb6e79b7c7d5632d782fd1e1) in case
> count reaches 100, loop breaks *without* calling omap_i2c_complete_cmd().
> 
> In commit 66b9298878742f08cb6e79b7c7d5632d782fd1e1, in case
> count reaches 100, loop breaks and omap_i2c_complete_cmd() *get called*.

aaa, now I see what you're talking about. I'll review that code on
Monday. Let me see if that was intentional or I missed something.

> To see that, you need to open two versions of a file i2c-omap.c, side
> by side.
> 
> i2c-omap.c version corresponding to parent commit:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/busses/i2c-omap.c?id=3312d25e1abdc41be8a75a1b2c3ccaa39a14ed99#n823
> 
> i2c-omap.c version corresponding next commit:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/busses/i2c-omap.c?id=66b9298878742f08cb6e79b7c7d5632d782fd1e1#n823
> 
> According to wikipedia: "Code refactoring is the process of
> restructuring existing computer code – changing the factoring –
> without changing its external behavior.'
> 
> And commit's (66b9298878742f08cb6e79b7c7d5632d782fd1e1) message states what:
> i2c: omap: switch over to do {} while loop
> this will make sure that we execute at least once.
> No functional changes otherwise.
> 
> But functional change present!
> It's call of omap_i2c_complete_cmd()!
> 
> The next question: What affects this change?
> If count reaches 100 and loop breaks and no error occupy during loop processing,
> and no error would be set in err_cmd, omap_i2c_complete_cmd will set 0 as
> result of transfer and wakeup omap_i2c_xfer. In other words, current i2c transfer
> will be aborted with incorrect (success) status set. But, does transfer completed in real?
> Do buf data was sent to i2c slave device or received from it in real?
> Upper layer code would thing, that data was sent successfully.
> 
> How to see that bug in real live? Just add extra delayes into isr thread.
> I did it unintentionally inserting dev_warn into isr thread.

right, apparently it is a bug, but it's very difficult to reproduce
considering we break out of IRQ handler as soon as something gets
transferred. The possibility of that count reaching 100 is very minor. A
bug nevertheless.

> BTW. I made more testing and provide more logs to demonstrate affected
> changes.
> 
> 
> > which deadlock are you talking about ? How do you trigger it ? Where are
> > the lockup debug traces ?
> > 
> > Well, I tried to debug I2C ISR hang issue (thats another question I
> > want to discuss later) using output to console. I places few dev_warn
> > 
> > if you found a bug with the ISR, why discuss it later ? How can I
> > understand if you found a real bug without proper logs ?
> 
> 
> I put in order my thoughts and describe. It's next problem, not
> related to patch1/patch2.
> 
> In general, the problem (the 3rd one) is that linux either hang or
> segfault in the i2c-omap driver if another master on the i2c bus
> (multimaster environment),

multimaster is not supported by this driver, so you can't call that a
bug. It's a missing feature, it needs to be implemented. Nobody has ever
had access to a multimaster environment where to develop it, so it was
never done.

> submit write request to General Call Address. In that case ISR could
> not correctly handle RDR (or XRDY, ARDY, or that ever). Thats becase
> i2c-omap driver doesn't correctly handle i2c-controller's slave mode.

right, Linux doesn't support being the slave. That's also a missing
feature, not a bug.

> But controller enter slave mode after each master transfer. Yes, AAS
> and GC interrupts masked, but i2c-controller still sends RDR (or that
> ever events) when it detect slave access from another master on the
> bus.
> 
> I have two safe solutions of the (3rd) problem in the mind:
> - keep interrupts disabled between i2c-master access (I think about implementing
> i2c_omap_interrupt_enable_clr/i2c_omap_interrupt_enable_set helpers and putting
> them in the right places)
> - keep controller disabled between i2c-master access (keep EN-bit of CON register 0)

send a patch and we'll see, but keep in mind if you want to support
multimaster, you need to implement it as per documentation.

> That solutions also help with races between isr and xfer_msg.

what races ? If you found races, that's another problem which should be
fixed separately from implementing a new feature.

> What do you think about that?
> 
> How to reproduce 3-rd problem:
> In order to ease reproduce, you should disable i2c controller from
> entering suspend mode:
> echo on > /sys/devices/platform/omap/omap_i2c.2/power/control
> 
> And then, using another i2c-master, connected to the same (i2c.2) bus,
> initiate I2C write transfer to Generall Call Address, than linux ether
> hang or isr thread segfault :)

right, it has never been implemented, what would you expect ? ;-)

-- 
balbi

Attachment: signature.asc
Description: Digital signature


[Index of Archives]     [Linux GPIO]     [Linux SPI]     [Linux Hardward Monitoring]     [LM Sensors]     [Linux USB Devel]     [Linux Media]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux