Hi,
On 8/22/23 12:14, Jarkko Nikula wrote:
Hi
On 8/22/23 12:02, Yann Sionneau wrote:
The DesignWare IP can be synthesized with the
IC_EMPTYFIFO_HOLD_MASTER_EN
parameter.
In this case, when the TX FIFO gets empty and the last command didn't
have
the STOP bit (IC_DATA_CMD[9]), the controller will hold SCL low until
a new command is pushed into the TX FIFO or the transfer is aborted.
When the controller is holding SCL low, it cannot be disabled.
The transfer must first be aborted.
Also, the bus recovery won't work because SCL is held low by the master.
Check if the master is holding SCL low in __i2c_dw_disable() before
trying
to disable the controller. If SCL is held low, an abort is initiated.
When the abort is done, then proceed with disabling the controller.
This whole situation can happen for instance during SMBus read data
block
if the slave just responds with "byte count == 0".
This puts the driver in an unrecoverable state, because the
controller is
holding SCL low and the current __i2c_dw_disable() procedure is not
working. In this situation only a SoC reset can fix the i2c bus.
Is this fixed already by the commit 69f035c480d7 ("i2c: designware:
Handle invalid SMBus block data response length value")?
https://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git/commit/?h=i2c/for-next&id=69f035c480d76f12bf061148ccfd578e1099e5fc
Indeed the bug that I am having is fixed by
69f035c480d76f12bf061148ccfd578e1099e5fc
Meaning that thanks to 69f035c receiving a SMBus byte count of 0 won't
put the controller into a state where the completion will timeout and it
will need to start a recovery that will fail and then a controller
disabling that will also fail.
But, still, the disabling procedure is wrong, it lacks the abort part
(in case SCL is held low).
What my patch does, is fixing the disabling procedure. So that - for
example - even without 69f035c, the controller will timeout when
receiving byte count of 0, triggering the "disabling" procedure which
will work to recover the bus.
My patch fixes the general disabling code, that could be triggered by
the bug fixed by 69f035c but also by any other bug really.
Speaking of 69f035c btw ... I am really wondering if it's the best fix,
because it will lie to the kernel saying "we have byte count of 1, read
another byte" to trigger a read with STOP bit set so that the transfer
does finish and the controller does not timeout. But to do this it will
do an extra spurious byte read.
I propose another approach that will acknowledge that "we are in a bad
condition" and directly abort the transfer without doing an extra read:
https://marc.info/?l=linux-i2c&m=169175828013532&w=2
I hope my answer to your question is clear enough... English is not my
native language.
Regards,
--
Yann