Re: [RFC PATCH 0/3] Re: i2c: core: introduce atomic transfers

Stefan Lengfeld <contact@xxxxxxxxxxxxxxx> · Sat, 13 Jul 2019 16:15:09 +0200

Hi Wolfram,

I finally found some time to write my response.

On Mon, Apr 22, 2019 at 09:24:15AM +0200, Wolfram Sang wrote:
> 
> > > There is freq scaling going on when 'system_state > SYSTEM_RUNNING'? Is
> > > this a guess or confirmed?
> > 
> > Only some memories in my head that the frequency scaler caused I2C
> > adapter locking issues on a multi core device. So read it as a 'guess'.
> > But since I don't like unsolved mysteries in software engineering, I'm
> > now trying to get an i.MX6 Quad device to verify this. I have a very
> > kind donor :-)
> 
> Cool. It would indeed be very important to have that confirmed / sorted
> out. Thanks!
> 

In May Phytec has donated a second phyBOARD-Mira Board to me with a
i.MX6 Quad Soc [1]. Thanks :-) So I could test the I2C atomic transfer
patches on a SMP SoC, not only on a i.MX6 solo with a single ARM core.

> > > There is freq scaling going on when 'system_state > SYSTEM_RUNNING'? Is
> > > this a guess or confirmed?

Summary: In new kernels (since v4.0) cpufreq is not running when the
system is going down. It's disabled on reboot/shutdown.

Long version: In v3.19 or earlier kernels the frequencey scaling is
still active when the system is going down. For the kernel v4.0-rc1 the
patch 

    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=90de2a4aa9f355b55e57eaf2fd584897dd0adf8c

    cpufreq: suspend cpufreq governors on shutdown

    We should stop cpufreq governors when we shut down the system.  If we
    don't do this, we can end up with this deadlock:

    1. cpufreq governor may be running on a CPU other than CPU0.
    2. In machine_restart() we call smp_send_stop() which stops CPUs.
       If one of these CPUs was actively running a cpufreq governor
       then it may have the mutex / spinlock needed to access the main
       PMIC in the system (perhaps over I2C)
    3. If a machine needs access to the main PMIC in order to shutdown
       then it will never get it since the mutex was lost when the other
       CPU stopped.
    4. We'll hang (possibly eventually hitting the hard lockup detector).

    Let's avoid the problem by stopping the cpufreq governor at shutdown,
    which is a sensible thing to do anyway.

changed that behavior. The commit message clearly describes the deadlock
behavior that I was seeing and tried to fix with my initial patchset
[2].

At that time I tried to fix the issue at a generic level in the I2C
subsystem and block the whole I2C adapter for only a single driver,
because there could be other kernel threads trying to use I2C bus.

Now I think differently. Any in-kernel user/thread that still runs at
reboot/shutdown and uses the I2C bus is a bug. They all should be fixed
like the cpufreq and orderly shutdown themselves. This should guarantee
that there is _no_ on-going I2C bus activity and the I2C adapter mutex
is not acquired.

In my current tests I didn't found another driver that is doing late I2C
communication on my boards.

So I think the new code 

    /*
     * We only allow atomic transfers for very late communication, e.g. to send
     * the powerdown command to a PMIC. Atomic transfers are a corner case and not
     * for generic use!
     */
    static inline bool i2c_in_atomic_xfer_mode(void)
    {
            return system_state > SYSTEM_RUNNING && irqs_disabled();
    }

is very sensible and the correct check.

For atomic I2C transfers you have to guarantee a free bus. For reboot
handlers we can guarantee that by shutting down any in-kernel threads.
All user processes are terminated already.

For normal interrupt contexts we don't have this guarantee. Interrupts
can fire at any time and cpufreq or other I2C usages, e.g. user
processes, can be in progress and hold the lock.

So I think the usecase I2C reboot/reset handlers is solved now. :-)

Other usescases, like oops handling, pointed out by Russel King.
See https://www.spinics.net/lists/arm-kernel/msg682695.html

> Also, how does this get around the issue which I pointed out with (eg)
> an oops occuring, which leads to a panic followed by an attempt to
> reboot if the I2C bus in question is already mid-transaction?  Won't
> we deadlock?

need some additional work and thoughts. E.g. your
ignore-the-locking-and-reset-the-IP-core idea [3].

For embedded devices the watchdog, which is integrated into the PMIC,
should reboot the board in case of an oops. So for these devices I don't
think this feature is required.

Some final words: It's very cool that the upstream kernel now supports
I2C reboot/reset handlers. Thanks for your patches and the steady
process of getting this done.

Kind regards,
Stefan

[1]: https://www.phytec.eu/product-eu/single-board-computer/phyboard-mira/
[2]: https://www.spinics.net/lists/linux-i2c/msg25401.html
[3]: https://www.spinics.net/lists/arm-kernel/msg705961.html