Re: [PATCH 0/1] pinctrl: baytrail: Really serialize all register accesses

Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx> · Mon, 18 Nov 2019 19:23:20 +0200

On Mon, Nov 18, 2019 at 03:20:19PM +0100, Hans de Goede wrote:
> Hi All,
> 
> First let me copy-paste a bit of the commit msg for background:
> 
> ---begin commit msg---
> Commit 39ce8150a079 ("pinctrl: baytrail: Serialize all register access")
> added a spinlock around all register accesses because:
> 
> "There is a hardware issue in Intel Baytrail where concurrent GPIO register
>  access might result reads of 0xffffffff and writes might get dropped
>  completely."
> 
> Testing has shown that this does not catch all cases, there are still
> 2 problems remaining
> 
> 1) The original fix uses a spinlock per byt_gpio device / struct,
> additional testing has shown that this is not sufficient concurent
> accesses to 2 different GPIO banks also suffer from the same problem.
> 
> This commit fixes this by moving to a single global lock.
> 
> 2) The original fix did not add a lock around the register accesses in
> the suspend/resume handling.
> 
> Since pinctrl-baytrail.c is using normal suspend/resume handlers,
> interrupts are still enabled during suspend/resume handling. Nothing
> should be using the GPIOs when they are being taken down, _but_ the
> GPIOs themselves may still cause interrupts, which are likely to
> use (read) the triggering GPIO. So we need to protect against
> concurrent GPIO register accesses in the suspend/resume handlers too.
> 
> This commit fixes this by adding the missing spin_lock / unlock calls.
> ---end commit msg---
> 
> I was discussing the problem at my local hackerspace yesterday with
> someone who knows quite a bit low-level details about Intel CPUs.
> He said that on "big" core's all the GPIO islands sit behind the IOSF
> and that there is a bridge which make their registers look like regular
> mmio registers.
> 
> I wonder if the same is happening on BYT, that would point to an issue
> with concurent accesses being done through the IOSF bridge, which would
> explain why I've found that we need a single global lock for all GPIO
> islands (*).

> But that in turn would mean that we really need global lock
> for _all_ accesses done through the IOSF bridge, not just GPIO register
> accesses...  Which would explain why on some machines BYT has never been
> really stable with Linux.

I don't think so. The Cherriview seems inherited the same silicon issue (see
comment near to chv_lock), though it seems to be fixed for all new hardware
(Skylake+).

> Can someone at Intel with access to internal docs about BYT dig a bit into
> the docs and see if 1. this theory makes sense, 2. if it does if there us
> anything else behind the IOSF for which we also need to serialize accesses?

Unfortunately I have no time to dig into this right now.

> 
> Regards,
> 
> Hans
> 
> 
> 
> 
> *) It seems that the GPIO interrupt storm on the test device I hit
> this is really good in triggering this
> 

-- 
With Best Regards,
Andy Shevchenko