Re: [PATCH] wifi: wilc1000: Rework bus locking

Marek Vasut <marex@xxxxxxx> · Tue, 22 Oct 2024 15:19:19 +0200

On 10/22/24 12:43 PM, Alexis Lothoré wrote:
Hi Marek,

Hi,

On 10/22/24 03:38, Marek Vasut wrote:
The bus locking in this driver is broken and produces subtle race
condition with ksdioirqd and its mmc_claim_host()/mmc_release_host()
usage in case of SDIO bus. Rework the locking to avoid this race
condition.

The problem is the hif_cs mutex used in acquire_bus()/release_bus(),
which makes it look like calling acquire_bus() results in exclusive
access to the bus, but that is not true for SDIO bus. For SDIO bus,
to obtain exclusive access (any access, really), it is necessary to
call sdio_claim_host(), which is a wrapper around mmc_claim_host(),
which does its own locking. The acquire_bus() does not do that, but
the SDIO interface implementation does call sdio_claim_host() and
sdio_release_host() every single command, which is problematic. To
make things worse, wilc_sdio_interrupt() implementation called from
ksdioirqd first calls sdio_release_host(), then interrupt handling
and finally sdio_claim_host().

The core problem is that sdio_claim_host() cannot be done per command,
but has to be done per register/data IO which consists of multiple
commands.

Is it really true ? What makes you say that we can not perform multiple
operations under the same exclusive sdio section ?

What I am trying to say is this:

With current code, this can happen, which is not good, because transfers 
from multiple threads can be interleaved and interfere with each other:

thread 1                         thread2
do_some_higher_level_op() {
 ...
 read_register_0x3b0000() {
  claim_bus
  CMD52 0x00
  release bus                    ksdioirqd() {
                                   claim_bus
                                   CMD52 0x0f, lets read SDIO_CCCR_INTx
                                   release_bus
  claim bus                      }
  CMD52 0x00
  release_bus
  claim_bus
  CMD52 0x3b
  release_bus
  claim_bus
  CMD53 lets read data
  release_bus
 }
 ...
}

What should happen is either:

thread 1                         thread2
                                 ksdioirqd() { // option 1
                                   claim_bus
                                   CMD52 0x0f, lets read SDIO_CCCR_INTx
                                   release_bus
                                 }
do_some_higher_level_op() {
 claim_bus
 ...
 read_register_0x3b0000 {
  CMD52 0x00
  CMD52 0x00
  CMD52 0x3b
  CMD53 lets read data
 }
 ...
 read_another_register()
 ...
 release_bus
}
                                 ksdioirqd() { // option 2
                                   claim_bus
                                   CMD52 0x0f, lets read SDIO_CCCR_INTx
                                   release_bus
                                 }

That's what this patch implements, to avoid the interference.

Maybe I should include the infographics? Or reword this somehow?

Usually the WILC register read/write consists of 3x CMD52
to push in CSA pointer address and 1x CMD53 to read/write data to that
address. Most other accesses are also composed of multiple commands.

Currently, if ksdioirqd wakes up and attempts to read SDIO_CCCR_INTx
to get pending SDIO IRQs in sdio_get_pending_irqs(), it can easily
perform that transfer between two consecutive CMD52 which are pushing
in the CSA pointer address and possibly disrupt the WILC operation.
This is undesired behavior.

I agree about the observation, and then I disagree about the statement above on
sdio_claim_host/sdio_release_host not meant to be used for multiple commands.

I think we have a misunderstanding here, see above.

I see plenty of sdio wireless drivers performing multiple sdio operations under
the same sdio exclusive bus access section, either explicitely in their code, or
through a sdio dedicated helper (eg: sdio_enable_func, sdio_disable_func).

But more concerns below

Rework the locking.

Introduce new .hif_claim/.hif_release callbacks which implement bus
specific locking. Lock/unlock SDIO bus access using sdio_claim_host()
and sdio_release_host(), lock/unlock SPI bus access using the current
hif_cs mutex moved purely into the spi.c interface. Make acquire_bus()
and release_bus() call the .hif_claim/.hif_release() callbacks and do
not access the hif_cs mutex from there at all.

Remove any SDIO bus locking used directly in commands and the broken
SDIO bus unlocking in wilc_sdio_interrupt(), this is no longer needed.
Fix up SDIO initialization code which newly needs sdio_claim_host()
and sdio_release_host(), since it cannot depend on the locking being
done per-command anymore.

Signed-off-by: Marek Vasut <marex@xxxxxxx>

[...]

  
-static void wilc_sdio_interrupt(struct sdio_func *func)
+static void wilc_sdio_claim(struct wilc *wilc)
+{
+	struct sdio_func *func = container_of(wilc->dev, struct sdio_func, dev);
+
+	sdio_claim_host(func);
+}
+
+static void wilc_sdio_release(struct wilc *wilc)
  {
+	struct sdio_func *func = container_of(wilc->dev, struct sdio_func, dev);
+
  	sdio_release_host(func);
+}

So with this series, we end up with some bus-specific operations needing some
locking, but is now up to the upper layer to control this locking. This feels
wrong.

It always was the upper layer (wlan.c) that attempted to do bus locking, 
except it was incomplete. The acquire_bus()/release_bus() primitives 
seems to be an attempt at serializing bus access across multiple 
operations (which really boils down to multiple SPI or SDIO commands).

The problem is, acquire_bus()/release_bus() does not really work, 
because it does not prevent e.g. ksdioirqd from inserting a command on 
the SDIO bus. SDIO (or really, mmc framework) has its own way of doing 
bus locking, that's the sdio_claim_host()/sdio_release_host(), SPI does 
have spi_bus_lock()/spi_bus_unlock() which I should use in V2.

Those sdio_claim_host()/sdio_release_host() and 
spi_bus_lock()/spi_bus_unlock() should be called in 
acquire_bus()/release_bus() to correctly serialize bus access across 
multiple operations. That will also eliminate the custom and not really 
functional hif_cs mutex.

The driver has a dedicated sdio layer, so if we need some locking for
sdio-specific operations, it should be handled autonomously by the sdio layer,
right ?

Not quite, I don't think the intention was to let anything communicate 
with the WILC device within block "protected" by 
acquire_bus()/release_bus() pair. That's why I believe this is where bus 
lock and unlock should happen too.

[...]

  static int wilc_wlan_cfg_commit(struct wilc_vif *vif, int type,

diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.h b/drivers/net/wireless/microchip/wilc1000/wlan.h
index b9e7f9222eadd..ade2db95e8a0f 100644
--- a/drivers/net/wireless/microchip/wilc1000/wlan.h
+++ b/drivers/net/wireless/microchip/wilc1000/wlan.h
@@ -403,6 +403,8 @@ struct wilc_hif_func {
  	void (*disable_interrupt)(struct wilc *nic);
  	int (*hif_reset)(struct wilc *wilc);
  	bool (*hif_is_init)(struct wilc *wilc);
+	void (*hif_claim)(struct wilc *wilc);
+	void (*hif_release)(struct wilc *wilc);

So IIUC, your series push the hif_cs lock into each bus layer of the driver,
remove any explicit call to bus-specific locking mechanism from those layers,
and makes the upper layer control the locking. As mentioned above, I don't
understand why those layers can not manage the bus-specific locking by
themselves (which would be a big win for the upper layer).

Because of acquire_bus()/release_bus() which I think is an attempt to 
serialize bus access across multiple complex operations (=commands sent 
to the card), see above.

For SDIO specifically, I feel like letting the layer handle those claim/release
would even allow to remove this hif_cs mutex (but we may still need a lock for
SPI side)

But I may be missing something, so feel free to prove me wrong.
With spi_bus_lock()/unlock() we can actually dispose of the hif_cs mutex 
altogether.