Hi Marek, On 10/22/24 03:38, Marek Vasut wrote: > The bus locking in this driver is broken and produces subtle race > condition with ksdioirqd and its mmc_claim_host()/mmc_release_host() > usage in case of SDIO bus. Rework the locking to avoid this race > condition. > > The problem is the hif_cs mutex used in acquire_bus()/release_bus(), > which makes it look like calling acquire_bus() results in exclusive > access to the bus, but that is not true for SDIO bus. For SDIO bus, > to obtain exclusive access (any access, really), it is necessary to > call sdio_claim_host(), which is a wrapper around mmc_claim_host(), > which does its own locking. The acquire_bus() does not do that, but > the SDIO interface implementation does call sdio_claim_host() and > sdio_release_host() every single command, which is problematic. To > make things worse, wilc_sdio_interrupt() implementation called from > ksdioirqd first calls sdio_release_host(), then interrupt handling > and finally sdio_claim_host(). > > The core problem is that sdio_claim_host() cannot be done per command, > but has to be done per register/data IO which consists of multiple > commands. Is it really true ? What makes you say that we can not perform multiple operations under the same exclusive sdio section ? Usually the WILC register read/write consists of 3x CMD52 > to push in CSA pointer address and 1x CMD53 to read/write data to that > address. Most other accesses are also composed of multiple commands. > > Currently, if ksdioirqd wakes up and attempts to read SDIO_CCCR_INTx > to get pending SDIO IRQs in sdio_get_pending_irqs(), it can easily > perform that transfer between two consecutive CMD52 which are pushing > in the CSA pointer address and possibly disrupt the WILC operation. > This is undesired behavior. I agree about the observation, and then I disagree about the statement above on sdio_claim_host/sdio_release_host not meant to be used for multiple commands. I see plenty of sdio wireless drivers performing multiple sdio operations under the same sdio exclusive bus access section, either explicitely in their code, or through a sdio dedicated helper (eg: sdio_enable_func, sdio_disable_func). But more concerns below > > Rework the locking. > > Introduce new .hif_claim/.hif_release callbacks which implement bus > specific locking. Lock/unlock SDIO bus access using sdio_claim_host() > and sdio_release_host(), lock/unlock SPI bus access using the current > hif_cs mutex moved purely into the spi.c interface. Make acquire_bus() > and release_bus() call the .hif_claim/.hif_release() callbacks and do > not access the hif_cs mutex from there at all. > > Remove any SDIO bus locking used directly in commands and the broken > SDIO bus unlocking in wilc_sdio_interrupt(), this is no longer needed. > Fix up SDIO initialization code which newly needs sdio_claim_host() > and sdio_release_host(), since it cannot depend on the locking being > done per-command anymore. > > Signed-off-by: Marek Vasut <marex@xxxxxxx> [...] > > -static void wilc_sdio_interrupt(struct sdio_func *func) > +static void wilc_sdio_claim(struct wilc *wilc) > +{ > + struct sdio_func *func = container_of(wilc->dev, struct sdio_func, dev); > + > + sdio_claim_host(func); > +} > + > +static void wilc_sdio_release(struct wilc *wilc) > { > + struct sdio_func *func = container_of(wilc->dev, struct sdio_func, dev); > + > sdio_release_host(func); > +} So with this series, we end up with some bus-specific operations needing some locking, but is now up to the upper layer to control this locking. This feels wrong. The driver has a dedicated sdio layer, so if we need some locking for sdio-specific operations, it should be handled autonomously by the sdio layer, right ? [...] > static int wilc_wlan_cfg_commit(struct wilc_vif *vif, int type, > diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.h b/drivers/net/wireless/microchip/wilc1000/wlan.h > index b9e7f9222eadd..ade2db95e8a0f 100644 > --- a/drivers/net/wireless/microchip/wilc1000/wlan.h > +++ b/drivers/net/wireless/microchip/wilc1000/wlan.h > @@ -403,6 +403,8 @@ struct wilc_hif_func { > void (*disable_interrupt)(struct wilc *nic); > int (*hif_reset)(struct wilc *wilc); > bool (*hif_is_init)(struct wilc *wilc); > + void (*hif_claim)(struct wilc *wilc); > + void (*hif_release)(struct wilc *wilc); So IIUC, your series push the hif_cs lock into each bus layer of the driver, remove any explicit call to bus-specific locking mechanism from those layers, and makes the upper layer control the locking. As mentioned above, I don't understand why those layers can not manage the bus-specific locking by themselves (which would be a big win for the upper layer). For SDIO specifically, I feel like letting the layer handle those claim/release would even allow to remove this hif_cs mutex (but we may still need a lock for SPI side) But I may be missing something, so feel free to prove me wrong. -- Alexis Lothoré, Bootlin Embedded Linux and Kernel engineering https://bootlin.com