Hello Marek, On 10/22/24 15:19, Marek Vasut wrote: > On 10/22/24 12:43 PM, Alexis Lothoré wrote: >> Hi Marek, > > Hi, > >> On 10/22/24 03:38, Marek Vasut wrote: >>> The bus locking in this driver is broken and produces subtle race >>> condition with ksdioirqd and its mmc_claim_host()/mmc_release_host() >>> usage in case of SDIO bus. Rework the locking to avoid this race >>> condition. >>> >>> The problem is the hif_cs mutex used in acquire_bus()/release_bus(), >>> which makes it look like calling acquire_bus() results in exclusive >>> access to the bus, but that is not true for SDIO bus. For SDIO bus, >>> to obtain exclusive access (any access, really), it is necessary to >>> call sdio_claim_host(), which is a wrapper around mmc_claim_host(), >>> which does its own locking. The acquire_bus() does not do that, but >>> the SDIO interface implementation does call sdio_claim_host() and >>> sdio_release_host() every single command, which is problematic. To >>> make things worse, wilc_sdio_interrupt() implementation called from >>> ksdioirqd first calls sdio_release_host(), then interrupt handling >>> and finally sdio_claim_host(). >>> >>> The core problem is that sdio_claim_host() cannot be done per command, >>> but has to be done per register/data IO which consists of multiple >>> commands. >> >> Is it really true ? What makes you say that we can not perform multiple >> operations under the same exclusive sdio section ? > > What I am trying to say is this: > > With current code, this can happen, which is not good, because transfers from > multiple threads can be interleaved and interfere with each other: > > thread 1 thread2 > do_some_higher_level_op() { > ... > read_register_0x3b0000() { > claim_bus > CMD52 0x00 > release bus ksdioirqd() { > claim_bus > CMD52 0x0f, lets read SDIO_CCCR_INTx > release_bus > claim bus } > CMD52 0x00 > release_bus > claim_bus > CMD52 0x3b > release_bus > claim_bus > CMD53 lets read data > release_bus > } > ... > } > > What should happen is either: > > thread 1 thread2 > ksdioirqd() { // option 1 > claim_bus > CMD52 0x0f, lets read SDIO_CCCR_INTx > release_bus > } > do_some_higher_level_op() { > claim_bus > ... > read_register_0x3b0000 { > CMD52 0x00 > CMD52 0x00 > CMD52 0x3b > CMD53 lets read data > } > ... > read_another_register() > ... > release_bus > } > ksdioirqd() { // option 2 > claim_bus > CMD52 0x0f, lets read SDIO_CCCR_INTx > release_bus > } > > That's what this patch implements, to avoid the interference. > > Maybe I should include the infographics? Or reword this somehow? What I may have misunderstood is your first sentence ("sdio_claim_host() cannot be done per command, but has to be done per register/data IO which consists of multiple commands", especially command VS reg/data io), but your graph clarified it for me, thanks, so in the end we agree on this :) That may just be me having poorly interpreted, so no need to add the graphs to the commit [...] >>> static int wilc_wlan_cfg_commit(struct wilc_vif *vif, int type, >>> diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.h b/drivers/net/ >>> wireless/microchip/wilc1000/wlan.h >>> index b9e7f9222eadd..ade2db95e8a0f 100644 >>> --- a/drivers/net/wireless/microchip/wilc1000/wlan.h >>> +++ b/drivers/net/wireless/microchip/wilc1000/wlan.h >>> @@ -403,6 +403,8 @@ struct wilc_hif_func { >>> void (*disable_interrupt)(struct wilc *nic); >>> int (*hif_reset)(struct wilc *wilc); >>> bool (*hif_is_init)(struct wilc *wilc); >>> + void (*hif_claim)(struct wilc *wilc); >>> + void (*hif_release)(struct wilc *wilc); >> >> So IIUC, your series push the hif_cs lock into each bus layer of the driver, >> remove any explicit call to bus-specific locking mechanism from those layers, >> and makes the upper layer control the locking. As mentioned above, I don't >> understand why those layers can not manage the bus-specific locking by >> themselves (which would be a big win for the upper layer). > > Because of acquire_bus()/release_bus() which I think is an attempt to serialize > bus access across multiple complex operations (=commands sent to the card), see > above. Taking a further look at some examples in the driver, I see that indeed the "scope" of acquire_bus/release_bus is larger than simple bus operations. So I withdraw my proposal which was wrong. Thanks, Alexis -- Alexis Lothoré, Bootlin Embedded Linux and Kernel engineering https://bootlin.com