mmc: tmio_mmc_core: Locking questions

"Behme Dirk (CM/ESO2)" <dirk.behme@xxxxxxxxxxxx> · Fri, 16 Feb 2024 10:59:35 +0100

Hi,

using drivers/mmc/host/tmio_mmc_core.c (on 4.14.x Renesas BSP) what is 
quite similar to [1] we are trying to analyze a KASAN out of bounds 
report [2] in

...
mmc_request_done+0xcc/0x30c
tmio_mmc_reset_work+0x1fc/0x248
...

Trying to resolve mmc_request_done+0xcc we think this is the access to

mrq->cap_cmd_during_tfr

in mmc_command_done() called inline from mmc_request_done() with mrq 
becoming invalid. See [3].

Guessing about options why a wrong mrq might be used here, we have some 
questions regarding tmio_mmc_reset_work() [1] what is a delayed work 
handler (running asynchronously). Comparing that with similar delayed 
work handlers like bcm2835_timeout()/bcm2835_finish_request() [4] the 
(simplified) functions are

static void tmio_mmc_reset_work(struct work_struct *work)
{
	struct mmc_request *mrq;

	spin_lock_irqsave(&host->lock, flags);
	mrq = host->mrq;

	spin_unlock_irqrestore(&host->lock, flags);

	/* Ready for new calls */
	host->mrq = NULL;

	mmc_request_done(host->mmc, mrq);
}

static void bcm2835_timeout(struct work_struct *work)
{
	struct mmc_request *mrq;

	mutex_lock(&host->mutex);

	if (host->mrq) {
		cancel_delayed_work(&host->timeout_work);

		mrq = host->mrq;

		host->mrq = NULL;

		mmc_request_done(mmc_from_priv(host), mrq);		
	}

	mutex_unlock(&host->mutex);
}

Comparing this, the questions are:

* tmio uses irqsave spin locks, while the bcm2835 uses mutex for 
locking. Why is irqsave spin lock used here?

* In tmio the mmc_request_done() is *outside* the lock, while for 
bcm2835 its *inside* the lock protected section. Why does tmio doesn't 
lock the access to mmc_request_done()?

* How does tmio_mmc_reset_work() ensures that the content mrq points to 
and passed to mmc_request_done() is still the correct content and hasn't 
changed asynchronously in the background (due to being outside the lock)?

* Why does tmio doesn't use cancel_delayed_work()? Maybe because its 
ensured by the interrupt locking no further work is scheduled?

In sum looking at the KASAN report and the locking in 
tmio_mmc_reset_work() we have an uncomfortable feeling. 
bcm2835_timeout() looks much safer regarding the locking. Of course that 
is just a feeling and can't be proven. Therefore we want to ask here :)

Best regards

Dirk

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/mmc/host/tmio_mmc_core.c#n229

[2]

BUG: KASAN: stack-out-of-bounds in mmc_request_done+0xcc/0x30c
Read of size 1 at addr ffff8005df517738 by task kworker/4:1/2308

CPU: 4 PID: 2308 Comm: kworker/4:1 Tainted: G         C      4.14.327-ltsi
Hardware name: custom board based on r8a7796 (DT)
Workqueue: events tmio_mmc_reset_work
Call trace:
 dump_backtrace+0x0/0x1fc
 show_stack+0x14/0x1c
 dump_stack+0xcc/0x114
 print_address_description+0x54/0x238
 kasan_report+0x274/0x29c
 __asan_load1+0x24/0x50
 mmc_request_done+0xcc/0x30c
 tmio_mmc_reset_work+0x1fc/0x248
 process_one_work+0x324/0x6c0
 worker_thread+0x358/0x4d4
 kthread+0x1a4/0x1bc
 ret_from_fork+0x10/0x18

The buggy address belongs to the page:
page:ffff7e00177d45c0 pfn:61f517 refcount:0 mapcount:0 mapping: 
(null) index:0x0
flags: 0x4000000000000000()
raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: 0000000000000000 ffff7e00177d45e0 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8005df517600: 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 00 00 00
 ffff8005df517680: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
>ffff8005df517700: 00 00 f1 f1 f1 f1 04 f2 f2 f2 00 00 00 00 00 00
                                        ^
 ffff8005df517780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff8005df517800: 00 00 f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00
==================================================================
Disabling lock debugging due to kernel taint
------------[ cut here ]------------

[3] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/mmc/core/core.c#n113

[4] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/mmc/host/bcm2835.c#n821