Still poking around here; I think my previous emails were leading me down a dead end, at least w.r.t. unsafe locking. The one code path I identified below (blk_bio_list_merge() via __blk_mq_sched_bio_merge()) won't actually get hit, since we'll instead call bfq_bio_merge(). The crashes I'm seeing have been on the read side (stacks below):

[160595.656560] bfq_add_bfqq_busy+0x110/0x1ec
[160595.661142] bfq_add_request+0x6bc/0x980
[160595.666602] bfq_insert_request+0x8ec/0x1240
[160595.671762] bfq_insert_requests+0x58/0x9c
[160595.676420] blk_mq_sched_insert_request+0x11c/0x198
[160595.682107] blk_mq_submit_bio+0x270/0x62c
[160595.686759] __submit_bio_noacct_mq+0xec/0x178
[160595.691926] submit_bio+0x120/0x184
[160595.695990] ext4_mpage_readpages+0x77c/0x7c8
[160595.701026] ext4_readpage+0x60/0xb0
[160595.705158] filemap_read_page+0x54/0x114
[160595.711961] filemap_fault+0x228/0x5f4
[160595.716272] do_read_fault+0xe0/0x1f0
[160595.720487] do_fault+0x40/0x1c8

or

[28497.344552] bfq_add_bfqq_busy+0x110/0x1ec
[28497.348787] bfq_add_request+0x6bc/0x980
[28497.352845] bfq_insert_request+0x8ec/0x1240
[28497.357265] bfq_insert_requests+0x58/0x9c
[28497.361503] blk_mq_sched_insert_requests+0x8c/0x180
[28497.366647] blk_mq_flush_plug_list+0x15c/0x1e0
[28497.371376] blk_flush_plug_list+0xf0/0x124
[28497.375877] blk_finish_plug+0x34/0x48
[28497.379846] read_pages+0x21c/0x288
[28497.383510] page_cache_ra_unbounded+0x1f0/0x24c
[28497.388346] do_page_cache_ra+0x48/0x54
[28497.392388] do_sync_mmap_readahead+0x190/0x1e0
[28497.397150] filemap_fault+0x150/0x5f4
[28497.401111] do_read_fault+0xe0/0x1f0
[28497.404948] do_fault+0x40/0x1c8

In the crashes I'm looking at, the inconsistent bfq_queue appears to be oom_bfqq (waker_bfqq is NULL, but woken_list_node is hashed).

I'm looking at bfq_init_rq(), where we set bfqq->waker_bfqq (potentially to NULL) and only ever hlist_add_head(), never hlist_del_init(). The logic here seems to assume that bfq_get_bfqq_handle_split() returns a freshly allocated bfq_queue. However, it can also return oom_bfqq, which may already be on a waker's woken_list - and then we overwrite waker_bfqq without removing woken_list_node, which can produce exactly the inconsistency seen in the crash. wdyt?

	bfqq = bfq_split_bfqq(bic, bfqq);
	split = true;

	if (!bfqq) {
		/* this can also return oom_bfqq */
		bfqq = bfq_get_bfqq_handle_split(bfqd, bic, bio,
						 true, is_sync,
						 NULL);
		/* this can be set to NULL */
		bfqq->waker_bfqq = old_bfqq->waker_bfqq;
		bfqq->tentative_waker_bfqq = NULL;

(A rough sketch of the shape of fix I'm imagining is at the bottom of this mail.)

An aside, looking at the async case, since I'd like to understand this code better.

---

The call to bfq_split_bfqq() doesn't make sense to me: for bfq_get_bfqq_handle_split() et al. we pass the is_sync parameter, but bfq_split_bfqq() only ever clears bic->bfqq for sync == true. My understanding of the intent is: we decide to split, bfq_split_bfqq() sets bic->bfqq[is_sync] to NULL, and then bfq_get_bfqq_handle_split() sees that it's NULL and allocates a brand-new queue. But for async, bfq_split_bfqq() returns NULL while bic->bfqq[0] was never cleared, so bfq_get_bfqq_handle_split() finds it and returns that already-existing bfqq - and we then go on to modify a bfqq that was not freshly allocated. Should bfq_split_bfqq() take an is_sync parameter here and clear the corresponding bic->bfqq? (Sketch of what I mean at the bottom as well.)
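To make the woken_list point concrete, here's a rough, untested sketch of the shape of fix I have in mind for that branch of bfq_init_rq(). The surrounding lines are just the excerpt above; the hlist_del_init() placement is my guess, not something in the tree:

	if (!bfqq) {
		bfqq = bfq_get_bfqq_handle_split(bfqd, bic, bio,
						 true, is_sync,
						 NULL);
		/*
		 * bfqq may be oom_bfqq here, and oom_bfqq can already be
		 * hashed on some waker's woken_list.  Unhash it before we
		 * overwrite waker_bfqq (possibly with NULL) so that
		 * waker_bfqq and woken_list_node stay consistent.
		 * hlist_del_init() is a no-op for an unhashed node.
		 */
		hlist_del_init(&bfqq->woken_list_node);

		bfqq->waker_bfqq = old_bfqq->waker_bfqq;
		bfqq->tentative_waker_bfqq = NULL;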
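And for the aside, this is roughly what I'd naively expect bfq_split_bfqq() to look like if the intent really is "clear the slot so bfq_get_bfqq_handle_split() allocates a fresh queue". The body is paraphrased from my reading of the function and abbreviated, so treat it as a sketch rather than a patch:

	static struct bfq_queue *bfq_split_bfqq(struct bfq_io_cq *bic,
						struct bfq_queue *bfqq,
						bool is_sync)
	{
		if (bfqq_process_refs(bfqq) == 1) {
			bfqq->pid = current->pid;
			bfq_clear_bfqq_coop(bfqq);
			bfq_clear_bfqq_split_coop(bfqq);
			return bfqq;
		}

		/* today this is bic_set_bfqq(bic, NULL, 1), i.e. sync only */
		bic_set_bfqq(bic, NULL, is_sync);

		bfq_put_cooperator(bfqq);
		bfq_put_queue(bfqq);
		return NULL;
	}

with the caller in bfq_init_rq() becoming:

	bfqq = bfq_split_bfqq(bic, bfqq, is_sync);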
---
On Mon, Oct 31, 2022 at 10:05 PM Khazhy Kumykov <khazhy@xxxxxxxxxxxx> wrote:
>
> On Mon, Oct 31, 2022 at 9:37 PM Khazhy Kumykov <khazhy@xxxxxxxxxxxx> wrote:
> >
> > I'm investigating a NULL deref crash in bfq_add_bfqq_busy(), wherein
> > bfqq->woken_list_node is hashed, but bfqq->waker_bfqq is NULL - which
> > seems inconsistent per my reading of the code.