Re: [PATCH BUGFIX IMPROVEMENT V2 5/7] block, bfq: detect wakers and unconditionally inject their I/O

Hi,

On Mon, Jun 24, 2019 at 10:13 PM Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:
>
> A bfq_queue Q may happen to be synchronized with another
> bfq_queue Q2, i.e., the I/O of Q2 may need to be completed for Q to
> receive new I/O. We call Q2 "waker queue".
>
> If I/O plugging is being performed for Q, and Q is not receiving any
> more I/O because of the above synchronization, then, thanks to BFQ's
> injection mechanism, the waker queue is likely to get served before
> the I/O-plugging timeout fires.
>
> Unfortunately, this fact may not be sufficient to guarantee a high
> throughput during the I/O plugging, because the inject limit for Q may
> be too low to guarantee a lot of injected I/O. In addition, the
> duration of the plugging, i.e., the time before Q finally receives new
> I/O, may not be minimized, because the waker queue may happen to be
> served only after other queues.
>
> To address these issues, this commit introduces the explicit detection
> of the waker queue, and the unconditional injection of a pending I/O
> request of the waker queue on each invocation of
> bfq_dispatch_request().
>
> One may be concerned that this systematic injection of I/O from the
> waker queue delays the service of Q's own I/O. Fortunately, it does
> not: on the contrary, Q's next I/O arrives much sooner, because Q is
> no longer blocked for milliseconds waiting for the waker's completions.
>
> Reported-by: Srivatsa S. Bhat (VMware) <srivatsa@xxxxxxxxxxxxx>
> Tested-by: Srivatsa S. Bhat (VMware) <srivatsa@xxxxxxxxxxxxx>
> Signed-off-by: Paolo Valente <paolo.valente@xxxxxxxxxx>
> ---
>  block/bfq-iosched.c | 270 ++++++++++++++++++++++++++++++++++++++------
>  block/bfq-iosched.h |  25 +++-
>  2 files changed, 261 insertions(+), 34 deletions(-)

FYI: there is some evidence that this commit, which landed as
commit 13a857a4c4e8 ("block, bfq: detect wakers and unconditionally
inject their I/O"), is causing use-after-frees, as identified with
slub_debug and/or KASAN.

If folks are willing to follow a link to the Chrome OS bug tracker,
you can find more details starting at:

https://bugs.chromium.org/p/chromium/issues/detail?id=931295#c46


The most relevant part from that discussion so far is that one crash
can be seen in bfq_exit_icq_bfqq():

/* reset waker for all queues in woken list */
hlist_for_each_entry_safe(item, n, &bfqq->woken_list, woken_list_node) {
    item->waker_bfqq = NULL;
    bfq_clear_bfqq_has_waker(item);
==> hlist_del_init(&item->woken_list_node);
}

...where "item" has already been freed.  Hopefully Paolo has some
ideas here, since I'm not sure I'll be able to do any more detailed
debugging in the short term.  I'm happy to apply a test patch and
re-run the tests, though.
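One failure mode that would be consistent with that crash: a queue gets freed while it is still linked into its waker's woken_list, so the waker's exit path later walks the list into freed memory. Below is a simplified userspace model of the two-sided unlink invariant such code has to maintain (pure speculation as to the actual bug, and `toy_bfqq`/`toy_exit_bfqq` are made-up names using a plain singly-linked list instead of the kernel's hlist):

```c
#include <stddef.h>

struct toy_bfqq {
	struct toy_bfqq *waker_bfqq;  /* our waker, if any */
	struct toy_bfqq *woken_list;  /* head of queues we are waker for */
	struct toy_bfqq *woken_next;  /* our link in our waker's woken_list */
};

/* Unlink q from its waker's woken_list; no-op if q has no waker. */
static void toy_unlink_from_waker(struct toy_bfqq *q)
{
	struct toy_bfqq **pp;

	if (!q->waker_bfqq)
		return;
	for (pp = &q->waker_bfqq->woken_list; *pp; pp = &(*pp)->woken_next) {
		if (*pp == q) {
			*pp = q->woken_next;
			break;
		}
	}
	q->waker_bfqq = NULL;
	q->woken_next = NULL;
}

/*
 * Exit path: reset the waker link of every queue we woke, AND drop our
 * own link in our waker's list.  Skipping the second step is exactly the
 * kind of omission that leaves a dangling pointer for a later list walk
 * (as in the hlist_for_each_entry_safe() loop above) to chase.
 */
void toy_exit_bfqq(struct toy_bfqq *q)
{
	while (q->woken_list) {
		struct toy_bfqq *item = q->woken_list;

		q->woken_list = item->woken_next;
		item->waker_bfqq = NULL;
		item->woken_next = NULL;
	}
	toy_unlink_from_waker(q);
	/* now q can be freed with no list still pointing at it */
}
```

Whether the real bug is a missing unlink, a racing reparent, or something else entirely is exactly what needs debugging; the sketch only shows the invariant that must hold before a queue is freed.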

-Doug


