Re: [RFC PATCH] bfq: fix waker_bfqq inconsistency crash

On Thu 03-11-22 11:51:15, Yu Kuai wrote:
> Hi,
> 
> On 2022/11/03 11:05, Khazhy Kumykov wrote:
> > On Wed, Nov 2, 2022 at 7:56 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> > > 
> > > Hi,
> > > 
> > > On 2022/11/03 9:39, Khazhismel Kumykov wrote:
> > > > This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
> > > > but woken_list_node still being hashed. This would happen when
> > > > bfq_init_rq() expects a newly allocated queue to be returned from
> > > 
> > > From what I see, bfqq->waker_bfqq is updated in bfq_init_rq() only if
> > > 'new_queue' is false, but if 'new_queue' is false, the returned 'bfqq'
> > > from bfq_get_bfqq_handle_split() will never be oom_bfqq, so I'm confused
> > > here...
> > There are two calls to bfq_get_bfqq_handle_split() in this function;
> > the second one is after the check you mentioned, and it is the
> > problematic one.
> Yes, thanks for the explanation. Now I understand how the problem
> triggers.
> 
> > > 
> > > > bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
> > > > without resetting woken_list_node. Since oom_bfqq can always be
> > > > returned when attempting to allocate, we cannot assume waker_bfqq
> > > > starts as NULL. We must either reset woken_list_node, or avoid
> > > > setting woken_list at all for oom_bfqq; opt for the former.
> > > 
> > > Once oom_bfqq is used, I think the I/O is treated as issued from the
> > > root group. Hence I don't think it's necessary to set woken_list or
> > > waker_bfqq for oom_bfqq.
> > Ack, I was wondering what's right here since, evidently, *someone* had
> > already set oom_bfqq->waker_bfqq to *something* (although... maybe it
> > was an earlier init_rq). But maybe it's better to do nothing if we
> > *know* it's oom_bfqq.
> 
> I need to check how oom_bfqq gets involved with waker_bfqq, and
> then see whether that is reasonable.
> 
> Jan and Paolo probably have a better view on this.

Thanks for the CC, Kuai, and thanks to Khazhy for spotting the bug. The
oom_bfqq is just a fallback bfqq and as such it should be exempted from
all special handling such as waker detection. All of that exists only to
optimize performance, and when we are OOM, we have far bigger troubles
than optimizing performance.
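To make the inconsistency from the original report concrete, here is a small user-space model. It is a sketch, not kernel code: struct queue, struct hnode, waker_consistent() and carry_over_waker_buggy() are hypothetical stand-ins for struct bfq_queue, the kernel hlist and the waker carry-over done in bfq_init_rq(), and the scenario assumes the carried-over waker happens to be NULL. Because the fallback queue is reused rather than freshly allocated, its woken node may still be hashed on an earlier waker's list, so overwriting the waker pointer without resetting the node breaks the invariant "hashed woken_list_node implies non-NULL waker_bfqq".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified user-space stand-in for the kernel's hlist:
 * a node is "hashed" while it sits on some list (pprev != NULL). */
struct hnode { struct hnode *next, **pprev; };
struct hhead { struct hnode *first; };

static bool hnode_unhashed(const struct hnode *n)
{
	return n->pprev == NULL;
}

static void hnode_add_head(struct hnode *n, struct hhead *h)
{
	/* Re-adding a node that is still hashed would corrupt both lists. */
	assert(hnode_unhashed(n));
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Hypothetical cut-down queue: only the fields involved in the bug. */
struct queue {
	struct queue *waker;		/* models bfqq->waker_bfqq */
	struct hnode woken_node;	/* models bfqq->woken_list_node */
	struct hhead woken_list;	/* models bfqq->woken_list */
};

/* Invariant the busy path relies on: hashed node implies non-NULL waker. */
static bool waker_consistent(const struct queue *q)
{
	return hnode_unhashed(&q->woken_node) || q->waker != NULL;
}

/* Buggy carry-over: assumes new_q was freshly allocated (waker NULL,
 * node unhashed), which does not hold for the reused fallback queue. */
static void carry_over_waker_buggy(struct queue *new_q,
				   const struct queue *old_q)
{
	new_q->waker = old_q->waker;	/* overwritten unconditionally */
	if (new_q->waker)
		hnode_add_head(&new_q->woken_node, &new_q->waker->woken_list);
	/* woken_node is never reset, so a stale hashed node survives */
}
```

Starting from a fallback queue that is already linked on some waker's list and carrying over a NULL waker leaves waker_consistent() returning false: the node is still hashed while the waker pointer is NULL, which is the state the crash report describes.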

So I think the right fix is to exempt oom_bfqq from waker detection in
bfq_check_waker() by adding:

	/* oom_bfqq is embedded in struct bfq_data, so compare addresses */
	bfqq == &bfqd->oom_bfqq ||
	bfqd->last_completed_rq_bfqq == &bfqd->oom_bfqq)

to the initial check, and then, if bfq_get_bfqq_handle_split() returns
oom_bfqq, simply skip carrying over the waker information.
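A minimal sketch of the two parts of this proposal, in a hypothetical user-space model (struct sched, waker_detection_allowed() and carry_over_waker() are illustrative names, not kernel functions). Only the conditions relevant to oom_bfqq are modelled; the real bfq_check_waker() has further checks that are omitted here. In this model, with both parts in place the fallback queue never acquires a waker, so its woken node is never hashed and the inconsistent state cannot arise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified user-space stand-in for the kernel's hlist. */
struct hnode { struct hnode *next, **pprev; };
struct hhead { struct hnode *first; };

static bool hnode_unhashed(const struct hnode *n)
{
	return n->pprev == NULL;
}

static void hnode_add_head(struct hnode *n, struct hhead *h)
{
	assert(hnode_unhashed(n));	/* must not already be on a list */
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Hypothetical cut-down queue and scheduler state. */
struct queue {
	struct queue *waker;		/* models bfqq->waker_bfqq */
	struct hnode woken_node;	/* models bfqq->woken_list_node */
	struct hhead woken_list;	/* models bfqq->woken_list */
};

struct sched {
	struct queue oom;		/* models the embedded bfqd->oom_bfqq */
	struct queue *last_completed;	/* models bfqd->last_completed_rq_bfqq */
};

/* Part 1: exempt the fallback queue from waker detection, whether it is
 * the candidate wakee or the queue that last completed a request. */
static bool waker_detection_allowed(const struct sched *s,
				    const struct queue *q)
{
	return s->last_completed != NULL &&
	       s->last_completed != q &&
	       q != &s->oom &&
	       s->last_completed != &s->oom;
}

/* Part 2: never carry waker information over onto the fallback queue. */
static void carry_over_waker(const struct sched *s, struct queue *new_q,
			     const struct queue *old_q)
{
	if (new_q == &s->oom)
		return;
	new_q->waker = old_q->waker;
	if (new_q->waker)
		hnode_add_head(&new_q->woken_node, &new_q->waker->woken_list);
}
```

Skipping the fallback queue outright, rather than resetting its woken node first, matches the argument above: oom_bfqq only needs to keep I/O flowing, so giving it no waker state at all is the simplest way to keep it consistent.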

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR


