Re: [PATCH 4/5] block: drain file system I/O on del_gendisk

On Wed, Sep 29, 2021 at 04:17:01PM +0800, Ming Lei wrote:

[full quote deleted]

> Draining request won't fix the problem completely:
> 
> 1) blk-mq dispatch code may still be in-progress after q_usage_counter
> becomes zero, see the story in 662156641bc4 ("block: don't drain in-progress dispatch in
> blk_cleanup_queue()")

That commit does not have a good explanation of what it actually fixed.

> 2) elevator code / blkcg code may still be called after blk_cleanup_queue(), such
> as kyber, trace_kyber_latency()(q->disk is referred) is called in kyber's timer
> handler, and the timer is deleted via del_timer_sync() via kyber_exit_sched()
> from blk_release_queue().

Yes.  There are two things we can do here:

 - stop using the dev_t in tracing a request_queue
 - exit the I/O schedulers in del_gendisk, because they are only used
   for file system I/O that requires the gendisk anyway

We'll probably want both eventually.
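
To illustrate the ordering argument behind the second option, here is a
toy userspace model.  It is only a sketch and not kernel code: every
toy_* name is made up for illustration, the "timer" is an ordinary
function call, and a stale access is merely counted where the kernel
would touch a torn-down gendisk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's gendisk/request_queue. */
struct toy_disk {
	int devt;		/* models the dev_t used by tracing */
	bool live;		/* false once the disk has been deleted */
};

struct toy_queue {
	struct toy_disk *disk;	/* models q->disk */
	bool sched_active;	/* scheduler attached, timer may still fire */
	int stale_accesses;	/* timer runs that saw a deleted disk */
};

/* Models a scheduler timer handler that traces via the disk's dev_t. */
static void toy_sched_timer_fn(struct toy_queue *q)
{
	if (!q->sched_active)
		return;		/* scheduler already exited, timer is gone */
	if (!q->disk->live)
		q->stale_accesses++;	/* would be a stale q->disk access */
}

/* Models exiting the scheduler, which also stops its timer. */
static void toy_exit_sched(struct toy_queue *q)
{
	q->sched_active = false;
}

/* Models disk deletion; exit_sched_early selects the second option. */
static void toy_del_disk(struct toy_queue *q, bool exit_sched_early)
{
	assert(q->disk != NULL);
	if (exit_sched_early)
		toy_exit_sched(q);
	q->disk->live = false;
}

/* Models the final queue release, where the scheduler exits today. */
static void toy_release_queue(struct toy_queue *q)
{
	toy_exit_sched(q);
}

/* Returns how many timer runs observed a deleted disk. */
static int toy_run(bool exit_sched_early)
{
	struct toy_disk disk = { .devt = 42, .live = true };
	struct toy_queue q = { .disk = &disk, .sched_active = true };

	toy_del_disk(&q, exit_sched_early);
	toy_sched_timer_fn(&q);	/* fires between disk deletion and release */
	toy_release_queue(&q);
	toy_sched_timer_fn(&q);	/* no effect, scheduler has exited */
	return q.stale_accesses;
}
```

In this model toy_run(false) counts one stale access and toy_run(true)
counts none.  The first option (not using the dev_t in tracing) would
correspond to the timer handler not looking at the disk at all.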

> 
> > +
> > +	rq_qos_exit(q);
> > +	blk_sync_queue(q);
> > +	blk_flush_integrity();
> > +	/*
> > +	 * Allow using passthrough request again after the queue is torn down.
> > +	 */
> > +	blk_mq_unfreeze_queue(q);
> 
> Again, one FS bio is still possible to enter queue now: submit_bio_checks()
> is done before set_capacity(0), and submitted after blk_mq_unfreeze_queue()
> returns.

Not with the new patch 1 in this series.

Jens - can you take a look at the series that fixes the crashes people
are sending while I'm looking at the rest of the corner cases?


