> On 14 Feb 2018, at 16:19, Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 2/14/18 1:56 AM, Paolo Valente wrote:
>>
>>
>>> On 14 Feb 2018, at 08:15, Mike Galbraith <efault@xxxxxx> wrote:
>>>
>>> On Wed, 2018-02-14 at 08:04 +0100, Mike Galbraith wrote:
>>>>
>>>> And _of course_, roughly two minutes later, IO stalled.
>>>
>>> P.S.
>>>
>>> crash> bt 19117
>>> PID: 19117 TASK: ffff8803d2dcd280 CPU: 7 COMMAND: "kworker/7:2"
>>> #0 [ffff8803f7207bb8] __schedule at ffffffff81595e18
>>> #1 [ffff8803f7207c40] schedule at ffffffff81596422
>>> #2 [ffff8803f7207c50] io_schedule at ffffffff8108a832
>>> #3 [ffff8803f7207c60] blk_mq_get_tag at ffffffff8129cd1e
>>> #4 [ffff8803f7207cc0] blk_mq_get_request at ffffffff812987cc
>>> #5 [ffff8803f7207d00] blk_mq_alloc_request at ffffffff81298a9a
>>> #6 [ffff8803f7207d38] blk_get_request_flags at ffffffff8128e674
>>> #7 [ffff8803f7207d60] scsi_execute at ffffffffa0025b58 [scsi_mod]
>>> #8 [ffff8803f7207d98] scsi_test_unit_ready at ffffffffa002611c [scsi_mod]
>>> #9 [ffff8803f7207df8] sd_check_events at ffffffffa0212747 [sd_mod]
>>> #10 [ffff8803f7207e20] disk_check_events at ffffffff812a0f85
>>> #11 [ffff8803f7207e78] process_one_work at ffffffff81079867
>>> #12 [ffff8803f7207eb8] worker_thread at ffffffff8107a127
>>> #13 [ffff8803f7207f10] kthread at ffffffff8107ef48
>>> #14 [ffff8803f7207f50] ret_from_fork at ffffffff816001a5
>>> crash>
>>
>> This has evidently to do with tag pressure. I've looked for a way to
>> easily reduce the number of tags online, so as to put your system in
>> the bad spot deterministically. But at no avail. Does anyone know a
>> way to do it?
>
> The key here might be that it's not a regular file system request,
> which I'm sure bfq probably handles differently. So it's possible
> that you are slowly leaking those tags, and we end up in this
> miserable situation after a while.
>

Could you elaborate more on this?
My mental model of the bfq hooks in this respect is that they perform
only side operations which, AFAIK, cannot block the putting of a tag.
IOW, tags are got and put outside bfq, regardless of what bfq does
with I/O requests. Is there a flaw in this?

In any case, is there any flag or the like in the requests passed to
bfq that I could make bfq check, so as to raise a warning?

Thanks,
Paolo

> --
> Jens Axboe
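
For illustration only, the leak hypothesis quoted above can be pictured
with a userspace toy model. This is a sketch in Python, not kernel code:
TagPool, requests_until_stall, the pool depth and the leak rate are all
hypothetical names and numbers chosen for the example, and nothing here
reflects how blk-mq or bfq actually manage tags.

```python
class TagPool:
    """Toy fixed-size tag pool (hypothetical; not the kernel's blk-mq code)."""

    def __init__(self, depth):
        self.free = list(range(depth))

    def get(self):
        # Return a free tag, or None at the point where a real allocator
        # would have to wait for a tag to be put back.
        return self.free.pop() if self.free else None

    def put(self, tag):
        self.free.append(tag)


def requests_until_stall(depth, leak_every):
    """Count the requests served before get() first finds no free tag.

    Every `leak_every`-th request never puts its tag back, which models
    a slow leak; all other requests return their tag on completion.
    """
    pool = TagPool(depth)
    served = 0
    while True:
        tag = pool.get()
        if tag is None:
            return served  # pool exhausted: the next caller would block
        served += 1
        if served % leak_every != 0:
            pool.put(tag)  # normal completion returns the tag
        # otherwise the tag is leaked and never returned


if __name__ == "__main__":
    # Assumed numbers, for illustration only.
    print(requests_until_stall(depth=32, leak_every=1000))
```

In this model the pool survives exactly depth * leak_every requests, so
a small leak rate only shows up after a long, load-dependent delay. That
would be consistent with a stall appearing some time into a run, but the
model says nothing about where such a leak would actually be.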