On Tue, Jan 22, 2019 at 5:13 AM Florian Stecker <m19@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi everyone,
>
> on my laptop, I am experiencing occasional hangs of applications during
> fsync(), which are sometimes up to 30 seconds long. I'm using a BTRFS
> which spans two partitions on the same SSD (one of them used to contain
> a Windows installation, but I removed it and added the partition to the
> BTRFS volume instead). Also, the problem only occurs when an I/O
> scheduler (mq-deadline) is in use. I'm running kernel version 4.20.3.
>
> From what I understand so far, what happens is that a sync request
> fails in the SCSI/ATA layer, in ata_std_qc_defer(), because it is a
> "Non-NCQ command" and cannot be queued together with other commands.
> This propagates up into blk_mq_dispatch_rq_list(), where the call
>
>     ret = q->mq_ops->queue_rq(hctx, &bd);
>
> returns BLK_STS_DEV_RESOURCE. Later in blk_mq_dispatch_rq_list(), there
> is the piece of code
>
>     needs_restart = blk_mq_sched_needs_restart(hctx);
>     if (!needs_restart ||
>         (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
>             blk_mq_run_hw_queue(hctx, true);
>     else if (needs_restart && (ret == BLK_STS_RESOURCE))
>             blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY);
>
> which restarts the queue after a delay if BLK_STS_RESOURCE was returned,
> but not for BLK_STS_DEV_RESOURCE. Instead, nothing happens and fsync()
> seems to hang until some other process wants to do I/O.
>
> So if I do
>
> -    else if (needs_restart && (ret == BLK_STS_RESOURCE))
> +    else if (needs_restart && (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE))
>
> it fixes my problem. But was there a reason why BLK_STS_DEV_RESOURCE was
> treated differently than BLK_STS_RESOURCE here?

Please see the comment:

/*
 * BLK_STS_DEV_RESOURCE is returned from the driver to the block layer if
 * device related resources are unavailable, but the driver can guarantee
 * that the queue will be rerun in the future once resources become
 * available again.
 * This is typically the case for device specific
 * resources that are consumed for IO. If the driver fails allocating these
 * resources, we know that inflight (or pending) IO will free these
 * resource upon completion.
 *
 * This is different from BLK_STS_RESOURCE in that it explicitly references
 * a device specific resource. For resources of wider scope, allocation
 * failure can happen without having pending IO. This means that we can't
 * rely on request completions freeing these resources, as IO may not be in
 * flight. Examples of that are kernel memory allocations, DMA mappings, or
 * any other system wide resources.
 */
#define BLK_STS_DEV_RESOURCE	((__force blk_status_t)13)

>
> In any case, it seems wrong to me that ret is used here at all, as it
> just contains the return value of the last request in the list, and
> whether we rerun the queue should probably not depend only on the last
> request?
>
> Can any of the experts tell me whether this makes sense or whether I got
> something completely wrong?

Sounds like a bug in the SCSI or ATA driver.

I remember there is a hole in SCSI wrt. returning BLK_STS_DEV_RESOURCE,
but I have never been lucky enough to reproduce it.

scsi_queue_rq():
        ......
        case BLK_STS_RESOURCE:
                if (atomic_read(&sdev->device_busy) ||
                    scsi_device_blocked(sdev))
                        ret = BLK_STS_DEV_RESOURCE;

All in-flight requests may complete between reading 'sdev->device_busy'
and setting ret to 'BLK_STS_DEV_RESOURCE', and then this IO hang may be
triggered.

Thanks,
Ming Lei