On Thu, Jun 27, 2024 at 11:09:15AM -0600, Uday Shankar wrote:
> When I say "behavior A + 2," I mean behavior A and behavior 2 at the
> same time on the same ublk device. I still think this is not supported
> with current ublk_drv, see below.
>
> > > the ublk server can "handle" the I/O error because during this time,
> > > there is no ublk server and all decisions on how to handle I/O are made
> > > by ublk_drv directly (based on configuration flags specified when the
> > > device was created).
> > >
> > > If the ublk server created the device with UBLK_F_USER_RECOVERY, then
> > > when the ublk server has crashed (and not restarted yet), I/Os issued by
> > > the application will queue/hang until the ublk server comes back and
> > > recovers the device, because the underlying request_queue is left in a
> > > quiesced state. So in this case, behavior A is not possible.
> >
> > When ublk server is crashed, ublk_abort_requests() will be called to fail
> > queued inflight requests. Meantime ubq->canceling is set to requeue
> > new request instead of forwarding it to ublk server.
> >
> > So behavior A should be supported easily by failing request in
> > ublk_queue_rq() if ubq->canceling is set.
>
> This argument only works for devices created without
> UBLK_F_USER_RECOVERY. If UBLK_F_USER_RECOVERY is set, then the
> request_queue for the device is left in a quiesced state and so I/Os
> will not even get to ublk_queue_rq. See the following as proof (using a
> build of ublksrv master):

I meant that the following one-line patch may address your issue:

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 4e159948c912..a89240f4f7b0 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1068,7 +1068,7 @@ static inline void __ublk_abort_rq(struct ublk_queue *ubq,
 		struct request *rq)
 {
 	/* We cannot process this rq so just requeue it. */
-	if (ublk_queue_can_use_recovery(ubq))
+	if (ublk_queue_can_use_recovery_reissue(ubq))
 		blk_mq_requeue_request(rq, false);
 	else
 		blk_mq_end_request(rq, BLK_STS_IOERR);

Thanks,
Ming
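
For reference, the effect of the one-line change can be modelled in userspace. This is only a sketch, not kernel code: the flag values are assumed to match include/uapi/linux/ublk_cmd.h, the two predicates are assumed to test the flags the way their names suggest, and the enum and the abort_rq_before()/abort_rq_after() helpers are hypothetical names made up for illustration. It models just the requeue-vs-fail decision in __ublk_abort_rq(), not the quiesced request_queue discussed above.

```c
#include <stdbool.h>

/* Assumed to mirror the UAPI flag bits; check ublk_cmd.h before relying on them. */
#define UBLK_F_USER_RECOVERY         (1UL << 3)
#define UBLK_F_USER_RECOVERY_REISSUE (1UL << 4)

/* Hypothetical stand-in for the two outcomes in __ublk_abort_rq(). */
enum abort_action {
	ABORT_REQUEUE,    /* models blk_mq_requeue_request(rq, false) */
	ABORT_FAIL_IOERR, /* models blk_mq_end_request(rq, BLK_STS_IOERR) */
};

/* Assumed shape of ublk_queue_can_use_recovery(): recovery flag alone. */
static bool can_use_recovery(unsigned long flags)
{
	return (flags & UBLK_F_USER_RECOVERY) != 0;
}

/* Assumed shape of ublk_queue_can_use_recovery_reissue(): both flags set. */
static bool can_use_recovery_reissue(unsigned long flags)
{
	return (flags & UBLK_F_USER_RECOVERY) != 0 &&
	       (flags & UBLK_F_USER_RECOVERY_REISSUE) != 0;
}

/* Decision before the patch: any recovery-enabled device requeues. */
static enum abort_action abort_rq_before(unsigned long flags)
{
	return can_use_recovery(flags) ? ABORT_REQUEUE : ABORT_FAIL_IOERR;
}

/* Decision after the patch: only recovery + reissue devices requeue. */
static enum abort_action abort_rq_after(unsigned long flags)
{
	return can_use_recovery_reissue(flags) ? ABORT_REQUEUE : ABORT_FAIL_IOERR;
}
```

Under these assumptions, the only configuration whose outcome changes is UBLK_F_USER_RECOVERY without UBLK_F_USER_RECOVERY_REISSUE: abort_rq_before() requeues the request, while abort_rq_after() fails it with an I/O error.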