When I say "behavior A + 2," I mean behavior A and behavior 2 at the same time on the same ublk device. I still think this is not supported with current ublk_drv, see below. > > the ublk server can "handle" the I/O error because during this time, > > there is no ublk server and all decisions on how to handle I/O are made > > by ublk_drv directly (based on configuration flags specified when the > > device was created). > > > > If the ublk server created the device with UBLK_F_USER_RECOVERY, then > > when the ublk server has crashed (and not restarted yet), I/Os issued by > > the application will queue/hang until the ublk server comes back and > > recovers the device, because the underlying request_queue is left in a > > quiesced state. So in this case, behavior A is not possible. > > When ublk server is crashed, ublk_abort_requests() will be called to fail > queued inflight requests. Meantime ubq->canceling is set to requeue > new request instead of forwarding it to ublk server. > > So behavior A should be supported easily by failing request in > ublk_queue_rq() if ubq->canceling is set. This argument only works for devices created without UBLK_F_USER_RECOVERY. If UBLK_F_USER_RECOVERY is set, then the request_queue for the device is left in a quiesced state and so I/Os will not even get to ublk_queue_rq. See the following as proof (using a build of ublksrv master): # ./ublk add -t loop -f file -r 1 dev id 0: nr_hw_queues 1 queue_depth 128 block size 4096 dev_capacity 2097152 max rq size 524288 daemon pid 244608 flags 0x4a state LIVE ublkc: 240:0 ublkb: 259:0 owner: 0:0 queue 0: tid 244610 affinity(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ) target {"backing_file":"file","dev_size":1073741824,"direct_io":1,"name":"loop","type":1} # kill -9 244608 # dd if=/dev/urandom of=/dev/ublkb0 count=1 bs=4096 oflag=direct (hung) # ps aux | grep " D" root 244626 0.0 0.0 5620 1880 pts/0 D+ 16:57 0:00 dd if=/dev/urandom of=/dev/ublkb0 count=1 bs=4096 oflag=direct root 244656 0.0 0.0 6408 2188 pts/1 S+ 16:58 0:00 grep --color=auto D # cat /proc/244626/stack [<0>] submit_bio_wait+0x63/0x90 [<0>] __blkdev_direct_IO_simple+0xd9/0x1e0 [<0>] blkdev_write_iter+0x1b4/0x230 [<0>] vfs_write+0x2ae/0x3d0 [<0>] ksys_write+0x4f/0xc0 [<0>] do_syscall_64+0x5d/0x160 [<0>] entry_SYSCALL_64_after_hwframe+0x4b/0x53 # cat /sys/kernel/debug/block/ublkb0/state SAME_COMP|NONROT|IO_STAT|INIT_DONE|STATS|REGISTERED|QUIESCED|NOWAIT|SQ_SCHED Therefore, in order to obtain behavior A with current ublk_drv, one must not set UBLK_F_USER_RECOVERY. > > > > If the ublk server created the device without UBLK_F_USER_RECOVERY, then > > when the ublk server has crashed (and not restarted yet), I/Os issued by > > the application will immediately error (since in this case, ublk will > > call del_gendisk). However, when the ublk server restarts, it cannot > > recover the existing ublk device - the disk has been deleted and the > > ublk device is in state UBLK_S_DEV_DEAD from which recovery is not > > permitted. So in this case, behavior 2 is not possible. > > UBLK_F_USER_RECOVERY is supposed for supporting to recover device, and > if this flag isn't enabled, we don't support the feature simply, so > looks behavior 2 isn't one valid case, is it? Sure, so we're in agreement that recovery is impossible if UBLK_F_USER_RECOVERY is not set. So: - To get behavior A, UBLK_F_USER_RECOVERY must be unset - To get behavior 2, UBLK_F_USER_RECOVERY must be set Hence, having behavior A and behavior 2, at the same time, on the same device, would require UBLK_F_USER_RECOVERY to be both set and unset when that device is created. Obviously that's impossible.