On Tue, Nov 05, 2024 at 03:03:06PM +0100, Kevin Wolf wrote: > Am 29.10.2024 um 02:19 hat Ming Lei geschrieben: > > nbd driver sends request header and payload with multiple call of > > sock_sendmsg, and partial sending can't be avoided. However, nbd driver > > returns BLK_STS_RESOURCE to block core in this situation. This way causes > > one issue: request->tag may change in the next run of nbd_queue_rq(), but > > the original old tag has been sent as part of header cookie, this way > > confuses nbd driver reply handling, since the real request can't be > > retrieved any more with the obsolete old tag. > > > > Fix it by retrying sending directly in per-socket work function, > > meantime return BLK_STS_OK to block layer core. > > > > Cc: vincent.chen@xxxxxxxxxx > > Cc: Leon Schuermann <leon@is.currently.online> > > Cc: Bart Van Assche <bvanassche@xxxxxxx> > > Reported-by: Kevin Wolf <kwolf@xxxxxxxxxx> > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > > > @@ -770,6 +798,14 @@ static blk_status_t nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, > > return BLK_STS_OK; > > > > requeue: > > + /* > > + * Can't requeue in case we are dealing with partial send > > + * > > + * We must run from pending work function. > > + * */ > > + if (test_bit(NBD_CMD_PARTIAL_SEND, &cmd->flags)) > > + return BLK_STS_OK; > > + > > /* retry on a different socket */ > > dev_err_ratelimited(disk_to_dev(nbd->disk), > > "Request send failed, requeueing\n"); > > This hunk doesn't feel ideal: The assumption in the normal code path > here is that the socket is dead, i.e. the error isn't recoverable. With > this way to handle it, nbd_pending_cmd_work() will keep retrying until > the request finally times out. We could probably return an error right > away. > > In fact, I think even requeuing (and ideally still completing the > request successfully in the end) would be fine in this case because > we'll shut down the socket and never send any additional data on it, so > the server will never see a complete command. We would just have to make > sure that nbd_pending_cmd_work() doesn't try to complete sending the > command any more. > > But even though this error path isn't optimal, I feel it might be > acceptable. Let's see if someone else has an opinion on it. > > Tested-by: Kevin Wolf <kwolf@xxxxxxxxxx> > Reviewed-by: Kevin Wolf <kwolf@xxxxxxxxxx> Hello Jens, Can you make this fix into v6.14 if you are fine? Thanks, Ming