Andreas Hindborg <andreas.hindborg@xxxxxxx> writes:

[...]

>>> >
>>> > oops, I missed the single queue depth point per zone, so ublk won't break
>>> > zoned write at all, and I agree the order of batch IOs is one problem, but
>>> > not hard to solve.
>>>
>>> The current implementation _does_ break zoned write because it reverses
>>> batched writes. But if it is an easy fix, that is cool :)
>>
>> Please look at Damien's comment:
>>
>>>> That lock is already there and using it, mq-deadline will never dispatch
>>>> more than one write per zone at any time. This is to avoid write
>>>> reordering. So multi queue or not, for any zone, there is no possibility
>>>> of having writes reordered.
>>
>> For zoned write, mq-deadline is used to limit at most one inflight write
>> for each zone.
>>
>> So can you explain a bit how the current implementation breaks zoned
>> write?
>
> Like Damien wrote in another email, mq-deadline will only impose
> ordering on requests submitted in the same batch. The flow we have is
> the following:
>
> - Userspace sends requests to the ublk gendisk.
> - Requests go through the block layer and are _not_ reordered when using
>   mq-deadline. They may be split.
> - Requests hit ublk_drv, and ublk_drv reverses the order of _all_
>   batched-up requests (including split requests).
> - ublk_drv sends the requests to ublksrv in _reverse_ order.
> - ublksrv sends the requests to the target device _without_ batching
>   them up.
> - Requests that enter mq-deadline at the same time are reordered in LBA
>   order; that is all good.
> - Requests that enter the kernel in different batches are not reordered
>   in LBA order and end up missing the write pointer. This is bad.
>
> So, ublk_drv is not functional for zoned storage as is. Either we fix up
> the ordering in userspace in ublksrv, and that _will_ have a performance
> impact, or we fix the bug in ublk_drv that causes batched requests to be
> _reversed_.

Here is a suggestion for a fix. It needs work, but it illustrates the idea.
>From 48f54a2a83daf19dda3c928e6518ce4a3e443fcd Mon Sep 17 00:00:00 2001
From: Andreas Hindborg <andreas.hindborg@xxxxxxx>
Date: Fri, 18 Nov 2022 13:44:45 +0100
Subject: [PATCH] wip: Do not reorder requests in ublk

---
 drivers/block/ublk_drv.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 6a4a94b4cdf4..4fb5ccd01202 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -9,6 +9,7 @@
  *
  * (part of code stolen from loop.c)
  */
+#include <linux/llist.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/sched.h>
@@ -55,6 +56,7 @@
 #define UBLK_PARAM_TYPE_ALL (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD)
 
 struct ublk_rq_data {
+	struct llist_node llnode;
 	struct callback_head work;
 };
 
@@ -121,6 +123,7 @@ struct ublk_queue {
 	unsigned int max_io_sz;
 	bool abort_work_pending;
 	unsigned short nr_io_ready;	/* how many ios setup */
+	struct llist_head pdu_queue;
 	struct ublk_device *dev;
 	struct ublk_io ios[0];
 };
@@ -724,8 +727,15 @@ static void ublk_rq_task_work_fn(struct callback_head *work)
 	struct ublk_rq_data *data = container_of(work,
 			struct ublk_rq_data, work);
 	struct request *req = blk_mq_rq_from_pdu(data);
+	struct ublk_queue *ubq = req->mq_hctx->driver_data;
 
-	__ublk_rq_task_work(req);
+	/* Sometimes this list is empty, but that is OK */
+	struct llist_node *head = llist_del_all(&ubq->pdu_queue);
+	head = llist_reverse_order(head);
+	llist_for_each_entry(data, head, llnode) {
+		req = blk_mq_rq_from_pdu(data);
+		__ublk_rq_task_work(req);
+	}
 }
 
 static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
@@ -753,6 +763,7 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
 		enum task_work_notify_mode notify_mode = bd->last ?
 			TWA_SIGNAL_NO_IPI : TWA_NONE;
 
+		llist_add(&data->llnode, &ubq->pdu_queue);
 		if (task_work_add(ubq->ubq_daemon, &data->work, notify_mode))
 			goto fail;
 	} else {
@@ -1170,6 +1181,9 @@ static int ublk_init_queue(struct ublk_device *ub, int q_id)
 
 	ubq->io_cmd_buf = ptr;
 	ubq->dev = ub;
+
+	init_llist_head(&ubq->pdu_queue);
+
 	return 0;
 }
-- 
2.38.1
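
For completeness, the reason the patch needs llist_reverse_order(): llist_add()
pushes each new node at the head of the lock-free list, so llist_del_all()
hands the batch back in reverse submission order, and one reversal restores
the order in which ublk_queue_rq() saw the requests. Below is a small
userspace sketch of that same push-at-head/reverse-once pattern using a plain
singly linked list; the names (push_head, reverse, seq) are made up for
illustration and it is not the kernel llist API:

#include <stdio.h>
#include <stddef.h>

/* Stand-in for a "push at head" list node. */
struct node {
	struct node *next;
	int seq;		/* submission order of the request */
};

/* Like llist_add(): new entries always go to the head. */
static void push_head(struct node **head, struct node *n)
{
	n->next = *head;
	*head = n;
}

/* Like llist_reverse_order(): reverse once to get back FIFO order. */
static struct node *reverse(struct node *head)
{
	struct node *out = NULL;

	while (head) {
		struct node *next = head->next;

		head->next = out;
		out = head;
		head = next;
	}
	return out;
}

int main(void)
{
	struct node reqs[4] = { { NULL, 0 } };
	struct node *head = NULL;
	struct node *n;
	int i;

	/* The queueing side pushes requests 0..3 as they arrive. */
	for (i = 0; i < 4; i++) {
		reqs[i].seq = i;
		push_head(&head, &reqs[i]);
	}

	/* Without the reversal the worker would see 3 2 1 0. */
	for (n = reverse(head); n; n = n->next)
		printf("%d ", n->seq);	/* prints: 0 1 2 3 */
	printf("\n");
	return 0;
}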