On Tue, Jan 07 2025, Bernd Schubert wrote:

> On 1/7/25 15:42, Luis Henriques wrote:
>> Hi,
>>
>> On Tue, Jan 07 2025, Bernd Schubert wrote:
>>
>>> This adds support for fuse request completion through ring SQEs
>>> (FUSE_URING_CMD_COMMIT_AND_FETCH handling).  After committing
>>> the ring entry it becomes available for new fuse requests.
>>> Handling of requests through the ring (SQE/CQE handling)
>>> is complete now.
>>>
>>> Fuse request data are copied through the mmaped ring buffer,
>>> there is no support for any zero copy yet.
>>
>> Please find below a few more comments.
>
> Thanks, I fixed all comments, except for the retry in fuse_uring_next_fuse_req.

Awesome, thanks for taking those comments into account.

> [...]
>
>> Also, please note that I'm trying to understand this patchset (and the
>> whole fuse-over-io-uring thing), so most of my comments are minor nits.
>> And those that are not may simply be wrong!  I'm just noting them as I
>> navigate through the code.
>>
>> (And by the way, thanks for this work!)
>>
>>> +/*
>>> + * Get the next fuse req and send it
>>> + */
>>> +static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
>>> +				     struct fuse_ring_queue *queue,
>>> +				     unsigned int issue_flags)
>>> +{
>>> +	int err;
>>> +	bool has_next;
>>> +
>>> +retry:
>>> +	spin_lock(&queue->lock);
>>> +	fuse_uring_ent_avail(ring_ent, queue);
>>> +	has_next = fuse_uring_ent_assign_req(ring_ent);
>>> +	spin_unlock(&queue->lock);
>>> +
>>> +	if (has_next) {
>>> +		err = fuse_uring_send_next_to_ring(ring_ent, issue_flags);
>>> +		if (err)
>>> +			goto retry;
>>
>> I wonder whether this is safe.  Maybe this is *obviously* safe, but I'm
>> still trying to understand this patchset; so, for me, it is not :-)
>> Would it be worth it trying to limit the maximum number of retries?
>
> No, we cannot limit retries.  Let's take a simple example with one ring
> entry and also just one queue.  Multiple applications create fuse
> requests.  The first application fills the only available ring entry
> and submits it, the others just get queued in queue->fuse_req_queue.
> After that the application just waits in request_wait_answer().
>
> On commit of the first request the ring task has to take the next
> request from queue->fuse_req_queue - if something fails with that
> request it has to complete it and proceed to the next request.
> If we were to introduce a max-retries here, it would put the ring entry
> on hold (FRRS_AVAILABLE) and, until another application comes along, it
> would wait there forever.  The applications waiting in
> request_wait_answer() would never complete either.

Oh!  OK, I see it now.  I totally misunderstood it then.  Thanks for
taking the time to explain it.
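
Just to double-check my mental model: I now read the retry as roughly
equivalent to the loop below -- same helpers as in the patch, only
rewritten without the goto, and assuming (as you describe) that a failed
send completes that request with an error before the next iteration.
This is only a sketch of the semantics, not the actual patch code:

	static void sketch_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
					       struct fuse_ring_queue *queue,
					       unsigned int issue_flags)
	{
		for (;;) {
			bool has_next;

			spin_lock(&queue->lock);
			fuse_uring_ent_avail(ring_ent, queue);		/* entry is free again */
			has_next = fuse_uring_ent_assign_req(ring_ent);	/* grab next waiter, if any */
			spin_unlock(&queue->lock);

			if (!has_next)
				return;	/* queue empty: entry stays available for later requests */

			if (!fuse_uring_send_next_to_ring(ring_ent, issue_flags))
				return;	/* request handed over to the ring / userspace */

			/*
			 * Send failed: that request has been completed with an
			 * error, so the loop always makes progress and cannot
			 * spin on the same request.  Capping the retries would
			 * instead strand the waiters in request_wait_answer().
			 */
		}
	}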
Cheers,
--
Luís

>>> +	}
>>> +}
>>> +
>>> +static int fuse_ring_ent_set_commit(struct fuse_ring_ent *ent)
>>> +{
>>> +	struct fuse_ring_queue *queue = ent->queue;
>>> +
>>> +	lockdep_assert_held(&queue->lock);
>>> +
>>> +	if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
>>> +		return -EIO;
>>> +
>>> +	ent->state = FRRS_COMMIT;
>>> +	list_move(&ent->list, &queue->ent_commit_queue);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/* FUSE_URING_CMD_COMMIT_AND_FETCH handler */
>>> +static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
>>> +				   struct fuse_conn *fc)
>>> +{
>>> +	const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
>>> +	struct fuse_ring_ent *ring_ent;
>>> +	int err;
>>> +	struct fuse_ring *ring = fc->ring;
>>> +	struct fuse_ring_queue *queue;
>>> +	uint64_t commit_id = READ_ONCE(cmd_req->commit_id);
>>> +	unsigned int qid = READ_ONCE(cmd_req->qid);
>>> +	struct fuse_pqueue *fpq;
>>> +	struct fuse_req *req;
>>> +
>>> +	err = -ENOTCONN;
>>> +	if (!ring)
>>> +		return err;
>>> +
>>> +	if (qid >= ring->nr_queues)
>>> +		return -EINVAL;
>>> +
>>> +	queue = ring->queues[qid];
>>> +	if (!queue)
>>> +		return err;
>>> +	fpq = &queue->fpq;
>>> +
>>> +	spin_lock(&queue->lock);
>>> +	/* Find a request based on the unique ID of the fuse request
>>> +	 * This should get revised, as it needs a hash calculation and list
>>> +	 * search. And full struct fuse_pqueue is needed (memory overhead).
>>> +	 * As well as the link from req to ring_ent.
>>> +	 */
>>> +	req = fuse_request_find(fpq, commit_id);
>>> +	err = -ENOENT;
>>> +	if (!req) {
>>> +		pr_info("qid=%d commit_id %llu not found\n", queue->qid,
>>> +			commit_id);
>>> +		spin_unlock(&queue->lock);
>>> +		return err;
>>> +	}
>>> +	list_del_init(&req->list);
>>> +	ring_ent = req->ring_entry;
>>> +	req->ring_entry = NULL;
>>> +
>>> +	err = fuse_ring_ent_set_commit(ring_ent);
>>> +	if (err != 0) {
>>
>> I'm probably missing something, but because we removed 'req' from the list
>> above, aren't we leaking it if we get an error here?
>
> Hmm, yeah, that is debatable.  We basically have a grave error here.
> Either kernel or userspace are doing something wrong.  Though probably
> you are right and we should end the request with EIO.
>
>
> Thanks,
> Bernd
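
FWIW, ending the request in that error path could look roughly like the
sketch below.  It reuses the helpers from the quoted code plus the
existing fuse_request_end() from fs/fuse/dev.c; the exact shape in the
next version may of course differ:

	err = fuse_ring_ent_set_commit(ring_ent);
	if (err != 0) {
		/* grave error, but don't leak req: fail it back to the waiter */
		spin_unlock(&queue->lock);
		req->out.h.error = err;	/* -EIO from fuse_ring_ent_set_commit() */
		fuse_request_end(req);	/* wakes up request_wait_answer() */
		return err;
	}

That way the application blocked in request_wait_answer() gets woken
with an error instead of the request being leaked.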