On 3/23/23 14:18, Bernd Schubert wrote: > On 3/23/23 13:35, Miklos Szeredi wrote: >> On Thu, 23 Mar 2023 at 12:04, Bernd Schubert <bschubert@xxxxxxx> wrote: >>> >>> Thanks for looking at these patches! >>> >>> I'm adding in Ming Lei, as I had taken several ideas from ublkm I guess >>> I also should also explain in the commit messages and code why it is >>> done that way. >>> >>> On 3/23/23 11:27, Miklos Szeredi wrote: >>>> On Tue, 21 Mar 2023 at 02:11, Bernd Schubert <bschubert@xxxxxxx> wrote: >>>>> >>>>> This adds a delayed work queue that runs in intervals >>>>> to check and to stop the ring if needed. Fuse connection >>>>> abort now waits for this worker to complete. >>>> >>>> This seems like a hack. Can you explain what the problem is? >>>> >>>> The first thing I notice is that you store a reference to the task >>>> that initiated the ring creation. This already looks fishy, as the >>>> ring could well survive the task (thread) that created it, no? >>> >>> You mean the currently ongoing work, where the daemon can be restarted? >>> Daemon restart will need some work with ring communication, I will take >>> care of that once we have agreed on an approach. [Also added in >>> Alexsandre]. >>> >>> fuse_uring_stop_mon() checks if the daemon process is exiting and and >>> looks at fc->ring.daemon->flags & PF_EXITING - this is what the process >>> reference is for. >> >> Okay, so you are saying that the lifetime of the ring is bound to the >> lifetime of the thread that created it? >> >> Why is that? >> >> I'ts much more common to bind a lifetime of an object to that of an >> open file. io_uring_setup() will do that for example. >> >> It's much easier to hook into the destruction of an open file, than >> into the destruction of a process (as you've observed). And the way >> you do it is even more confusing as the ring is destroyed not when the >> process is destroyed, but when a specific thread is destroyed, making >> this a thread specific behavior that is probably best avoided. >> >> So the obvious solution would be to destroy the ring(s) in >> fuse_dev_release(). Why wouldn't that work? >> > > I _think_ I had tried it at the beginning and run into issues and then > switched the ublk approach. Going to try again now. > Found the reason why I complete SQEs when the daemon stops - on daemon side I have ret = io_uring_wait_cqe(&queue->ring, &cqe); and that hangs when you stop user side with SIGTERM/SIGINT. Maybe that could be solved with io_uring_wait_cqe_timeout() / io_uring_wait_cqe_timeout(), but would that really be a good solution? We would now have CPU activity in intervals on the daemon side for now good reason - the more often the faster SIGTERM/SIGINT works. So at best, it should be uring side that stops to wait on a receiving a signal. Thanks, Bernd