Re: [PATCH 06/13] fuse: Add an interval ring stop worker/monitor

Bernd Schubert <bschubert@xxxxxxx> · Thu, 23 Mar 2023 20:51:53 +0000

On 3/23/23 14:18, Bernd Schubert wrote:
> On 3/23/23 13:35, Miklos Szeredi wrote:
>> On Thu, 23 Mar 2023 at 12:04, Bernd Schubert <bschubert@xxxxxxx> wrote:
>>>
>>> Thanks for looking at these patches!
>>>
>>> I'm adding in Ming Lei, as I had taken several ideas from ublkm I guess
>>> I also should also explain in the commit messages and code why it is
>>> done that way.
>>>
>>> On 3/23/23 11:27, Miklos Szeredi wrote:
>>>> On Tue, 21 Mar 2023 at 02:11, Bernd Schubert <bschubert@xxxxxxx> wrote:
>>>>>
>>>>> This adds a delayed work queue that runs in intervals
>>>>> to check and to stop the ring if needed. Fuse connection
>>>>> abort now waits for this worker to complete.
>>>>
>>>> This seems like a hack.   Can you explain what the problem is?
>>>>
>>>> The first thing I notice is that you store a reference to the task
>>>> that initiated the ring creation.  This already looks fishy, as the
>>>> ring could well survive the task (thread) that created it, no?
>>>
>>> You mean the currently ongoing work, where the daemon can be restarted?
>>> Daemon restart will need some work with ring communication, I will take
>>> care of that once we have agreed on an approach. [Also added in 
>>> Alexsandre].
>>>
>>> fuse_uring_stop_mon() checks if the daemon process is exiting and and
>>> looks at fc->ring.daemon->flags & PF_EXITING - this is what the process
>>> reference is for.
>>
>> Okay, so you are saying that the lifetime of the ring is bound to the
>> lifetime of the thread that created it?
>>
>> Why is that?
>>
>> I'ts much more common to bind a lifetime of an object to that of an
>> open file.  io_uring_setup() will do that for example.
>>
>> It's much easier to hook into the destruction of an open file, than
>> into the destruction of a process (as you've observed). And the way
>> you do it is even more confusing as the ring is destroyed not when the
>> process is destroyed, but when a specific thread is destroyed, making
>> this a thread specific behavior that is probably best avoided.
>>
>> So the obvious solution would be to destroy the ring(s) in
>> fuse_dev_release().  Why wouldn't that work?
>>
> 
> I _think_ I had tried it at the beginning and run into issues and then 
> switched the ublk approach. Going to try again now.
> 

Found the reason why I complete SQEs when the daemon stops - on daemon 
side I have

ret = io_uring_wait_cqe(&queue->ring, &cqe);

and that hangs when you stop user side with SIGTERM/SIGINT. Maybe that 
could be solved with io_uring_wait_cqe_timeout() / 
io_uring_wait_cqe_timeout(), but would that really be a good solution? 
We would now have CPU activity in intervals on the daemon side for now 
good reason - the more often the faster SIGTERM/SIGINT works.
So at best, it should be uring side that stops to wait on a receiving a 
signal.

Thanks,
Bernd