On Tue, Oct 8, 2024 at 9:26 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote: > > On Mon, Oct 7, 2024 at 1:02 PM Bernd Schubert > <bernd.schubert@xxxxxxxxxxx> wrote: > > > > On 10/7/24 20:39, Joanne Koong wrote: > > > On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote: > > >> > > >> On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert > > >> <bernd.schubert@xxxxxxxxxxx> wrote: > > >>> > > >>> Hi Joanne, > > >>> > > >>> On 9/27/24 21:36, Joanne Koong wrote: > > >>>> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > >>>>> > > >>>>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@xxxxxxxxx> wrote: > > >>>>>> > > >>>>>> There are situations where fuse servers can become unresponsive or > > >>>>>> stuck, for example if the server is in a deadlock. Currently, there's > > >>>>>> no good way to detect if a server is stuck and needs to be killed > > >>>>>> manually. > > >>>>>> > > >>>>>> This commit adds an option for enforcing a timeout (in seconds) on > > >>>>>> requests where if the timeout elapses without a reply from the server, > > >>>>>> the connection will be automatically aborted. > > >>>>> > > >>>>> Okay. > > >>>>> > > >>>>> I'm not sure what the overhead (scheduling and memory) of timers, but > > >>>>> starting one for each request seems excessive. > > >>>> > > >>>> I ran some benchmarks on this using the passthrough_ll server and saw > > >>>> roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s): > > >>>> fio --name randwrite --ioengine=sync --thread --invalidate=1 > > >>>> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4 > > >>>> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345 > > >>>> --group_reporting=1 --directory=/root/fuse_mount > > >>>> > > >>>> Instead of attaching a timer to each request, I think we can instead > > >>>> do the following: > > >>>> * add a "start_time" field to each request tracking (in jiffies) when > > >>>> the request was started > > >>>> * add a new list to the connection that all requests get enqueued > > >>>> onto. When the request is completed, it gets dequeued from this list > > >>>> * have a timer for the connection that fires off every 10 seconds or > > >>>> so. When this timer is fired, it checks if "jiffies > req->start_time > > >>>> + fc->req_timeout" against the head of the list to check if the > > >>>> timeout has expired and we need to abort the request. We only need to > > >>>> check against the head of the list because we know every other request > > >>>> after this was started later in time. I think we could even just use > > >>>> the fc->lock for this instead of needing a separate lock. In the worst > > >>>> case, this grants a 10 second upper bound on the timeout a user > > >>>> requests (eg if the user requests 2 minutes, in the worst case the > > >>>> timeout would trigger at 2 minutes and 10 seconds). > > >>>> > > >>>> Also, now that we're aborting the connection entirely on a timeout > > >>>> instead of just aborting the request, maybe it makes sense to change > > >>>> the timeout granularity to minutes instead of seconds. I'm envisioning > > >>>> that this timeout mechanism will mostly be used as a safeguard against > > >>>> malicious or buggy servers with a high timeout configured (eg 10 > > >>>> minutes), and minutes seems like a nicer interface for users than them > > >>>> having to convert that to seconds. > > >>>> > > >>>> Let me know if I've missed anything with this approach but if not, > > >>>> then I'll submit v7 with this change. > > >>> > > >>> > > >>> sounds great to me. Just, could we do this per fuse_dev to avoid a > > >>> single lock for all cores? > > >>> > > >> > > >> Will do! thanks for the suggestion - in that case, I'll add its own > > >> spinlock for it too then. > > > > > > I realized while working on v7 that we can't do this per fuse device > > > because the request is only associated with a device once it's read in > > > by the server (eg fuse_dev_do_read). > > > > > > I ran some rough preliminary benchmarks with > > > ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 > > > -o source=~/fstests ~/fuse_mount > > > and > > > fio --name randwrite --ioengine=sync --thread --invalidate=1 > > > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4 > > > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345 > > > --group_reporting=1 --directory=fuse_mount > > > > > > and didn't see any noticeable difference in throughput (~37 MiB/sec on > > > my system) with vs without the timeout. > > > > > > > > > That is not much, isn't your limit the backend? I wonder what would happen > > with 25GB/s I'm testing with. Wouldn't it make sense for this to test with > > sequential large IO? And possibly with the passthrough-hp branch that > > bypasses IO? And a NUMA system probably would be helpful as well. > > I.e. to test the effect on the kernel side without having an IO limited > > system? > > > > The preliminary benchmarks yesterday were run on a VM because I had > trouble getting consistent results between baseline runs (on origin > w/out my changes) on my usual test machine. I'm going to get this > sorted out and run some tests again. > I'll attach the updated benchmark numbers on this thread (v7 of the timeout patchset) so that everything's in one place: https://lore.kernel.org/linux-fsdevel/20241007184258.2837492-1-joannelkoong@xxxxxxxxx/T/#t > What are you testing on that's giving you 25 GB/s? > > > > > With the io-uring interface requests stay in queues from the in-coming CPU > > so easier to achieve there, although I wonder for your use-case if it > > wouldn't be sufficient to start the timer only when the request is on > > the way to fuse-server? One disadvantage I see is that virtiofs would need > > to be specially handled. > > Unfortunately I don't think it suffices to only start the timer when > the request is on the way to the fuse server. If there's a malicious > or deadlocked server, they might not read from /dev/fuse, but we would > want to abort the connection in those cases as well. > > > Thanks, > Joanne > > > > > > > Thanks, > > Bernd > > > >