On Mon, Oct 7, 2024 at 11:43 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is in a deadlock. Currently, there's
> no good way to detect if a server is stuck and needs to be killed
> manually.
>
> This patchset adds a timeout option where if the server does not reply
> to a request by the time the timeout elapses, the connection will be
> aborted. This patchset also adds two dynamically configurable fuse
> sysctls, "default_request_timeout" and "max_request_timeout", for
> controlling/enforcing timeout behavior system-wide.
>
> Existing systems running fuse servers will not be affected unless they
> explicitly opt into the timeout.
>
> v6: https://lore.kernel.org/linux-fsdevel/20240830162649.3849586-1-joannelkoong@xxxxxxxxx/
> Changes from v6 -> v7:
> - Make timer per-connection instead of per-request (Miklos)
> - Make the default time granularity minutes instead of seconds
> - Removed the reviewed-bys since the interface has changed (now
>   minutes instead of seconds)
>
> v5: https://lore.kernel.org/linux-fsdevel/20240826203234.4079338-1-joannelkoong@xxxxxxxxx/
> Changes from v5 -> v6:
> - Gate sysctl.o behind CONFIG_SYSCTL in the makefile (kernel test robot)
> - Reword/clarify last sentence in cover letter (Miklos)
>
> v4: https://lore.kernel.org/linux-fsdevel/20240813232241.2369855-1-joannelkoong@xxxxxxxxx/
> Changes from v4 -> v5:
> - Change timeout behavior from aborting the request to aborting the
>   connection (Miklos)
> - Clarify wording for sysctl documentation (Jingbo)
>
> v3: https://lore.kernel.org/linux-fsdevel/20240808190110.3188039-1-joannelkoong@xxxxxxxxx/
> Changes from v3 -> v4:
> - Fix wording in some comments to make them clearer
> - Use simpler logic for the timer (e.g. remove extra if checks, use the
>   mod_timer API) (Josef)
> - Sanity-check should be on FR_FINISHING, not FR_FINISHED (Jingbo)
> - Fix comment for "processing queue", add req->fpq = NULL safeguard (Bernd)
>
> v2: https://lore.kernel.org/linux-fsdevel/20240730002348.3431931-1-joannelkoong@xxxxxxxxx/
> Changes from v2 -> v3:
> - Disarm / rearm timer in dev_do_read to handle race conditions (Bernd)
> - Disarm timer in error handling for fatal interrupt (Yafang)
> - Clean up do_fuse_request_end (Jingbo)
> - Add timer for notify retrieve requests
> - Fix kernel test robot errors for #define no-op functions
>
> v1: https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@xxxxxxxxx/
> Changes from v1 -> v2:
> - Add timeout for background requests
> - Handle resend race condition
> - Add sysctls
>
> Joanne Koong (3):
>   fs_parser: add fsparam_u16 helper
>   fuse: add optional kernel-enforced timeout for requests
>   fuse: add default_request_timeout and max_request_timeout sysctls
>
>  Documentation/admin-guide/sysctl/fs.rst | 27 +++++++++++
>  fs/fs_parser.c                          | 14 ++++++
>  fs/fuse/dev.c                           | 63 ++++++++++++++++++++++++-
>  fs/fuse/fuse_i.h                        | 55 +++++++++++++++++++++
>  fs/fuse/inode.c                         | 34 +++++++++++++
>  fs/fuse/sysctl.c                        | 20 ++++++++
>  include/linux/fs_parser.h               |  9 ++--
>  7 files changed, 218 insertions(+), 4 deletions(-)
>
> --
> 2.43.5
>
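As context for the numbers below, here is a rough sketch of the
per-connection approach being benchmarked. The names here
(timeout_timer, pending_list, start_time, req_timeout,
FUSE_TIMEOUT_CHECK_PERIOD) are hypothetical and the locking is
simplified, so this is not the exact code from the patches: instead of
arming one timer per request, a single periodic timer per connection
checks whether the oldest in-flight request has exceeded the timeout,
and aborts the connection if it has.

/*
 * Simplified sketch only; the real patch tracks requests on the fuse
 * device queues rather than on one per-connection list, and the field
 * names below are hypothetical.
 */
static void fuse_conn_timeout_fn(struct timer_list *t)
{
	struct fuse_conn *fc = from_timer(fc, t, timeout_timer);
	struct fuse_req *oldest;
	bool expired = false;

	spin_lock(&fc->lock);
	/* requests are queued in arrival order, so the head is the oldest */
	oldest = list_first_entry_or_null(&fc->pending_list,
					  struct fuse_req, list);
	if (oldest && time_after(jiffies,
				 oldest->start_time + fc->req_timeout))
		expired = true;
	spin_unlock(&fc->lock);

	if (expired)
		fuse_abort_conn(fc);	/* server looks stuck: abort */
	else				/* re-arm the periodic check */
		mod_timer(t, jiffies + FUSE_TIMEOUT_CHECK_PERIOD);
}

The point is that the timer arm/disarm cost is paid once per check
period instead of once per request, which is the difference the
benchmarks below try to measure against the v6 per-request-timer
approach.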
These are the benchmark numbers I am seeing on my machine:

--- Machine info ---
Architecture:         x86_64
CPU(s):               36
On-line CPU(s) list:  0-35
Model name:           Intel(R) Xeon(R) D-2191A CPU @ 1.60GHz
BIOS Model name:      Intel(R) Xeon(R) D-2191A CPU @ 1.60GHz
CPU family:           6
Model:                85
Thread(s) per core:   2
Core(s) per socket:   18
Socket(s):            1
Stepping:             4
Frequency boost:      disabled
CPU(s) scaling MHz:   100%
CPU max MHz:          1601.0000
CPU min MHz:          800.0000

--- Setting up the testing environment ---
sudo mount -t tmpfs -o size=10G tmpfs ~/tmp_mount

Mount the libfuse server for the tests:
for test a) Non-passthrough writes -
  ./libfuse/build/example/passthrough_ll -o max_threads=4 -o source=/root/tmp_mount /root/fuse_mount
for test b) Passthrough writes -
  ./libfuse/build/example/passthrough_hp --num-threads=4 /root/tmp_mount /root/fuse_mount

Test using fio:
fio --name=seqwrite --ioengine=sync --rw=write --bs=1k --size=1G --numjobs=4 --fallocate=none --ramp_time=30 --group_reporting=1 --directory=/root/fuse_mount

Enable timeouts by running
'echo 500 | sudo tee /proc/sys/fs/fuse/default_request_timeout'
before mounting the fuse server.

Disable timeouts by running
'echo 0 | sudo tee /proc/sys/fs/fuse/default_request_timeout'
before mounting the fuse server.

Outliers were discarded.
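As an aside for anyone who wants to poke at the knobs above: the two
sysctls are intended to live under /proc/sys/fs/fuse. A rough sketch of
what the registration could look like follows; the variable and
function names here are hypothetical, not necessarily what the patch
uses.

#include <linux/sysctl.h>

/* per the cover letter, 0 (the default) means no timeout is enforced */
static unsigned int fuse_default_req_timeout;
static unsigned int fuse_max_req_timeout;

static struct ctl_table fuse_sysctl_table[] = {
	{
		.procname	= "default_request_timeout",
		.data		= &fuse_default_req_timeout,
		.maxlen		= sizeof(unsigned int),
		.mode		= 0644,
		.proc_handler	= proc_douintvec,
	},
	{
		.procname	= "max_request_timeout",
		.data		= &fuse_max_req_timeout,
		.maxlen		= sizeof(unsigned int),
		.mode		= 0644,
		.proc_handler	= proc_douintvec,
	},
};

static struct ctl_table_header *fuse_table_header;

/* called from fuse module init/exit (hypothetical hook names) */
int fuse_sysctl_register(void)
{
	fuse_table_header = register_sysctl("fs/fuse", fuse_sysctl_table);
	return fuse_table_header ? 0 : -ENOMEM;
}

void fuse_sysctl_unregister(void)
{
	unregister_sysctl_table(fuse_table_header);
	fuse_table_header = NULL;
}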
--- Tests ---

a) Non-passthrough sequential writes
./libfuse/build/example/passthrough_ll -o max_threads=4 -o source=/root/tmp_mount /root/fuse_mount

--- Baseline (no timeouts) ---
Ran this on origin/for-next
Saw around ~273 MiB/s

WRITE: bw=277MiB/s (291MB/s), 277MiB/s-277MiB/s (291MB/s-291MB/s), io=4096MiB (4295MB), run=14761-14761msec
WRITE: bw=271MiB/s (285MB/s), 271MiB/s-271MiB/s (285MB/s-285MB/s), io=4096MiB (4295MB), run=15091-15091msec
WRITE: bw=274MiB/s (287MB/s), 274MiB/s-274MiB/s (287MB/s-287MB/s), io=4096MiB (4295MB), run=14949-14949msec
WRITE: bw=277MiB/s (290MB/s), 277MiB/s-277MiB/s (290MB/s-290MB/s), io=4096MiB (4295MB), run=14801-14801msec
WRITE: bw=274MiB/s (288MB/s), 274MiB/s-274MiB/s (288MB/s-288MB/s), io=4096MiB (4295MB), run=14939-14939msec
WRITE: bw=272MiB/s (285MB/s), 272MiB/s-272MiB/s (285MB/s-285MB/s), io=4096MiB (4295MB), run=15060-15060msec
WRITE: bw=269MiB/s (282MB/s), 269MiB/s-269MiB/s (282MB/s-282MB/s), io=4096MiB (4295MB), run=15254-15254msec
WRITE: bw=272MiB/s (285MB/s), 272MiB/s-272MiB/s (285MB/s-285MB/s), io=4096MiB (4295MB), run=15055-15055msec
WRITE: bw=275MiB/s (288MB/s), 275MiB/s-275MiB/s (288MB/s-288MB/s), io=4096MiB (4295MB), run=14893-14893msec
WRITE: bw=270MiB/s (283MB/s), 270MiB/s-270MiB/s (283MB/s-283MB/s), io=4096MiB (4295MB), run=15176-15176msec

--- Request timeouts with periodic timer (approach from this patchset) ---
Saw around ~271 MiB/s

WRITE: bw=265MiB/s (278MB/s), 265MiB/s-265MiB/s (278MB/s-278MB/s), io=4096MiB (4295MB), run=15454-15454msec
WRITE: bw=268MiB/s (281MB/s), 268MiB/s-268MiB/s (281MB/s-281MB/s), io=4096MiB (4295MB), run=15262-15262msec
WRITE: bw=271MiB/s (284MB/s), 271MiB/s-271MiB/s (284MB/s-284MB/s), io=4096MiB (4295MB), run=15113-15113msec
WRITE: bw=268MiB/s (281MB/s), 268MiB/s-268MiB/s (281MB/s-281MB/s), io=4096MiB (4295MB), run=15301-15301msec
WRITE: bw=274MiB/s (287MB/s), 274MiB/s-274MiB/s (287MB/s-287MB/s), io=4096MiB (4295MB), run=14965-14965msec
WRITE: bw=268MiB/s (281MB/s), 268MiB/s-268MiB/s (281MB/s-281MB/s), io=4096MiB (4295MB), run=15277-15277msec
WRITE: bw=276MiB/s (290MB/s), 276MiB/s-276MiB/s (290MB/s-290MB/s), io=4096MiB (4295MB), run=14828-14828msec
WRITE: bw=272MiB/s (285MB/s), 272MiB/s-272MiB/s (285MB/s-285MB/s), io=4096MiB (4295MB), run=15069-15069msec
WRITE: bw=273MiB/s (287MB/s), 273MiB/s-273MiB/s (287MB/s-287MB/s), io=4096MiB (4295MB), run=14987-14987msec
WRITE: bw=279MiB/s (293MB/s), 279MiB/s-279MiB/s (293MB/s-293MB/s), io=4096MiB (4295MB), run=14662-14662msec
WRITE: bw=272MiB/s (285MB/s), 272MiB/s-272MiB/s (285MB/s-285MB/s), io=4096MiB (4295MB), run=15071-15071msec

--- Request timeouts with one timer per request (approach from v6 [1]) ---
Saw around ~263 MiB/s

WRITE: bw=262MiB/s (275MB/s), 262MiB/s-262MiB/s (275MB/s-275MB/s), io=4096MiB (4295MB), run=15620-15620msec
WRITE: bw=262MiB/s (275MB/s), 262MiB/s-262MiB/s (275MB/s-275MB/s), io=4096MiB (4295MB), run=15614-15614msec
WRITE: bw=256MiB/s (269MB/s), 256MiB/s-256MiB/s (269MB/s-269MB/s), io=4096MiB (4295MB), run=15995-15995msec
WRITE: bw=264MiB/s (277MB/s), 264MiB/s-264MiB/s (277MB/s-277MB/s), io=4096MiB (4295MB), run=15504-15504msec
WRITE: bw=260MiB/s (273MB/s), 260MiB/s-260MiB/s (273MB/s-273MB/s), io=4096MiB (4295MB), run=15749-15749msec
WRITE: bw=267MiB/s (280MB/s), 267MiB/s-267MiB/s (280MB/s-280MB/s), io=4096MiB (4295MB), run=15354-15354msec
WRITE: bw=266MiB/s (279MB/s), 266MiB/s-266MiB/s (279MB/s-279MB/s), io=4096MiB (4295MB), run=15409-15409msec
WRITE: bw=265MiB/s (277MB/s), 265MiB/s-265MiB/s (277MB/s-277MB/s), io=4096MiB (4295MB), run=15480-15480msec
WRITE: bw=268MiB/s (281MB/s), 268MiB/s-268MiB/s (281MB/s-281MB/s), io=4096MiB (4295MB), run=15283-15283msec
WRITE: bw=267MiB/s (280MB/s), 267MiB/s-267MiB/s (280MB/s-280MB/s), io=4096MiB (4295MB), run=15332-15332msec

b) Passthrough sequential writes
./libfuse/build/example/passthrough_hp --num-threads=4 /root/tmp_mount /root/fuse_mount

--- Baseline (no timeouts) ---
Ran this on origin/for-next
Saw around ~245 MiB/s

WRITE: bw=246MiB/s (258MB/s), 246MiB/s-246MiB/s (258MB/s-258MB/s), io=4096MiB (4295MB), run=16676-16676msec
WRITE: bw=248MiB/s (260MB/s), 248MiB/s-248MiB/s (260MB/s-260MB/s), io=4096MiB (4295MB), run=16508-16508msec
WRITE: bw=246MiB/s (258MB/s), 246MiB/s-246MiB/s (258MB/s-258MB/s), io=4096MiB (4295MB), run=16636-16636msec
WRITE: bw=246MiB/s (258MB/s), 246MiB/s-246MiB/s (258MB/s-258MB/s), io=4096MiB (4295MB), run=16654-16654msec
WRITE: bw=242MiB/s (253MB/s), 242MiB/s-242MiB/s (253MB/s-253MB/s), io=4096MiB (4295MB), run=16957-16957msec
WRITE: bw=249MiB/s (261MB/s), 249MiB/s-249MiB/s (261MB/s-261MB/s), io=4096MiB (4295MB), run=16449-16449msec
WRITE: bw=245MiB/s (257MB/s), 245MiB/s-245MiB/s (257MB/s-257MB/s), io=4096MiB (4295MB), run=16699-16699msec
WRITE: bw=241MiB/s (253MB/s), 241MiB/s-241MiB/s (253MB/s-253MB/s), io=4096MiB (4295MB), run=16981-16981msec
WRITE: bw=244MiB/s (256MB/s), 244MiB/s-244MiB/s (256MB/s-256MB/s), io=4096MiB (4295MB), run=16792-16792msec
WRITE: bw=246MiB/s (258MB/s), 246MiB/s-246MiB/s (258MB/s-258MB/s), io=4096MiB (4295MB), run=16665-16665msec

--- Request timeouts with periodic timer (approach from this patchset) ---
Saw around ~237 MiB/s

WRITE: bw=237MiB/s (248MB/s), 237MiB/s-237MiB/s (248MB/s-248MB/s), io=4096MiB (4295MB), run=17295-17295msec
WRITE: bw=236MiB/s (247MB/s), 236MiB/s-236MiB/s (247MB/s-247MB/s), io=4096MiB (4295MB), run=17357-17357msec
WRITE: bw=240MiB/s (251MB/s), 240MiB/s-240MiB/s (251MB/s-251MB/s), io=4096MiB (4295MB), run=17096-17096msec
WRITE: bw=238MiB/s (249MB/s), 238MiB/s-238MiB/s (249MB/s-249MB/s), io=4096MiB (4295MB), run=17245-17245msec
WRITE: bw=236MiB/s (247MB/s), 236MiB/s-236MiB/s (247MB/s-247MB/s), io=4096MiB (4295MB), run=17365-17365msec
WRITE: bw=235MiB/s (246MB/s), 235MiB/s-235MiB/s (246MB/s-246MB/s), io=4096MiB (4295MB), run=17466-17466msec
WRITE: bw=235MiB/s (246MB/s), 235MiB/s-235MiB/s (246MB/s-246MB/s), io=4096MiB (4295MB), run=17444-17444msec
WRITE: bw=241MiB/s (253MB/s), 241MiB/s-241MiB/s (253MB/s-253MB/s), io=4096MiB (4295MB), run=17003-17003msec
WRITE: bw=236MiB/s (247MB/s), 236MiB/s-236MiB/s (247MB/s-247MB/s), io=4096MiB (4295MB), run=17361-17361msec
WRITE: bw=244MiB/s (256MB/s), 244MiB/s-244MiB/s (256MB/s-256MB/s), io=4096MiB (4295MB), run=16777-16777msec
--- Request timeouts with one timer per request (approach from v6 [1]) ---
Saw around ~232 MiB/s

WRITE: bw=230MiB/s (241MB/s), 230MiB/s-230MiB/s (241MB/s-241MB/s), io=4096MiB (4295MB), run=17816-17816msec
WRITE: bw=233MiB/s (244MB/s), 233MiB/s-233MiB/s (244MB/s-244MB/s), io=4096MiB (4295MB), run=17613-17613msec
WRITE: bw=231MiB/s (242MB/s), 231MiB/s-231MiB/s (242MB/s-242MB/s), io=4096MiB (4295MB), run=17716-17716msec
WRITE: bw=231MiB/s (242MB/s), 231MiB/s-231MiB/s (242MB/s-242MB/s), io=4096MiB (4295MB), run=17728-17728msec
WRITE: bw=233MiB/s (244MB/s), 233MiB/s-233MiB/s (244MB/s-244MB/s), io=4096MiB (4295MB), run=17578-17578msec
WRITE: bw=232MiB/s (243MB/s), 232MiB/s-232MiB/s (243MB/s-243MB/s), io=4096MiB (4295MB), run=17676-17676msec
WRITE: bw=231MiB/s (242MB/s), 231MiB/s-231MiB/s (242MB/s-242MB/s), io=4096MiB (4295MB), run=17761-17761msec
WRITE: bw=234MiB/s (245MB/s), 234MiB/s-234MiB/s (245MB/s-245MB/s), io=4096MiB (4295MB), run=17529-17529msec
WRITE: bw=230MiB/s (241MB/s), 230MiB/s-230MiB/s (241MB/s-241MB/s), io=4096MiB (4295MB), run=17823-17823msec
WRITE: bw=235MiB/s (247MB/s), 235MiB/s-235MiB/s (247MB/s-247MB/s), io=4096MiB (4295MB), run=17393-17393msec

Overall:
- Request timeouts with a periodic timer perform better than the v6
  approach of attaching one timer to each request.
- I didn't see a significant difference in performance when enabling
  timeouts on the non-passthrough fuse server, but did see about a 3%
  drop on the passthrough server.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/20240830162649.3849586-1-joannelkoong@xxxxxxxxx/