Hi all,

Recently, there was a long discussion upstream [1] on a patchset that
removes temp pages when handling writeback in FUSE. Temp pages are the
main bottleneck for write performance in FUSE, and local benchmarks
showed approximately a 20% and 45% improvement in throughput for 4K and
1M block size writes respectively when temp pages were removed. More
information on how FUSE uses temp pages can be found here [2].

In the discussion, there were concerns from mm regarding the possibility
of untrusted malicious or buggy fuse servers never completing writeback,
which would impede migration for those pages.

It would be great to continue this discussion at LSF/MM and align on a
solution that removes FUSE temp pages altogether while satisfying mm’s
expectations for page migration. These are the most promising options so
far:

a) Kill untrusted fuse servers that do not reply to writeback requests
   within a certain amount of time (where that time can be configurable
   through a sysctl) as a safeguard for system resources
b) Use unmovable pages for untrusted fuse servers

If neither of these is acceptable, it might also be worth considering
whether there could be mm options that could sufficiently mitigate this
problem. One potential idea is co-locating FUSE folio allocations to the
same page block so that the worst-case malicious/buggy server scenario
only hampers migration of one page block.

If there is no way to remove temp pages altogether, then it would be
useful to discuss:

a) how skipping temp pages should be gated:
   i) unprivileged servers default to always using temp pages while
      privileged servers skip temp pages
   ii) splice defaults to using temp pages, and writeback for non-temp
       pages gets canceled if migration is initiated
   iii) skip temp pages if a sufficiently long request timeout is set
b) how to support large FUSE folios for writeback.
   Currently FUSE uses an rb tree to track writeback state of temp
   pages, but with large folios this becomes unsustainable if concurrent
   writebacks happen on the same page indices but are part of
   different-sized folios, e.g. the following scenario:
   i) writeback on a large folio is issued
   ii) the folio is copied to a tmp folio and writeback is cleared; we
       add this writeback request to the rb tree
   iii) the folio in the pagecache is evicted
   iv) another write occurs on a larger range that encompasses the range
       in the writeback in i) or on a subset of it
   It seems likely that we will need to align on another data structure
   instead of the rb tree to sufficiently handle this.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-5-joannelkoong@xxxxxxxxx/
[2] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@xxxxxxxxx/