On Mon, Jun 03, 2024 at 05:19:44PM +0200, Miklos Szeredi wrote:
> On Mon, 3 Jun 2024 at 16:43, Bernd Schubert <bernd.schubert@xxxxxxxxxxx> wrote:
> >
> > On 6/3/24 08:17, Jingbo Xu wrote:
> > > Hi, Miklos,
> > >
> > > We spotted a performance bottleneck for FUSE writeback in which the
> > > writeback kworker has consumed nearly 100% CPU, among which 40% CPU is
> > > used for copy_page().
> > >
> > > fuse_writepages_fill
> > >   alloc tmp_page
> > >   copy_highpage
> > >
> > > This is because of the FUSE writeback design (see commit 3be5a52b30aa
> > > ("fuse: support writable mmap")), which newly allocates a temp page for
> > > each dirty page to be written back, copies the content of the dirty page
> > > to the temp page, and then writes back the temp page instead. This special
> > > design is intentional, to avoid potential deadlock due to a buggy or even
> > > malicious fuse user daemon.
> >
> > I also noticed that and I admit that I don't understand it yet. The commit says
> >
> > <quote>
> > The basic problem is that there can be no guarantee about the time in which
> > the userspace filesystem will complete a write. It may be buggy or even
> > malicious, and fail to complete WRITE requests. We don't want unrelated parts
> > of the system to grind to a halt in such cases.
> > </quote>
> >
> > Timing - NFS/cifs/etc have the same issue? Even a local file system has no
> > guarantees how fast storage is?
>
> I don't have the details but it boils down to the fact that the
> allocation context provided by GFP_NOFS (PF_MEMALLOC_NOFS) cannot be
> used by the unprivileged userspace server (and even if it could,
> there's no guarantee that it would).

I thought we had PR_SET_IO_FLUSHER for that. It requires
CAP_SYS_RESOURCE but no other privileges, and the userspace
server will then always operate in the PF_MEMALLOC_NOIO |
PF_LOCAL_THROTTLE memory allocation context.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
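
For reference, a minimal sketch of how a userspace server might opt in to
that allocation context at startup. The prctl() options are the real
PR_SET_IO_FLUSHER / PR_GET_IO_FLUSHER interface (Linux 5.6 and later); the
helper names are made up for illustration, and the fallback defines are only
there on the assumption that the installed headers may predate the option.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallback values from the kernel UAPI, in case the headers are older. */
#ifndef PR_SET_IO_FLUSHER
#define PR_SET_IO_FLUSHER 57
#define PR_GET_IO_FLUSHER 58
#endif

/*
 * Hypothetical helper: mark the calling thread as an IO flusher.
 * Returns 0 on success or a negative errno value, e.g. -EPERM when
 * CAP_SYS_RESOURCE is missing or -EINVAL on kernels without the option.
 */
int fuse_server_enable_io_flusher(void)
{
    /* The unused arguments must be zero. */
    if (prctl(PR_SET_IO_FLUSHER, 1, 0, 0, 0) == -1)
        return -errno;
    return 0;
}

/*
 * Hypothetical helper: query the current state.
 * Returns 1 if the flusher state is set, 0 if not, or a negative errno.
 */
int fuse_server_io_flusher_state(void)
{
    int ret = prctl(PR_GET_IO_FLUSHER, 0, 0, 0, 0);

    return ret == -1 ? -errno : ret;
}
```

A server would call fuse_server_enable_io_flusher() before it starts
processing requests and treat a failure as "no protection available" rather
than as fatal. Note that this only covers servers that choose to opt in; it
says nothing about a buggy or malicious server that does not.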