On Mon, 3 Jun 2024 at 16:43, Bernd Schubert <bernd.schubert@xxxxxxxxxxx> wrote:
>
> On 6/3/24 08:17, Jingbo Xu wrote:
> > Hi, Miklos,
> >
> > We spotted a performance bottleneck for FUSE writeback in which the
> > writeback kworker has consumed nearly 100% CPU, among which 40% CPU is
> > used for copy_page().
> >
> > fuse_writepages_fill
> >   alloc tmp_page
> >   copy_highpage
> >
> > This is because of the FUSE writeback design (see commit 3be5a52b30aa
> > ("fuse: support writable mmap")), which allocates a new temp page for
> > each dirty page to be written back, copies the content of the dirty page
> > to the temp page, and then writes back the temp page instead. This special
> > design is intentional, to avoid potential deadlocks due to a buggy or even
> > malicious fuse user daemon.
>
> I also noticed that, and I admit that I don't understand it yet. The commit says
>
> <quote>
> The basic problem is that there can be no guarantee about the time in which
> the userspace filesystem will complete a write. It may be buggy or even
> malicious, and fail to complete WRITE requests. We don't want unrelated parts
> of the system to grind to a halt in such cases.
> </quote>
>
> Timing: don't NFS/CIFS/etc. have the same issue? Even a local file system has
> no guarantees about how fast the storage is.

I don't have the details, but it boils down to the fact that the
allocation context provided by GFP_NOFS (PF_MEMALLOC_NOFS) cannot be
used by the unprivileged userspace server (and even if it could, there's
no guarantee that it would).

When this mechanism was introduced, the deadlock was a real
possibility. I'm not sure that it can still happen, but proving that
it cannot might be difficult.

Thanks,
Miklos
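To make the copy-then-write pattern described in the thread concrete, here is a minimal userspace simulation in C. It is not kernel code: `struct page_buf`, `struct write_req`, `queue_write_with_tmp_copy()`, `complete_write()` and `SKETCH_PAGE_SIZE` are hypothetical names standing in for a page-cache page, a queued WRITE request, and the `alloc tmp_page` / `copy_highpage` steps. The point it illustrates is that the original page leaves writeback as soon as the copy is made, so only the private copy has to wait on a possibly slow or malicious server. The price is one allocation plus one full-page memcpy per dirty page, which is the CPU cost reported above.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-in for a page-cache page. */
struct page_buf {
    unsigned char data[SKETCH_PAGE_SIZE];
    int dirty;
    int under_writeback;
};

/* Hypothetical stand-in for a WRITE request queued to the userspace server. */
struct write_req {
    unsigned char *tmp; /* private copy handed to the server */
    int completed;
};

/*
 * Copy-then-write: allocate a temporary buffer (cf. "alloc tmp_page"),
 * copy the dirty page into it (cf. "copy_highpage"), then clear the
 * original page's dirty and writeback flags right away. The original
 * page can now be reclaimed or redirtied without waiting for the server.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated.
 */
int queue_write_with_tmp_copy(struct page_buf *page, struct write_req *req)
{
    req->tmp = malloc(SKETCH_PAGE_SIZE);
    if (!req->tmp)
        return -1;
    memcpy(req->tmp, page->data, SKETCH_PAGE_SIZE);
    req->completed = 0;
    page->dirty = 0;
    page->under_writeback = 0;
    return 0;
}

/* Called whenever (if ever) the server answers the WRITE: only the
 * temporary copy was pinned for that long. */
void complete_write(struct write_req *req)
{
    free(req->tmp);
    req->tmp = NULL;
    req->completed = 1;
}
```

Note that once the original page is redirtied after the copy, the in-flight copy still holds the old contents; this is why the real implementation needs extra bookkeeping to order later writes behind earlier ones.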