On 9/13/24 8:00 AM, Joanne Koong wrote:
> On Thu, Aug 22, 2024 at 8:34 PM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>>
>> On 6/4/24 6:02 PM, Miklos Szeredi wrote:
>>> On Tue, 4 Jun 2024 at 11:32, Bernd Schubert <bernd.schubert@xxxxxxxxxxx> wrote:
>>>
>>>> Back to the background for the copy, so it copies pages to avoid
>>>> blocking on memory reclaim. With that allocation it in fact increases
>>>> memory pressure even more. Isn't the right solution to mark those pages
>>>> as not reclaimable and to avoid blocking on it? Which is what the tmp
>>>> pages do, just not in beautiful way.
>>>
>>> Copying to the tmp page is the same as marking the pages as
>>> non-reclaimable and non-syncable.
>>>
>>> Conceptually it would be nice to only copy when there's something
>>> actually waiting for writeback on the page.
>>>
>>> Note: normally the WRITE request would be copied to userspace along
>>> with the contents of the pages very soon after starting writeback.
>>> After this the contents of the page no longer matter, and we can just
>>> clear writeback without doing the copy.
>>
>> OK this really deviates from my previous understanding of the deadlock
>> issue. Previously I thought *after* the server has received the WRITE
>> request, i.e. has copied the request and page content to userspace, the
>> server needs to allocate some memory to handle the WRITE request, e.g.
>> make the data persistent on disk, or send the data to the remote
>> storage. It is the memory allocation at this point that actually
>> triggers a memory direct reclaim (on the FUSE dirty page) and causes a
>> deadlock. It seems that I misunderstand it.
>
> I think your previous understanding is correct (or if not, then my
> understanding of this is incorrect too lol).
> The first write request makes it to userspace and when the server is
> in the middle of handling it, a memory reclaim is triggered where
> pages need to be written back. This leads to a SECOND write request
> (eg writing back the pages that are reclaimed) but this second write
> request will never be copied out to userspace because the server is
> stuck handling the first write request and waiting for the page
> reclaim bits of the reclaimed pages to be unset, but those reclaim
> bits can only be unset when the pages have been copied out to
> userspace, which only happens when the server reads /dev/fuse for the
> next request.

Right, that's true.

>
>>
>> If that's true, we can clear PF_writeback as long as the whole request
>> along with the page content has already been copied to userspace, and
>> thus eliminate the tmp page copying.
>>
>
> I think the problem is that on a single-threaded server, the pages
> will not be copied out to userspace for the second request (aka
> writing back the dirty reclaimed pages) since the server is stuck on
> the first request.

Agreed.

-- 
Thanks,
Jingbo
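
For reference, the sequence agreed on above can be sketched as a toy
model. This is only an illustration, not kernel or libfuse code:
simulate(), the page name and the request labels are all made up, and
it assumes the writeback flag is cleared as soon as some server thread
reads the request from /dev/fuse (the idea discussed above, not the
current tmp-page behaviour).

```python
from collections import deque


def simulate(server_threads: int) -> str:
    """Toy model of the reclaim deadlock (hypothetical, not kernel code)."""
    pending = deque(["WRITE#1"])  # requests queued on /dev/fuse
    under_writeback = set()       # pages whose writeback flag is still set
    idle = server_threads         # threads free to read /dev/fuse

    # One thread reads WRITE#1 and starts handling it.
    pending.popleft()
    idle -= 1

    # While handling WRITE#1 the thread allocates memory, which triggers
    # direct reclaim on a dirty FUSE page. That queues a second WRITE and
    # the handling thread now waits for the page's writeback to finish.
    under_writeback.add("page_B")
    pending.append("WRITE#2")

    # Assumption: writeback is cleared once WRITE#2 has been copied to
    # userspace, which needs a thread that is still free to read it.
    if idle > 0:
        pending.popleft()
        under_writeback.discard("page_B")

    return "deadlock" if under_writeback else "progress"


if __name__ == "__main__":
    for n in (1, 2):
        print(n, "thread(s):", simulate(n))
```

With one thread nobody is left to read WRITE#2, so the flag is never
cleared. In this model a second idle thread is enough to get unstuck,
but a real server could of course run out of idle threads the same way.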