Re: Possible regression with buffered writes + NOWAIT behavior, under memory pressure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 10, 2025 at 03:12:24PM -0300, Raphael S. Carvalho wrote:
> While running scylladb test suite, which uses io_uring + buffered
> writes + XFS, the system was spuriously returning ENOMEM, despite
> there being plenty of available memory to be reclaimed from the page
> cache. FWIW, I am running: 6.12.9-100.fc40.x86_64
.....

> In the patch ''mm: return an ERR_PTR from __filemap_get_folio', I see
> the following changes:
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -468,19 +468,12 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
>  struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos)
>  {
>         unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
> -       struct folio *folio;
> 
>         if (iter->flags & IOMAP_NOWAIT)
>                 fgp |= FGP_NOWAIT;
> 
> -       folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
> +       return __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
>                         fgp, mapping_gfp_mask(iter->inode->i_mapping));
> -       if (folio)
> -               return folio;
> -
> -       if (iter->flags & IOMAP_NOWAIT)
> -               return ERR_PTR(-EAGAIN);
> -       return ERR_PTR(-ENOMEM);
>  }
> 
> This leads to me believe we have a regression in this area, after that
> patch, since iomap_get_folio() is no longer returning EAGAIN with
> IOMAP_NOWAIT, if __filemap_get_folio() failed to get a folio. Now it
> returns ENOMEM unconditionally.

Yes, I think you are right - FGP_NOWAIT error returns are not
handled correctly by __filemap_get_folio().

> Since we pushed the error picking decision to __filemap_get_folio, I
> think it makes sense for us to patch it such that it returns EAGAIN if
> allocation failed (under pressure) because IOMAP_NOWAIT was requested
> by its caller and allocation is not allowed to block waiting for
> reclaimer to do its thing.
> 
> A possible way to fix it is this one-liner, but I am not well versed
> in this area, so someone may end up suggesting a better fix:
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 804d7365680c..9e698a619545 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1964,7 +1964,7 @@ struct folio *__filemap_get_folio(struct
> address_space *mapping, pgoff_t index,
>                 do {
>                         gfp_t alloc_gfp = gfp;
> 
> -                       err = -ENOMEM;
> +                       err = (fgp_flags & FGP_NOWAIT) ? -ENOMEM : -EAGAIN;
>                         if (order > min_order)
>                                 alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
>                         folio = filemap_alloc_folio(alloc_gfp, order);

Better to only do the FGP_NOWAIT check when a failure occurs; that
puts it in the slow path rather than having to evaluate it
unnecessarily every time through the function/loop. i.e.

 		folio = filemap_alloc_folio(gfp, order);
-		if (!folio)
-			return ERR_PTR(-ENOMEM);
+		if (!folio) {
+			if (fgp_flags & FGP_NOWAIT)
+				err = -EAGAIN;
+			else
+				err = -ENOMEM;
+			continue;
+		}

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux