On Thu, 6 Mar 2014, Chris Wilson wrote:
> On Wed, Mar 05, 2014 at 07:14:49PM -0800, Hugh Dickins wrote:
> > On Wed, 5 Mar 2014, Chris Wilson wrote:
> > 
> > > The intention of returning ENOSPC for a page allocation failure due to
> > > memory exhaustion in shmem_getpage_gfp() is purely "so that a failure
> > > on a sparse tmpfs mapping will give SIGBUS not OOM." However, for other
> > > callers, for example i915.ko, we want to distinguish the error message
> > > reported to userspace between ENOSPC (meaning that we were unable to fit
> > > the object/execbuffer into the aperture) and ENOMEM (meaning that we
> > > were unable to allocate pages for the object).
> > > 
> > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> > 
> > I'm not keen on this: perhaps because it looks like a hack of yours -
> > and draws attention to what might be thought a hack of mine ;)
> > 
> > Nor am I thrilled by what was there before (we have three cases which
> > ought to be distinguished, but only ENOMEM and ENOSPC to distinguish
> > between them); but would rather it stay as is, than change what's
> > reported to the user after all these years.
> > 
> > But I do see your point, and asking you to convert ENOSPC to ENOMEM
> > in all your drivers/gpu calls might be tiresome.
> > 
> > I think we have a reasonable compromise: shmem_read_mapping_page_gfp()
> > is already the wrapper provided just for you guys, how about posting
> > a patch to map ENOSPC to ENOMEM there?
> > 
> > (You're using the MS_KERNMOUNT, so won't hit a max_blocks limit.)
> 
> Actually, I only need to do the conversion along a single error path. So
> if you are happy that the only reason shmem_get_page() would report
> -ENOSPC would be for a hard memory failure, it seems permissible to
> simply fix it up in our driver:
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index f2efed9955bd..cfc4fcd90b6b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2099,7 +2099,19 @@ err_pages:
>  		page_cache_release(sg_page_iter_page(&sg_iter));
>  	sg_free_table(st);
>  	kfree(st);
> -	return PTR_ERR(page);
> +
> +	/* shmemfs first checks if there is enough memory to allocate the page
> +	 * and reports ENOSPC should there be insufficient, along with the usual
> +	 * ENOMEM for a genuine allocation failure.
> +	 *
> +	 * We use ENOSPC in our driver to mean that we have run out of GTT
> +	 * space and so want to translate the error from shmemfs back to our
> +	 * usual understanding of ENOMEM.
> +	 */
> +	if (PTR_ERR(page) == -ENOSPC)
> +		return -ENOMEM;
> +	else
> +		return PTR_ERR(page);
>  }

Yes, that's fine by me: it's your territory, and shmem is not suddenly
going to use ENOSPC for something else altogether.

I assumed you were wanting to do it once for all the GPU drivers, but
if the others don't care at present, yes, do it here for i915 - we can
always move the fixup to shmem_read_mapping_page_gfp() later on if
there's a need.

> 
>  /* Ensure that the associated pages are gathered from the backing storage
> 
> Whether the other drivers (gma500, msm, omapdrm, udl, armada) care about
> the strict distinction between ENOSPC and ENOMEM, I do not know.
> 
> Would it be possible for us to use sbinfo->max_blocks to reject large
> objects in drm_gem_object_init()?
No, not without using your own separate shm mount: shmem_file_setup()
uses the internal shm_mnt (used for anon shared mmappings and SysV SHM),
and (rightly or wrongly) we have never imposed a max_blocks limit on
that mount.

But your question does suggest one possibility. drm_gem.c is doing the
shmem_file_setup() with the VM_NORESERVE flag, which gives you the
behaviour of tmpfs files (page-by-page accounting, perhaps ENOSPC if
!vm_enough_memory for a page) rather than SysV SHM accounting (succeed
or fail according to the initial shmem_file_setup() size, without
page-by-page accounting thereafter).

You might want to consider whether the SHM-style accounting would suit
you better: I don't know what dictated the choice of VM_NORESERVE
originally, and I don't know whether it would be a good idea to change
over or not - it would be a terrible idea if you use large sparse
objects.

Hugh
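
For illustration, a minimal sketch of the choice being described.
shmem_file_setup() and VM_NORESERVE are the real kernel interfaces, but
the helper below and its name are hypothetical, invented only to
contrast the two accounting modes (drm_gem.c itself just passes
VM_NORESERVE unconditionally):

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/shmem_fs.h>
#include <linux/types.h>

/* Hypothetical helper: pick the accounting mode for a GEM object's
 * shmem backing store.
 */
static struct file *gem_backing_store(loff_t size, bool charge_up_front)
{
	if (charge_up_front)
		/* SysV-SHM-style accounting: the whole size is charged
		 * against the overcommit limits here, so this call may
		 * itself fail with -ENOMEM; later faults are then not
		 * charged page by page.
		 */
		return shmem_file_setup("drm mm object", size, 0);

	/* tmpfs-style accounting, as drm_gem_object_init() selects by
	 * passing VM_NORESERVE: nothing is charged now, each page is
	 * accounted as it is instantiated, and a fault may fail with
	 * -ENOSPC if !vm_enough_memory at that point.
	 */
	return shmem_file_setup("drm mm object", size, VM_NORESERVE);
}

With up-front charging, a large but mostly-sparse object would pay for
every page at creation time, which is why changing over would be a
terrible idea if large sparse objects are in use.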