Quoting Daniel Stone (2017-06-05 11:47:44)
> Hi,
>
> On 5 June 2017 at 11:35, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> > I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
> > struggles with handling reclaim via kswapd (through inconsistency within
> > throttle_direct_reclaim() and even then the race between multiple
> > allocators makes the two step of reclaim then allocate fragile), and as
> > our buffers are always dirty (with very few exceptions), we required
> > kswapd to perform pageout on them. The only effective means of waiting
> > on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
> > leaves us with the dilemma of invoking the oomkiller instead of
> > propagating the allocation failure back to userspace where it can be
> > handled more gracefully (one hopes).
>
> The i965 GL(ES) driver may dash your hopes somewhat:
>
>     ret = execbuffer(dri_screen->fd, batch, hw_ctx,
>                      4 * USED_BATCH(*batch),
>                      in_fence_fd, out_fence_fd, flags);
>
>     if (ret != 0) {
>         fprintf(stderr, "intel_do_flush_locked failed: %s\n", strerror(-ret));
>         exit(1);
>     }

And their response has been that mesa is not a system critical library,
so don't use it in such roles. We have also floated patches for several
years now for that error to percolate back. Otoh, all the other memory
handling paths have more or less been fixed to report back that failure.
You'll note from Linus's message it wasn't this that failed, but one of
those other allocation paths and the error handling associated with them
in the client.

> Before removing NORETRY, occasionally I'd get lucky and Chrome would
> fail, but usually it'd be Mutter and my entire session would
> disappear. I'm also not sure what a good strategy as a compositor
> would be: just keep on trying updates until you get lucky? Sit doing
> nothing for a while and hope redraws succeed 'later' ... ?

In terms of mutter and chrome, the first question is why have they
filled all of memory with buffers.
I strongly suspect there are leaks all around, but if there are not,
something as complex as chrome can easily identify buffers it has kept
merely for convenience that can be rebuilt on demand. (But they should
already be using GL APPLE purgeable memory for those.)

There is a wider problem than this: I have seen a growth in the number
of order-6 allocation failures over the past few kernels (or rather
distribution updates) for simple cursor updates (where the first
question to be asked is why the cursor is being reallocated).

And yes, they should be robust in handling error conditions and throw
away rendering if it failed. A compositor may even preallocate resources
so that it can throw a message onto a screen if a failure occurs; a
low-level driver can do that, and with Vulkan that too should be
possible (at least down to a few mallocs internal to the driver, but
using an externally controlled allocator).

> Similarly
> to Linus, I was in a position where reclaim should've been extremely
> effective - into the gigabytes - at the time, so pushing reclaim
> harder and taking a small time penalty seems far better than a hard
> failure.

The point here was that we did reclaim and fail (the failure outlined
in the changelog was that reclaim is not waiting for kswapd due to a
false positive from allow_direct_reclaim()); after having first purged
everything unwanted from the set of i915 buffers. It wasn't that we
asked for no reclaim, it's just that we don't want to have the oomkiller
kill something at random. And there was no middle ground in the set of
gfp flags.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx