Re: [PATCH v2] nvme: provide fallback for discard alloc failure

Keith Busch <keith.busch@xxxxxxxxx> · Wed, 12 Dec 2018 09:50:17 -0700

On Wed, Dec 12, 2018 at 09:36:36AM -0700, Jens Axboe wrote:
> On 12/12/18 9:28 AM, Keith Busch wrote:
> > On Wed, Dec 12, 2018 at 09:18:11AM -0700, Jens Axboe wrote:
> >> When boxes are run near (or to) OOM, we have a problem with the discard
> >> page allocation in nvme. If we fail allocating the special page, we
> >> return busy, and it'll get retried. But since ordering is honored for
> >> dispatch requests, we can keep retrying this same IO and failing. Behind
> >> that IO could be requests that want to free memory, but they never get
> >> the chance.
> >>
> >> Allocate a fixed discard page per controller for a safe fallback, and use
> >> that if the initial allocation fails.
> > 
> > Do we need to allocate this per controller? One page for the whole driver
> > may be sufficient to make forward progress, right?
> 
> It should be, but that might create a shit storm if we're OOM and have
> tons of drives. I think one per controller is saner, and it's dwarfed
> by memory we consume anyway in static allocations.

Okay, fair enough.

Reviewed-by: Keith Busch <keith.busch@xxxxxxxxx>
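
For readers following the thread, the fallback idea in the quoted patch description can be sketched in userspace C. This is not the patch itself, only an illustration under stated assumptions: `struct ctrl`, `discard_buf_get()`, `discard_buf_put()`, `simulate_oom` and `PAGE_SZ` are hypothetical names, `malloc()` stands in for the kernel page allocation, and a C11 `atomic_flag` stands in for whatever busy marker the driver uses. Each controller preallocates one page at setup. If the normal allocation fails, the request takes the reserved page when it is free, and otherwise reports busy as before.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define PAGE_SZ 4096

/* Hypothetical stand-in for a controller; not the nvme driver's struct. */
struct ctrl {
    void *fallback_page;        /* preallocated once at setup time */
    atomic_flag fallback_busy;  /* set while the fallback page is in use */
};

/* Illustration-only hook that forces the normal allocation to fail. */
static bool simulate_oom;

static int ctrl_init(struct ctrl *c)
{
    c->fallback_page = malloc(PAGE_SZ);
    atomic_flag_clear(&c->fallback_busy);
    return c->fallback_page ? 0 : -1;
}

/*
 * Try the normal allocation first. On failure, claim the per-controller
 * fallback page if nobody else holds it. Returns NULL only when both
 * are unavailable, in which case the caller reports busy and retries.
 */
static void *discard_buf_get(struct ctrl *c)
{
    void *p = simulate_oom ? NULL : malloc(PAGE_SZ);

    if (p)
        return p;
    if (!atomic_flag_test_and_set(&c->fallback_busy))
        return c->fallback_page;
    return NULL;
}

/* Release the buffer: unmark the fallback page, or free a normal one. */
static void discard_buf_put(struct ctrl *c, void *p)
{
    if (p == c->fallback_page)
        atomic_flag_clear(&c->fallback_busy);
    else
        free(p);
}
```

With one reserved page per controller, a failed allocation on one drive never waits on a page held by another drive, which is the contention concern raised above for a single driver-wide page.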