Hi Jason,

We will go ahead with just adding the "__GFP_NORETRY" flag, to reduce the time it takes to fail higher-order memory allocations when higher-order pages are not available. We will send out the corresponding patch.

Thank you very much for your inputs.

-
Praveen Kumar Kannoju.

-----Original Message-----
From: Jason Gunthorpe [mailto:jgg@xxxxxxxxxx]
Sent: 31 March 2021 11:23 PM
To: Aruna Ramakrishna <aruna.ramakrishna@xxxxxxxxxx>
Cc: Praveen Kannoju <praveen.kannoju@xxxxxxxxxx>; leon@xxxxxxxxxx; dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Rajesh Sivaramasubramaniom <rajesh.sivaramasubramaniom@xxxxxxxxxx>; Rama Nichanamatlu <rama.nichanamatlu@xxxxxxxxxx>; Jeffery Yoder <jeffery.yoder@xxxxxxxxxx>
Subject: Re: [PATCH v2] IB/mlx5: Reduce max order of memory allocated for xlt update

On Thu, Mar 25, 2021 at 11:39:28AM -0300, Jason Gunthorpe wrote:
> On Tue, Mar 23, 2021 at 09:27:38PM -0700, Aruna Ramakrishna wrote:
>
> > > Do you have benchmarks that show the performance of the high order
> > > pages is not relevant? I'm a bit surprised to hear that
> > >
> >
> > I guess my point was more to the effect that an order-8 alloc will
> > fail more often than not, in this flow. For instance, when we were
> > debugging the latency spikes here, this was the typical buddyinfo
> > output on that system:
> >
> > Node 0, zone      DMA      0      1      1      2      3      0      1      0      1      1      3
> > Node 0, zone    DMA32      7      7      7      6     10      2      6      7      6      2    306
> > Node 0, zone   Normal   3390  51354  17574   6556   1586     26      2      1      0      0      0
> > Node 1, zone   Normal  11519  23315  23306   9738     73      2      0      1      0      0      0
> >
> > I think this level of fragmentation is pretty normal on long running
> > systems. Here, in the reg_mr flow, the first try (order-8) alloc
> > will probably fail 9 times out of 10 (esp. after the addition of
> > GFP_NORETRY flag), and then as fallback, the code tries to allocate
> > a lower order, and if that too fails, it allocates a page. I think
> > it makes sense to just avoid trying an order-8 alloc here.
> >
> But a system like this won't get THPs either, so I'm not sure it is
> relevant. The function was designed as it is to consume a "THP" if it
> is available.

So can we do this with just the addition of __GFP_NORETRY ?

Jason
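
For readers following the buddyinfo numbers quoted in the thread, here is a minimal userspace sketch (not kernel code, and not taken from the mlx5 driver) of how those columns can be read. Each column is the count of free blocks at one order, starting at order 0. The function names and the candidate order list `(8, 4, 0)` are illustrative assumptions; the thread only says the code tries order-8, then "a lower order", then a single page. The model looks at one zone at a time and ignores zone fallback, watermarks, reclaim and compaction, so it approximates only the immediate, no-retry outcome.

```python
# Sketch only: read /proc/buddyinfo-style text and model a descending-order
# allocation fallback. All names here are hypothetical helpers for illustration.

BUDDYINFO = """\
Node 0, zone      DMA      0      1      1      2      3      0      1      0      1      1      3
Node 0, zone    DMA32      7      7      7      6     10      2      6      7      6      2    306
Node 0, zone   Normal   3390  51354  17574   6556   1586     26      2      1      0      0      0
Node 1, zone   Normal  11519  23315  23306   9738     73      2      0      1      0      0      0
"""


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count per order, index = order]}."""
    zones = {}
    for line in text.splitlines():
        fields = line.replace(",", " ").split()
        # Expected shape: Node <n> zone <name> <count order 0> <count order 1> ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        zones[(int(fields[1]), fields[3])] = [int(n) for n in fields[4:]]
    return zones


def highest_free_order(counts):
    """Largest order that has at least one free block, or -1 if none."""
    for order in range(len(counts) - 1, -1, -1):
        if counts[order] > 0:
            return order
    return -1


def pick_order(counts, candidates):
    """Return the first candidate order that this zone's free lists can serve.

    A request of order k can be served from any free block of order >= k
    (the buddy allocator splits larger blocks). Returns None if none fit.
    """
    top = highest_free_order(counts)
    for order in candidates:
        if order <= top:
            return order
    return None


if __name__ == "__main__":
    for (node, zone), counts in parse_buddyinfo(BUDDYINFO).items():
        print(node, zone, highest_free_order(counts), pick_order(counts, (8, 4, 0)))
```

With the quoted data, both Normal zones have no free blocks above order 7, so under this simplified model an order-8 request against them fails on the first attempt and the fallback picks the next candidate order.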