Re: [PATCH v3 net-next 08/14] mlx4: use order-0 pages for RX

On 14/02/2017 7:29 PM, Tom Herbert wrote:
On Tue, Feb 14, 2017 at 7:51 AM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
On Tue, 2017-02-14 at 16:56 +0200, Tariq Toukan wrote:

As the previous series caused hangs, we must run functional regression
tests over this series as well.
Run has already started, and results will be available tomorrow morning.

In general, I really like this series. The refactoring looks more
elegant and more correct, functionally.

However, performance-wise, we fear that the numbers will be drastically
lower with this transition to order-0 pages,
because the page allocator and DMA operations become critical
bottlenecks, especially on systems with costly
DMA operations, such as ARM, iommu=on, etc.

So, again, performance after this patch series is higher,
once you have sensible RX queue parameters for the expected workload.

Only in pathological cases, you might have some regression.

The old scheme was _maybe_ better _when_ memory is not fragmented.

When you run hosts for months, memory _is_ fragmented.

You never see that on benchmarks, unless you force memory to be
fragmented.
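
To illustrate the fragmentation point, here is a toy model. It is not kernel code: the memory size and the "one long-lived page per 8-page block" pattern are assumptions picked for illustration. It counts how many aligned order-3 blocks and how many order-0 pages are still allocatable in a fresh memory map versus a fragmented one.

```python
PAGES = 1024   # toy memory size in 4 KiB pages (assumed)
ORDER3 = 8     # an order-3 block is 2**3 contiguous, aligned pages


def free_blocks(used, block_pages):
    """Count aligned blocks of block_pages pages that are entirely free."""
    count = 0
    for start in range(0, len(used), block_pages):
        if not any(used[start:start + block_pages]):
            count += 1
    return count


# Freshly booted host: nothing is allocated yet.
fresh = [False] * PAGES

# Long-running host (assumed pattern): one long-lived page is still
# in use inside every aligned 8-page block.
fragmented = [False] * PAGES
for start in range(0, PAGES, ORDER3):
    fragmented[start + 3] = True

print("fresh, order-3 blocks:     ", free_blocks(fresh, ORDER3))       # 128
print("fragmented, order-3 blocks:", free_blocks(fragmented, ORDER3))  # 0
print("fragmented, order-0 pages: ", free_blocks(fragmented, 1))       # 896
```

In this model 87.5% of memory is free on the fragmented host, yet no order-3 allocation can succeed without compaction or reclaim, while order-0 allocations are unaffected. A benchmark run right after boot only ever sees the "fresh" case.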



We already have this exact issue in mlx5, where we moved to order-0
allocations with a fixed size cache, but that was not enough.
Customers of mlx5 have already complained about the performance
degradation, and currently this is hurting our business.
We get a clear nack from our performance regression team regarding doing
the same in mlx4.
So, the question is, can we live with this degradation until those
bottleneck challenges are addressed?
Again, there is no degradation.

We have been using order-0 pages for years at Google.

Only when we made the mistake of rebasing from the upstream driver and
its order-3 pages did we get horrible regressions, causing production outages.

I was silly to believe that the mm layer got better.

Following our perf experts' feedback, I cannot simply Ack. We need
to have a clear plan to close the perf gap or reduce the impact.
Your perf experts need to talk to me, or any experts at Google and
Facebook, really.

I agree with this 100%! To be blunt, power users like this are testing
your drivers far beyond what Mellanox is doing and understand how
performance gains in benchmarks translate to possible gains in real
production way more than your perf experts can. Listen to Eric!

Tom


Anything _relying_ on order-3 pages being available to impress
friends/customers is a lie.

Isn't it the same principle in page_frag_alloc() ?
It is called from __netdev_alloc_skb()/__napi_alloc_skb().

Why is it ok to have order-3 pages (PAGE_FRAG_CACHE_MAX_ORDER) there?
With netdev/napi_alloc_skb, the SKB's linear data is a frag of a large (order-3) page,
and it is not going to be freed before the other non-linear frags.
Can't this cause the same problems (memory pinning and so on)?
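
The pinning concern can be sketched with a toy model. This is not the kernel's page_frag_alloc() implementation: the FragPage class, its methods, and the 2048-byte frag size are hypothetical, and 4 KiB base pages are assumed so that the backing page is 32 KiB (order-3). The only behaviour modelled is that the backing page cannot be returned to the allocator while any frag carved from it is still referenced.

```python
PAGE_SIZE = 4096                    # assumed base page size
FRAG_PAGE_BYTES = PAGE_SIZE << 3    # order-3 backing page: 32768 bytes


class FragPage:
    """Toy model of one backing page that fragments are carved from."""

    def __init__(self):
        self.offset = 0   # bytes already handed out
        self.refs = 0     # fragments still referenced by some SKB

    def alloc(self, size):
        """Carve a fragment; return False if the page is exhausted."""
        if self.offset + size > FRAG_PAGE_BYTES:
            return False
        self.offset += size
        self.refs += 1
        return True

    def free(self):
        """Drop one fragment reference."""
        self.refs -= 1

    @property
    def pinned_bytes(self):
        """The whole page stays allocated while any fragment is live."""
        return FRAG_PAGE_BYTES if self.refs > 0 else 0


page = FragPage()
for _ in range(16):        # 16 fragments of 2048 bytes fill the page
    page.alloc(2048)
for _ in range(15):        # 15 SKBs are consumed and freed quickly
    page.free()

# One long-lived 2048-byte fragment still holds the full 32 KiB page.
print("live frags:", page.refs, "pinned bytes:", page.pinned_bytes)
```

In this model a single slow consumer keeps 16 times more memory pinned than it actually uses, which is the behaviour the question above is asking about.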

Currently, mlx4 doesn't use this generic API, while most other drivers do.

Similar claims are true for TX:
https://github.com/torvalds/linux/commit/5640f7685831e088fe6c2e1f863a6805962f8e81


