Re: [PATCH net] mlx4_core: restore optimal ICM memory allocation

On 5/30/2018 2:30 PM, Eric Dumazet wrote:
> On Wed, May 30, 2018 at 5:08 PM Qing Huang <qing.huang@xxxxxxxxxx> wrote:
>> On 5/30/2018 1:50 PM, Eric Dumazet wrote:
>>> On Wed, May 30, 2018 at 4:30 PM Qing Huang <qing.huang@xxxxxxxxxx> wrote:
>>>> On 5/29/2018 9:11 PM, Eric Dumazet wrote:
>>>>> Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
>>>>> brought a regression caught in our regression suite, thanks to KASAN.
>>>> If the KASAN-reported issue was really caused by the smaller chunk size,
>>>> changing the allocation order dynamically will eventually hit the same issue.
>>> Sigh, you have little idea of what your patch really did...
>>>
>>> The KASAN part only shows the tip of the iceberg, but our main concern
>>> is an increase of memory overhead.
>> Well, the commit log only mentioned KASAN, but the change here didn't
>> seem to solve that issue.
> Can you elaborate?
>
> My patch solves our problems.
>
> Both the memory overhead and KASAN splats are gone.

If the KASAN issue was triggered by using smaller chunks, then under memory
pressure with lots of fragmentation, low-order memory allocation will end up
doing the same thing. Perhaps memory allocation and usage in your test env is
relatively static; that is probably why using larger chunks didn't exercise
the low-order allocation code path and no KASAN issue was spotted.
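
Just to illustrate what I mean by "the same thing": a dynamic-order scheme
typically falls back through lower and lower orders until something succeeds,
so under fragmentation it ends up issuing the very same order-0 allocations.
A minimal sketch of that pattern (illustrative only -- the function name is
made up, this is not the actual mlx4_core code):

#include <linux/gfp.h>

/*
 * Illustrative fallback allocator: try a high-order allocation first,
 * then step the order down on failure. On a fragmented system this
 * degenerates to order-0 (4KB) pages, i.e. exactly the allocations
 * that the page-size-chunk patch always used.
 */
static struct page *icm_alloc_pages_fallback(int *cur_order, gfp_t gfp_mask)
{
	struct page *page;
	int order = *cur_order;

	while (order >= 0) {
		/* __GFP_NOWARN: high-order failures are expected here */
		page = alloc_pages(gfp_mask | __GFP_NOWARN, order);
		if (page) {
			*cur_order = order;	/* remember what worked */
			return page;
		}
		order--;	/* retry with a smaller contiguous block */
	}
	return NULL;
}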

A smaller chunk size in the mlx4 driver is not supposed to cause any memory
corruption, so we will probably need to keep investigating this. Can you
provide the test command that triggers the issue on a KASAN kernel, so we
can try to reproduce it in our lab? It could be that the upstream code is
missing some other fixes.


>>> Alternative is to revert your patch, since we are now very late in the 4.17 cycle.
>>>
>>> Memory usage has grown a lot with your patch, since each 4KB page needs a full
>>> struct mlx4_icm_chunk (256 bytes of overhead!)
>> Going to smaller chunks will have some overhead. It depends on the
>> application, though.
>> What's the total increase in memory consumption in your env?
> As I explained, your patch adds 256 bytes of overhead per 4KB.
>
> Your changelog did not mention that at all, and we discovered this
> the hard way.

If you had concerns about memory usage, they should have been raised during
code review.

Repeated failures and retries of lower-order allocations could be bad for
latency too. That wasn't mentioned in the commit either.

Like I said, how much the overhead matters really depends on the application.
256 bytes times the number of chunks may not be significant on a server with
lots of memory.
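
To put rough numbers on it (my arithmetic, taking the figure of one 256-byte
struct mlx4_icm_chunk per 4KB page at face value):

    256 B / 4096 B         = 6.25% metadata overhead
    1 GB of ICM / 4 KB     = 262,144 chunks
    262,144 chunks * 256 B = 64 MB of chunk metadata

Whether ~64 MB of bookkeeping per GB of ICM is acceptable is exactly the
application-dependent part.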

> That is pretty intolerable, and is a blocker for us, memory is precious.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


