My interpretation is that mlx5 tried to do this for the (rather esoteric,
in my mind) case where the platform does not have enough vectors for the
driver to allocate one per CPU. In that case, the next best thing is to
stay as close to the device affinity as possible.
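For reference, the usual pattern for that fallback looks roughly like the
sketch below: hint each vector's affinity with cpumask_local_spread(),
which hands out the device-node CPUs first and only then the remote ones.
This is illustrative only (made-up names), not the actual mlx5 code:

#include <linux/cpumask.h>
#include <linux/interrupt.h>

/* Illustrative only: prefer CPUs on the device's NUMA node when
 * handing out IRQ affinity hints, falling back to remote CPUs once
 * the local ones run out.
 */
static void hint_vectors_near_device(int *irqs, int nvec, int node,
				     cpumask_var_t *masks)
{
	int i;

	for (i = 0; i < nvec; i++) {
		int cpu = cpumask_local_spread(i, node);

		cpumask_clear(masks[i]);
		cpumask_set_cpu(cpu, masks[i]);
		irq_set_affinity_hint(irqs[i], masks[i]);
	}
}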
No, we did it because the mlx5e netdevice assumes that
IRQ[0]..IRQ[#num_numa/#cpu_per_numa]
are always bound to the NUMA node close to the device, and the mlx5e
driver chooses those IRQs and spreads the RSS hash only across them,
never using the other IRQs/cores.
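In other words, the default indirection table ends up round-robining over
just those local-NUMA channels, roughly like this (a sketch only, with
made-up names, not the actual mlx5e code):

#include <linux/types.h>

/* Sketch: fill the RSS indirection table using only the first
 * num_local_channels channels, i.e. the rings whose IRQs are assumed
 * to sit on the NUMA node close to the device. The remaining open
 * rings never show up in the table.
 */
static void build_local_numa_indir(u32 *indir, int table_size,
				   int num_local_channels)
{
	int i;

	for (i = 0; i < table_size; i++)
		indir[i] = i % num_local_channels;
}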
OK, that explains a lot of weirdness I've seen with mlx5e.
Can you explain why you're using only a single NUMA node for your RSS
table? What does it buy you? You open RX rings for _all_ CPUs but
only spread over part of them? I must be missing something here...
Adding Tariq.
This is also part of the weirdness :). We do that to make sure that in
any out-of-the-box test you run you always get the best performance,
since we guarantee to always use the NUMA-close cores.
Well, I wish I knew that before :( I got to a point where I started
to seriously doubt the mathematical strength of xor/Toeplitz hashing :)
I'm sure you ran plenty of performance tests, but in my experience,
application locality makes much more difference than device locality,
especially when the application needs to touch the data...
We open RX rings on all of the cores in case the user wants to change
the RSS table on the fly ("ethtool -X") to point at the whole set.
That is very counter-intuitive afaict. Is it documented anywhere?
Users might rely on the (absolutely reasonable) assumption that if a
NIC exposes X RX rings, RX hashing spreads across all of them and not
a subset.
But we are willing to change that, and Tariq can provide the patch;
without changing this, mlx5e is broken.
What patch? To modify the RSS spread? What exactly is broken?
So I'm not sure how to move forward here. Should we modify the
indirection table construction so it does not rely on the unique
affinity mappings?