Re: Potential regression/bug in net/mlx5 driver

On 13 Apr 20:26, Jakub Kicinski wrote:
On Thu, 13 Apr 2023 20:03:18 -0700 Saeed Mahameed wrote:
On 13 Apr 15:51, Jakub Kicinski wrote:
>On Thu, 13 Apr 2023 15:34:21 -0700 Saeed Mahameed wrote:
>> But this management connection function has the same architecture as other
>> "Normal" mlx5 functions, from the driver pov. The same way mlx5
>> doesn't care if the underlying function is CX4/5/6, we don't care if it is
>> a "management function".
>
>Yes, and that's why every single IPU implementation thinks that it's
>a great idea. Because it's easy to implement. But what is it for
>architecturally? Running what is effectively FW commands over TCP?

Where did you get this idea from? Maybe we got the name wrong;
"management PF" is simply a minimalistic netdev PF to have an eth connection
with the on-board BMC.

I agree that the name "management PF" sounds scary, but it is not a control
function as you think, not at all. As the original commit message states:
"loopback PF designed for communication with BMC".

Can you draw a small diagram with the bare metal guest, IPU, and BMC?
What's talking to what? And what packets are exchanged?


Yes, working on that...

>> But let's discuss what's wrong with it, and what are your thoughts?
>> The fact that it breaks a 6-year-old FW doesn't make it so horrible.
>
>Right, the breakage is a separate topic.
>
>You say 6 years old but the part is EOL, right? The part is old and
>stable, AFAIU the breakage stems from development work for parts which
>are 3 or so generations newer.

Officially we test only 3 GA FWs back. The fact that mlx5 is a generic CX
driver makes it really hard to test all the possible combinations, so we
need to be strict about how far back we officially support and test old
generations.

Would you be able to pull the datapoints for what 3 GA FWs means
in case of CX4? Release number and date when it was released?


https://network.nvidia.com/files/related-docs/eol/LCR-000821.pdf

Since CX4 went EOL last year, it is going to be hard to find this info, but
let me check my email archive:
12.28.2006   27-Sep-2020   (recommended version)
12.26.xxxx   12-Dec-2019
12.24.1000    2-Dec-2018


I understand the challenge of backward compat with a multi-gen
driver. It's a trade off.

>The question is who's supposed to be paying the price of mlx5 being
>used for old and new parts? What is fair to expect from the user
>when the FW Paul has presumably works just fine for him?
>
Upgrade FW when possible; it is always easier than upgrading the kernel.
Anyway, this was a very rare FW/arch bug. We should've exposed an
explicit cap for this new type of PF when we had the chance. Now it's too
late, since a proper fix would require FW and driver upgrades and would break
the current solution we have on other OSes as well.

Yes, I can craft an if condition to explicitly check for chip id and FW
version for this corner case, which has no precedent in mlx5, but I prefer
to ask to upgrade FW first, and if that's an acceptable solution, I would
like to keep mlx5 as clean and device agnostic as possible.
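
For illustration only, a chip id plus FW version check of that kind could look
roughly like the sketch below. It is not mlx5 code: the struct, the function
names, and the threshold are all hypothetical. The device ID 0x1013
(ConnectX-4) is just an example, and 12.28.2006 is borrowed from the version
list above as a stand-in, since this thread does not say which FW release
actually contains the fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical firmware version triple, e.g. 12.24.1000. */
struct fw_version {
	uint16_t major;
	uint16_t minor;
	uint16_t sub;
};

/* Compare two versions field by field; returns <0, 0 or >0. */
static int fw_version_cmp(struct fw_version a, struct fw_version b)
{
	if (a.major != b.major)
		return a.major < b.major ? -1 : 1;
	if (a.minor != b.minor)
		return a.minor < b.minor ? -1 : 1;
	if (a.sub != b.sub)
		return a.sub < b.sub ? -1 : 1;
	return 0;
}

/* Example only: PCI device ID of ConnectX-4. */
#define EXAMPLE_QUIRK_DEVICE_ID 0x1013

/* Assumption: placeholder for the first FW release without the bug. */
static const struct fw_version example_first_fixed_fw = { 12, 28, 2006 };

/*
 * Return true when the device is the affected chip and runs a FW older
 * than the (assumed) first fixed release, so the driver should take the
 * legacy path instead of trusting the new PF type.
 */
static bool needs_old_fw_workaround(uint16_t pci_device_id,
				    struct fw_version running)
{
	if (pci_device_id != EXAMPLE_QUIRK_DEVICE_ID)
		return false;
	return fw_version_cmp(running, example_first_fixed_fw) < 0;
}
```

With these placeholder values, a 0x1013 device on 12.24.1000 would take the
workaround path, while the same device on 12.28.2006, or any other device ID,
would not. The cost is exactly the one described above: a device-specific
special case in an otherwise device agnostic driver.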

IMO you either need a fully fleshed out FW update story, with advanced
warnings for a few releases, distributing the FW via linux-firmware or
fwupdmgr or such.  Or deal with the corner cases in the driver :(


Completely agree, I will start an internal discussion.
We can get Paul to update, sure, but if he noticed so quickly the
question remains how many people out in the wild will get affected
and not know what the cause is?

Right, I will make sure this gets addressed and will let you know how we
will handle it. I will try to post a patch early next cycle, but I will
need to work with Arch and the release managers on this, so it will take a
couple of weeks to formalize a proper solution.



