Re: Potential regression/bug in net/mlx5 driver

On Thu, 13 Apr 2023 20:03:18 -0700 Saeed Mahameed wrote:
> On 13 Apr 15:51, Jakub Kicinski wrote:
> >On Thu, 13 Apr 2023 15:34:21 -0700 Saeed Mahameed wrote:  
> >> But this management connection function has the same architecture as other
> >> "Normal" mlx5 functions, from the driver pov. The same way mlx5
> >> doesn't care if the underlying function is CX4/5/6, we don't care if it is
> >> a "management function".  
> >
> >Yes, and that's why every single IPU implementation thinks that it's
> >a great idea. Because it's easy to implement. But what is it for
> >architecturally? Running what is effectively FW commands over TCP?  
> 
> Where did you get this idea from? Maybe we got the name wrong;
> "management PF" is simply a minimalistic netdev PF to have an eth connection
> with the on-board BMC.
> 
> I agree that the name "management PF" sounds scary, but it is not a control
> function as you think, not at all. As the original commit message states:
> "loopback PF designed for communication with BMC".

Can you draw a small diagram with the bare metal guest, IPU, and BMC?
What's talking to what? And what packets are exchanged?

> >> But let's discuss what's wrong with it, and what are your thoughts ?
> >> the fact that it breaks a 6 years OLD FW, doesn't make it so horrible.  
> >
> >Right, the breakage is a separate topic.
> >
> >You say 6 years old but the part is EOL, right? The part is old and
> >stable, AFAIU the breakage stems from development work for parts which
> >are 3 or so generations newer.
> 
> Officially we test only 3 GA FWs back. The fact that mlx5 is a generic CX
> driver makes it really hard to test all the possible combinations, so we
> need to be strict with how far back we want to officially support and test old
> generations.

Would you be able to pull the datapoints for what 3 GA FWs means
in the case of CX4? Release number and date when it was released?

I understand the challenge of backward compat with a multi-gen
driver. It's a trade off.

> >The question is who's supposed to be paying the price of mlx5 being
> >used for old and new parts? What is fair to expect from the user
> >when the FW Paul has presumably works just fine for him?
> >  
> Upgrade FW when possible, it is always easier than upgrading the kernel.
> Anyway, this was a very rare FW/arch bug. We should've exposed an
> explicit cap for this new type of PF when we had the chance, now it's too
> late since a proper fix will require FW and Driver upgrades and breaking
> the current solution we have over other OSes as well.
>
> Yes I can craft an if condition to explicitly check for chip id and FW
> version for this corner case, which has no precedent in mlx5, but I prefer
> to ask to upgrade FW first, and if that's an acceptable solution, I would
> like to keep the mlx5 clean and device agnostic as much as possible.

IMO you either need a fully fleshed-out FW update story, with advance
warnings for a few releases, distributing the FW via linux-firmware or
fwupdmgr or such.  Or deal with the corner cases in the driver :(
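
To make the driver-side option concrete, here is a rough sketch of the kind
of chip id + FW version check mentioned above. It is not mlx5 code: the
struct, the function names, the device ID and the "first fixed" FW version
are all hypothetical placeholders, chosen only to show the shape of such a
check.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical placeholders for illustration only. The real device ID and
 * the first FW release containing the fix would have to come from the vendor.
 */
#define QUIRK_DEVICE_ID     0x1013u
#define FIXED_FW_MAJOR      12u
#define FIXED_FW_MINOR      28u
#define FIXED_FW_SUBMINOR   0u

struct fw_version {
	uint16_t major;
	uint16_t minor;
	uint16_t subminor;
};

/* Three-way compare: <0 if a is older than b, 0 if equal, >0 if newer. */
static int fw_version_cmp(const struct fw_version *a,
			  const struct fw_version *b)
{
	if (a->major != b->major)
		return a->major < b->major ? -1 : 1;
	if (a->minor != b->minor)
		return a->minor < b->minor ? -1 : 1;
	if (a->subminor != b->subminor)
		return a->subminor < b->subminor ? -1 : 1;
	return 0;
}

/*
 * Return true when the device is the affected part AND runs FW older than
 * the (assumed) first fixed release, i.e. the new-PF-type handling must be
 * skipped for it.
 */
static bool needs_old_fw_workaround(uint16_t device_id,
				    const struct fw_version *fw)
{
	static const struct fw_version first_fixed = {
		FIXED_FW_MAJOR, FIXED_FW_MINOR, FIXED_FW_SUBMINOR
	};

	if (device_id != QUIRK_DEVICE_ID)
		return false;
	return fw_version_cmp(fw, &first_fixed) < 0;
}
```

The check is deliberately narrow (one device ID, one version threshold) so
that every other part and every newer FW takes the normal, device-agnostic
path.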

We can get Paul to update, sure, but if he noticed so quickly, the
question remains: how many people out in the wild will get affected
and not know what the cause is?


