On Sun, Apr 9, 2023 at 4:48 AM Linux regression tracking (Thorsten
Leemhuis) <regressions@xxxxxxxxxxxxx> wrote:
> On 30.03.23 03:27, Paul Moore wrote:
> > On Wed, Mar 29, 2023 at 6:20 PM Saeed Mahameed <saeed@xxxxxxxxxx> wrote:
> >> On 28 Mar 19:08, Paul Moore wrote:
> >>>
> >>> Starting with the v6.3-rcX kernel releases I noticed that my
> >>> InfiniBand devices were no longer present under /sys/class/infiniband,
> >>> causing some of my automated testing to fail. It took me a while to
> >>> find the time to bisect the issue, but I eventually identified the
> >>> problematic commit:
> >>>
> >>> commit fe998a3c77b9f989a30a2a01fb00d3729a6d53a4
> >>> Author: Shay Drory <shayd@xxxxxxxxxx>
> >>> Date: Wed Jun 29 11:38:21 2022 +0300
> >>>
> >>>     net/mlx5: Enable management PF initialization
> >>>
> >>>     Enable initialization of DPU Management PF, which is a new loopback PF
> >>>     designed for communication with BMC.
> >>>     For now Management PF doesn't support nor require most upper layer
> >>>     protocols so avoid them.
> >>>
> >>>     Signed-off-by: Shay Drory <shayd@xxxxxxxxxx>
> >>>     Reviewed-by: Eran Ben Elisha <eranbe@xxxxxxxxxx>
> >>>     Reviewed-by: Moshe Shemesh <moshe@xxxxxxxxxx>
> >>>     Signed-off-by: Saeed Mahameed <saeedm@xxxxxxxxxx>
> >>>
> >>> I'm not a mlx5 driver expert so I can't really offer much in the way
> >>> of a fix, but as a quick test I did remove the
> >>> 'mlx5_core_is_management_pf(...)' calls in mlx5/core/dev.c and
> >>> everything seemed to work okay on my test system (or rather the tests
> >>> ran without problem).
> >>>
> >>> If you need any additional information, or would like me to test a
> >>> patch, please let me know.
> >>
> >> Our team is looking into this, the current theory is that you have an old
> >> FW that doesn't have the correct capabilities set.
> >
> > That's very possible; I installed this card many years ago and haven't
> > updated the FW once.
> >
> > I'm happy to update the FW (do you have a
> > pointer/how-to?), but it might be good to identify a fix first as I'm
> > guessing there will be others like me ...
>
> Nothing happened here for about ten days afaics (or was there progress
> and I just missed it?). That made me wonder: how sound is Paul's guess
> that there will be others that might run into this? If that's likely it
> afaics would be good to get this regression fixed before the release,
> which is just two or three weeks away.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke

I haven't seen any updates from the mlx5 driver folks, although I may
not have been CC'd? I did revert that commit on my automated testing
kernels and things are working correctly again, although I'm pretty
sure that's not a good long term solution. I did also dig up the
information on updating the card's firmware, but I'm holding off on
that in case the driver devs want me to test a fix.

--
paul-moore.com