Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver

Omer Shpigelman <oshpigelman@xxxxxxxxx> · Wed, 19 Jun 2024 07:16:20 +0000

On 6/18/24 17:19, Andrew Lunn wrote:
>>>> +static u32 hbl_en_get_mtu(struct hbl_aux_dev *aux_dev, u32 port_idx)
>>>> +{
>>>> +     struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
>>>> +     struct net_device *ndev = port->ndev;
>>>> +     u32 mtu;
>>>> +
>>>> +     if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
>>>> +             netdev_err(ndev, "port is in reset, can't get MTU\n");
>>>> +             return 0;
>>>> +     }
>>>> +
>>>> +     mtu = ndev->mtu;
>>>
>>> I think you need a better error message. All this does is access
>>> ndev->mtu. What does it matter if the port is in reset? You don't
>>> access it.
>>>
>>
>> This function is called from the CN driver to get the current MTU in order
>> to configure it to the HW, for exmaple when configuring an IB QP. The MTU
>> value might be changed by user while we execute this function.
> 
> Change of MTU will happen while holding RTNL. Why not simply hold RTNL
> while programming the hardware? That is the normal pattern for MAC
> drivers.
>

I can hold the RTNL lock while configuring the HW but it seems like a big
overhead. Configuring the HW might take some time due to QP draining or
cache invalidation.
To me it seems unnecessary but if that's the common way then I'll change
it.

>>>> +static int hbl_en_change_mtu(struct net_device *netdev, int new_mtu)
>>>> +{
>>>> +     struct hbl_en_port *port = hbl_netdev_priv(netdev);
>>>> +     int rc = 0;
>>>> +
>>>> +     if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
>>>> +             netdev_err(netdev, "port is in reset, can't change MTU\n");
>>>> +             return -EBUSY;
>>>> +     }
>>>> +
>>>> +     if (netif_running(port->ndev)) {
>>>> +             hbl_en_port_close(port);
>>>> +
>>>> +             /* Sleep in order to let obsolete events to be dropped before re-opening the port */
>>>> +             msleep(20);
>>>> +
>>>> +             netdev->mtu = new_mtu;
>>>> +
>>>> +             rc = hbl_en_port_open(port);
>>>> +             if (rc)
>>>> +                     netdev_err(netdev, "Failed to reinit port for MTU change, rc %d\n", rc);
>>>
>>> Does that mean the port is FUBAR?
>>>
>>> Most operations like this are expected to roll back to the previous
>>> working configuration on failure. So if changing the MTU requires new
>>> buffers in your ring, you should first allocate the new buffers, then
>>> free the old buffers, so that if allocation fails, you still have
>>> buffers, and the device can continue operating.
>>>
>>
>> A failure in opening a port is a fatal error. It shouldn't happen. This is
>> not something we wish to recover from.
> 
> What could cause open to fail? Is memory allocated?
> 

Memory is allocated but it is freed in case of a failure.
Port opening can fail due to other reasons as well like some HW timeout
while configuring the ETH QP.

>> This kind of an error indicates a severe system error that will usually
>> require a driver removal and reload anyway.
>>
>>>> +module_param(poll_enable, bool, 0444);
>>>> +MODULE_PARM_DESC(poll_enable,
>>>> +              "Enable Rx polling rather than IRQ + NAPI (0 = no, 1 = yes, default: no)");
>>>
>>> Module parameters are not liked. This probably needs to go away.
>>>
>>
>> I see that various vendors under net/ethernet/* use module parameters.
>> Can't we add another one?
> 
> Look at the history of those module parameters. Do you see many added
> in the last year? 5 years?
> 

I didn't check that prior to my submit. Regarding this "no new module
parameters allowed" rule, is that documented anywhere? if not, is that the
common practice? not to try to do something that was not done recently?
how "recently" is defined?
I just want to clarify this because it's hard to handle these submissions
when we write some code based on existing examples but then we are
rejected because "we don't do that here anymore".
I want to avoid future cases of this mismatch.

>>>> +static int hbl_en_ethtool_get_module_info(struct net_device *ndev, struct ethtool_modinfo *modinfo)
>>>> +{
>>>> +     modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
>>>> +     modinfo->type = ETH_MODULE_SFF_8636;
>>>
>>> Is this an SFF, not an SFP? How else can you know what module it is
>>> without doing an I2C transfer to ask the module what it is?
>>>
>>
>> The current type is SFF and it is unlikely to be changed.
> 
> Well, SFF are soldered to the board, so yes, it is unlikely to
> change...
> 
> Please add a comment that this is an SFF, not an SFP, so is soldered
> to the board, and so it is known to be an 8636 compatible device.
> 

I'll add.

>> Are you referring to get_module_eeprom_by_page()? if so, then it is not
>> supported by our FW, we read the entire data on device load.
>> However, I can hide that behind the new API and return only the
>> requested page if that's the intention.
> 
> Well, if your firmware is so limited, then you might as well stick to
> the old API, and let the core do the conversion to the legacy
> code. But i'm surprised you don't allow access to the temperature
> sensors, received signal strength, voltages etc, which could be
> exported via HWMON.
> 

I'll stick to the old API.
Regaring the sensors, our compute driver (under accel/habanalabs) exports
them via HWMON.

>>>> +                     ethtool_link_ksettings_add_link_mode(cmd, lp_advertising, Autoneg);
>>>
>>> That looks odd. Care to explain?
>>>
>>
>> The HW of all of our ports supports autoneg.
>> But in addition, the ports are divided to two groups:
>> internal: ports which are connected to other Gaudi2 ports in the same server.
>> external: ports which are connected to an external switch.
>> Only internal ports use autoneg.
>> The ports mask which sets each port as internal/external is fetched from
>> the FW on device load.
> 
> That is not what i meant. lc_advertising should indicate the link
> modes the peer is advertising. If this was a copper link, it typically
> would contain 10BaseT-Half, 10BaseT-Full, 100BaseT-Half,
> 100BaseT-Full, 1000BaseT-Half. Setting the Autoneg bit is pointless,
> since the peer must be advertising in order that lp_advertising has a
> value!
> 

Sorry, but I don't get this. The problem is the setting of the Autoneg bit
in lp_advertising? is that redundant? I see that other vendors set it too
in case that Autoneg was completed.

>> Our HW supports Pause frames.
>> But, PFC can be disabled via lldptool for exmaple, so we won't advertise
>> it.
> 
> Please also implement the standard netdev way of configuring pause.
> When you do that, you should start to understand how pause can be
> negotiated, or forced. That is what most get wrong.
> 

Let me revisit this.

>> I'll try to find more info about it, but can you please share what's wrong
>> with the curent code?
>> BTW I will change it to Asym_Pause as we support Tx pause frames as well.
>>
>>>> +     if (auto_neg && !(hdev->auto_neg_mask & BIT(port_idx))) {
>>>> +             netdev_err(port->ndev, "port autoneg is disabled by BMC\n");
>>>> +             rc = -EFAULT;
>>>> +             goto out;
>>>
>>> Don't say you support autoneg in supported if that is the case.
>>>
>>> And EFAULT is about memory problems. EINVAL, maybe EPERM? or
>>> EOPNOTSUPP.
>>>
>>>         Andrew
>>
>> Yeah, should be switched to EPERM/EOPNOTSUPP.
>> Regarding the support of autoneg - the HW supports autoneg but it might be
>> disabled by the FW. Hence we might not be able to switch it on.
> 
> No problem, ask the firmware what it is doing, and return the reality
> in ksetting. Only say you support autoneg if your firmware allows you
> to perform autoneg.
> 
>    Andrew
> 

Ok, I'll set the Autoneg support bit properly.