Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver

Omer Shpigelman <oshpigelman@xxxxxxxxx> · Wed, 19 Jun 2024 12:07:42 +0000

On 6/15/24 03:10, Stephen Hemminger wrote:
> [You don't often get email from stephen@xxxxxxxxxxxxxxxxxx. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> On Thu, 13 Jun 2024 11:22:02 +0300
> Omer Shpigelman <oshpigelman@xxxxxxxxx> wrote:
> 
>> +static int hbl_en_ports_reopen(struct hbl_aux_dev *aux_dev)
>> +{
>> +     struct hbl_en_device *hdev = aux_dev->priv;
>> +     struct hbl_en_port *port;
>> +     int rc = 0, i;
>> +
>> +     for (i = 0; i < hdev->max_num_of_ports; i++) {
>> +             if (!(hdev->ports_mask & BIT(i)))
>> +                     continue;
>> +
>> +             port = &hdev->ports[i];
>> +
>> +             /* It could be that the port was shutdown by 'ip link set down' and there is no need
>> +              * in reopening it.
>> +              * Since we mark the ports as in reset even if they are disabled, we clear the flag
>> +              * here anyway.
>> +              * See hbl_en_ports_stop_prepare() for more info.
>> +              */
>> +             if (!netif_running(port->ndev)) {
>> +                     atomic_set(&port->in_reset, 0);
>> +                     continue;
>> +             }
>> +
> 
> Rather than duplicating network device state in your own flags, it would be better to use
> existing infrastructure. Read Documentation/networking/operstates.rst
> 
> Then you could also get rid of the kludge timer stuff in hbl_en_close().
> 

I think that additional explanation is needed here.
In addition to netdev flows, we also support an internal reset flow
(that's what the in_reset boolean indicates).
Our NIC driver is an extension of the compute driver (they share the same
HW) and a reset flow might be needed due to some compute operation which
is entirely unrelated to the NIC driver. But we must not access the HW
while this reset flow is executed.
Note that this internal reset flow originates from the compute driver and
hence we might have parallel netdev operations that should be blocked
meanwhile.
The internal reset flow has 2 phases - teardown and re-init. This reopen
function is called during the re-init phase to restore the NIC ports, but
only if they were actually opened prior to the reset flow.
Regarding hbl_en_close() - during the port close we need to write to the
HW so due to the explanation above, also there we should wait for an
existing internal reset flow to finish first.
Let me know if that's clear enough and addresses your concerns.