Re: [RFC net-next v3 06/10] net: bridge: mrp: switchdev: Extend switchdev API to offload MRP

Jürgen Lambrecht <j.lambrecht@xxxxxxxxxxx> · Mon, 27 Jan 2020 15:39:05 +0100

On 1/27/20 1:27 PM, Allan W. Nielsen wrote:
> CAUTION: This Email originated from outside Televic. Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
>
> Hi Jürgen,
>
> On 27.01.2020 12:29, Jürgen Lambrecht wrote:
>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>
>> On 1/26/20 4:59 PM, Andrew Lunn wrote:
>>> Given the design of the protocol, if the hardware decides the OS etc
>>> is dead, it should stop sending MRP_TEST frames and unblock the ports.
>>> If then becomes a 'dumb switch', and for a short time there will be a
>>> broadcast storm. Hopefully one of the other nodes will then take over
>>> the role and block a port.
This can probably be a configuration option in the hardware, how to fall-back.
>
>> In my experience a closed loop should never happen. It can make
>> software crash and give other problems.  An other node should first
>> take over before unblocking the ring ports. (If this is possible - I
>> only follow this discussion halfly)
>>
>> What is your opinion?
> Having loops in the network is never a good thing - but to be honest, I
> think it is more important that we ensure the design can survive and
> recover from loops.
Indeed
>
> With the current design, it will be really hard to void loops when the
> network boot. MRP will actually start with the ports blocked, but they
> will be unblocked in the period from when the bridge is created and
> until MRP is enabled. If we want to change this (which I'm not too keen
> on), then we need to be able to block the ports while the bridge is
> down.
Our ring network is part of a bigger network. Loops are really not allowed.
>
> And even if we do this, then we can not guarantee to avoid loops. Lets
> assume we have a small ring with just 2 nodes: a MRM and a MRC. Lets
> assume the MRM boots first. It will unblock both ports as the ring is
> open. Now the MRC boots, and make the ring closed, and create a loop.
> This will take some time (milliseconds) before the MRM notice this and
> block one of the ports.

In my view there is a bring-up and tear-down module needed. I don't know if it should be part of MRP or not? Probably not, so something on top of the mrp daemon.

>
> But while we are at this topic, we need to add some functionality to
> the user-space application such that it can set the priority of the MRP
> frames. We will get that fixed.

Indeed! In my old design I had to give high priority, else the loop was wrongly closed at high network load.

I guess you mean the priority in the VLAN header?
I think to remember one talked about the bride code being VLAN-agnostic.

>
>> (FYI: I made that mistake once doing a proof-of-concept ring design:
>> during testing, when a "broken" Ethernet cable was "fixed" I had for a
>> short time a loop, and then it happened often that that port of the
>> (Marvell 88E6063) switch was blocked.  (To unblock, only solution was
>> to bring that port down and up again, and then all "lost" packets came
>> out in a burst.) That problem was caused by flow control (with pause
>> frames), and disabling flow control fixed it, but flow-control is
>> default on as far as I know.)
> I see. It could be fun to see if what we have proposed so far will with
> with such a switch.

Depending on the projects I could work on it later this year (or only next year or not..)

Kind regards,

Jürgen

>
> /Allan
>