Re: Default QinQ behaviour for MTU and REORDER flag

Alex Zeffertt <ajz@xxxxxxxxxxxxxxxxxxxxxx> · Wed, 10 Oct 2007 10:21:13 +0100

Peter Stuge wrote:
> On Tue, Oct 09, 2007 at 02:23:41PM +0100, Alex Zeffertt wrote:
>> Some ethernet drivers implement a change_mtu() method, and use the
>> MTU size passed in (from ifconfig <dev> mtu <size>) to set the
>> maximum rx frame size.
>
> MTU isn't used for rx. T for transmit.

It is true that MTU *should* just specify the maximum *transmit* size.
However, ethernet drivers have to enter *something* into the hardware's
max rx frame register.  Some use a hard coded value.  Others make the
assumption that "my MTU is the same as other hosts' MTUs" and use
dev->mtu + 18.  If they're vlan aware they may use dev->mtu + 22, but
they might do anything!
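
To make the arithmetic concrete, here is a rough sketch in plain Python (not
driver code) of the rx limits described above.  The 18 is the 14 byte ethernet
header plus the 4 byte FCS, and each vlan tag adds another 4.  The constant
names are borrowed from the kernel headers for readability; the function name
and its vlan_tags parameter are made up for this illustration.

```python
ETH_HLEN = 14      # destination MAC + source MAC + ethertype
ETH_FCS_LEN = 4    # frame check sequence appended to every frame
VLAN_HLEN = 4      # one 802.1Q tag


def max_rx_frame(mtu, vlan_tags=0):
    """Largest on-wire frame a peer with the same MTU could send us.

    Hypothetical helper: it models the "my MTU is the same as other
    hosts' MTUs" assumption, nothing more.
    """
    return mtu + ETH_HLEN + ETH_FCS_LEN + vlan_tags * VLAN_HLEN


print(max_rx_frame(1500))                # 1518, driver not vlan aware
print(max_rx_frame(1500, vlan_tags=1))   # 1522, vlan aware driver
print(max_rx_frame(1500, vlan_tags=2))   # 1526, two levels of q-in-q
```

A driver that programs a hard coded value simply ignores the mtu argument,
which is why the behaviour differs so much between drivers.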

I've seen, for example, that the e1000 implementation of dev->change_mtu()
sets dev->mtu and uses this to tell the hardware what the limits should be
for both tx and rx frame sizes.

>> If a driver is not vlan aware it may set a fixed maximum rx frame
>> size of 1518, or - if it implements change_mtu() - a maximum rx
>> frame size of mtu + 18.
>>
>> If a driver *is* vlan aware it will add 4 to the above, as it knows
>> that the frames may have an extra 4 byte shim that gets stripped by
>> the vlan layer before the frame enters the IP stack.
>>
>> For a driver to be q-in-q aware it will need to add 4*(max q-in-q
>> levels).
>
> The driver usually doesn't have too much to say in this. The hardware
> typically only supports a few maximum packet sizes and >1518 requires
> varying efforts from one chip to another.

I agree.  If the driver fix required to support vlans is to increase the max
frame size to the next quantised value, and that value is 2048 bytes, then
for this driver, if it supports vlans at all it will support q-in-q (up to
about a hundred levels).
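
As a quick check of that estimate, the sketch below counts how many 4 byte
tags fit between an untagged 1518 byte frame and a given hardware limit.  The
function name and the default mtu of 1500 are assumptions for illustration;
real hardware may also reserve space for other things.

```python
VLAN_HLEN = 4          # one 802.1Q tag
ETH_OVERHEAD = 18      # 14 byte header + 4 byte FCS


def max_vlan_levels(hw_max_frame, mtu=1500):
    """Number of stacked vlan tags that fit under hw_max_frame.

    Hypothetical helper: assumes every tag costs exactly 4 bytes and
    that nothing else competes for the spare room.
    """
    untagged = mtu + ETH_OVERHEAD
    if hw_max_frame < untagged:
        return 0
    return (hw_max_frame - untagged) // VLAN_HLEN


print(max_vlan_levels(2048))   # 132
print(max_vlan_levels(1522))   # 1
print(max_vlan_levels(1518))   # 0
```

With a 2048 byte limit there are 530 spare bytes, so 132 tags, which is
the same order as the rough figure above.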

So some drivers won't need fixing up for q-in-q, but others will.

It seems to me that what we really need is a way of setting the max_rx_frame and
max_tx_frame independently of setting the mtu.  The mtu is mostly of significance
to the IP stack, but the interface may not be directly plugged into an IP stack.
It may be plugged into a vlan device, and then a bridge.  Or the user may not
be interested in IP at all and just be using a packet socket.

Perhaps what we need is to get rid of

     dev->mtu
     dev->change_mtu()

and replace with

     dev->max_rx_frame
     dev->max_tx_frame
     dev->change_max_rx_frame()
     dev->change_max_tx_frame()

This way you could independently set the mtu (which will be handled by the IP stack)
and the maximum rx and tx frame sizes.

Let me add some details:

If I brought up a new ethernet device "eth0" and ran "ifconfig eth0 mtu 1500" then
the IP stack would set eth0's mtu to 1500, and tell the driver that
its max_rx_frame/max_tx_frame should be at least 1500 bytes (ethernet payload).

If I then created vlan device eth0.10 and did "ifconfig eth0.10 mtu 1500"
the IP stack would set eth0.10's mtu to 1500, and tell eth0.10's driver that its
max_rx_frame/max_tx_frame should be at least 1500 bytes.  This in turn would
tell eth0's driver that its max_rx_frame/max_tx_frame should be at least 1504 bytes.
However, eth0's mtu would remain at 1500.

Q-in-Q would be trivial.  If I created eth0.10.11 and ran "ifconfig eth0.10.11 mtu 1500",
this would tell eth0.10.11's driver to set its max_rx_frame/max_tx_frame to at least 1500.
This in turn would tell eth0.10's driver to set its max_rx_frame/max_tx_frame to at least
1504.  This in turn would tell eth0's driver to set its max_rx_frame/max_tx_frame to at least
1508.  But all three interfaces would still have mtus of 1500, because the mtus are handled
by the stack, not the drivers.
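
The propagation described above can be modelled in a few lines of Python.
This is only a toy model of the proposal, not kernel code: the Dev class, its
lower attribute and set_mtu() are invented for the sketch, sizes are ethernet
payload bytes as in the examples above, and "at least" is modelled by never
shrinking a limit.  A real driver would presumably round up to whatever sizes
its hardware supports, which is not modelled here.

```python
VLAN_HLEN = 4  # bytes added by one vlan tag


class Dev:
    """Toy network device with the mtu kept separate from frame limits."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower          # device this one is stacked on, if any
        self.mtu = 0
        self.max_rx_frame = 0
        self.max_tx_frame = 0

    def change_max_rx_frame(self, size):
        # "At least": only ever raise the limit, then ask the lower
        # device for room for our tag as well.
        self.max_rx_frame = max(self.max_rx_frame, size)
        if self.lower is not None:
            self.lower.change_max_rx_frame(size + VLAN_HLEN)

    def change_max_tx_frame(self, size):
        self.max_tx_frame = max(self.max_tx_frame, size)
        if self.lower is not None:
            self.lower.change_max_tx_frame(size + VLAN_HLEN)

    def set_mtu(self, mtu):
        # What the stack would do for "ifconfig <dev> mtu <size>".
        self.mtu = mtu
        self.change_max_rx_frame(mtu)
        self.change_max_tx_frame(mtu)


eth0 = Dev("eth0")
eth0_10 = Dev("eth0.10", lower=eth0)
eth0_10_11 = Dev("eth0.10.11", lower=eth0_10)

for dev in (eth0, eth0_10, eth0_10_11):
    dev.set_mtu(1500)

for dev in (eth0, eth0_10, eth0_10_11):
    print(dev.name, dev.mtu, dev.max_rx_frame, dev.max_tx_frame)
# eth0 1500 1508 1508
# eth0.10 1500 1504 1504
# eth0.10.11 1500 1500 1500
```

Adding a fourth level only means creating one more Dev on top; nothing in the
bottom device has to know how deep the stack is.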

This system would end the need to hack ethernet drivers every time someone decides to
insert another layer.

Comments?

Alex

_______________________________________________
Vlan mailing list
Vlan@xxxxxxxxxxxxxxx
http://www.candelatech.com/mailman/listinfo/vlan