Re: VLAN tags in mac_len

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2019/06/28 20:02, Johannes Berg wrote:
On Mon, 2019-06-17 at 20:15 +0900, Toshiaki Makita wrote:
I'll try to explain the problem I see, which cannot be fixed by option 1...
The bug is in tcf_vlan_act(), and mainly in skb->data, not in mac_len.

Consider about vlan packets from NIC, but non-hw-accelerated, where
vlan devices are configured to receive them.

When __netif_receive_skb_core() is called, skb is like this.

+-----+------+--------
eth | vlan | TCP/IP

+-----+------+--------
        ^
       data

skb->data is at the beginning of the vlan header.

Right.

This is reasonable because we did not process the vlan tag at this point.

I think with this simple sentence you just threw a whole new semantic
issue into the mix, one that I at least hadn't considered.

However, it's not clear to me whether we should consider a tag as
processed or not when we push it.

It's clear that we always insert a tag as unprocessed in a single tag case.
The tag is inserted as hw-offloaded one, and hw-offloaded one is treated
as an unprocessed tag in __netif_receive_skb_core().
The single tag is the most common usage I think, so unprocessed should be
the expected behavior.

Also, tc vlan act was originally introduced to replace OVS vlan action.
It's in fact used as HW offload path of ovs-vswitchd, and should behave the same as
OVS push_vlan action. And push_vlan action inserts a tag as unprocessed one.

In a sense, this means we should have two different VLAN tag push
options - considering it processed or unprocessed. Or maybe it should
always be considered unprocessed, but that's not what we do today.

Then after vlan_do_receive() (receive the skb on a vlan device), the skb is like this.

+-----+--------
eth | TCP/IP

+-----+--------
        ^
       data

Or if reorder_hdr is off (which does not remove vlan tags when receiving on vlan devices),

+-----+------+--------
eth | vlan | TCP/IP

+-----+------+--------
               ^
              data

Relying on this mechanism, we are currently able to handle multiple vlan tags.

For example if we have 2 tags,

- On __netif_receive_skb_core() invocation

+-----+------+------+--------
eth | vlan | vlan | TCP/IP

+-----+------+------+--------
        ^
       data

- After first vlan_do_receive()

+-----+------+--------
eth | vlan | TCP/IP

+-----+------+--------
        ^
       data

Or if reorder_hdr is off,

+-----+------+------+--------
eth | vlan | vlan | TCP/IP

+-----+------+------+--------
               ^
              data

When we process one tag, the data goes forward by one tag.

Right, that's a very good point.

Now looking at TC vlan case...

After it inserts two tags, the skb looks like:

(The first tag is in vlan_tci)
+-----+------+--------
eth | vlan | TCP/IP

+-----+------+--------
               ^
              data

The data pointer went forward before we process it.
This is apparently wrong. I think we don't want to (or cannot?) handle cases like this
after tcf_vlan_act(). This is why I said we should remember mac_len there.

Right, makes a lot of sense.

If you consider a tc VLAN pop, you'd argue that it should pop the next
unprocessed tag I guess, since if it was processed then it doesn't
really exist any more (semantically, you still see it if reorder_hdr is
off), right?

Right.

So, my opinion is:
On ingress, data pointer can be at the end of vlan header and mac_len probably should
include vlan tag length, but only after the vlan tag is processed.

You're basically arguing for option (3), I think, making VLAN push/pop
not manipulate mac_len since they can just push/pop *unprocessed* tags,
right?

Ah, true, on the second thought (2b) is not an appropriate fix but (3) is.
(3) more correctly emulates already tagged packets from wire so it should
cause least confusion.

I fear this will cause all kinds of trouble in other code. Perhaps we
need to make this processed/unprocessed state more explicit.

But (3) makes mac_len the same as the already tagged packets from NICs.
If other code cannot handle the packets correctly, they need to be fixed anyway.

As you explained OVS MPLS seems to rely on mac_len adjustment by skb_vlan_push(), but
it means the OVS MPLS code cannot correctly handle double-tagged packets from NICs,
so it needs a fix anyway (use __vlan_get_protocol() to get the real mac_len?).

Bridge may need to handle mac_len that is not equal to ETH_HLEN but to me it's a
different problem.

Yes. Like I just said to Daniel, I think we should make bridge handle
mac_len so that we can just exclude it from this whole discussion.
Regardless of the mac_len and processed/unprocessed tags, it would just
work as expected.

That's OK, it should help packets from vlan devices with reorder_hdr off.

Toshiaki Makita



[Index of Archives]     [Netdev]     [AoE Tools]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]     [Video 4 Linux]

  Powered by Linux