Re: [net-next-2.6 PATCH 0/3 RFC] macvlan: MAC Address filtering support for passthru mode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 9/8/11 12:11 PM, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:

> On Thu, Sep 08, 2011 at 09:19:32AM -0700, Roopa Prabhu wrote:
>>>>> There are more features we'll want down the road though,
>>>>> so let's see whether the interface will be able to
>>>>> satisfy them in a backwards compatible way before we
>>>>> set it in stone. Here's what I came up with:
>>>>> 
>>>>> How will the filtering table be partitioned within guests?
>>>> 
>>>> Since this patch supports macvlan PASSTHRU mode only, in which the lower
>>>> device has 1-1 mapping to the guest nic, it does not require any
>>>> partitioning of filtering table within guests. Unless I missed
>>>> understanding
>>>> something. 
>>>> If the lower device were being shared by multiple guest network interfaces
>>>> (non PASSTHRU mode), only then we will need to maintain separate filter
>>>> tables for each guest network interface in macvlan and forward the pkt to
>>>> respective guest interface after a filter lookup. This could affect
>>>> performance too I think.
>>> 
>>> Not with hardware filtering support. Which is where we'd need to
>>> partition the host nic mac table between guests.
>>> 
>> I need to understand this more. In non passthru case when a VF or physical
>> nic is shared between guests,
> 
> For example, consider a VF given to each guest. Hardware supports a fixed
> total number of filters, which can be partitioned between VFs.
> 
O ok. But hw maintains VF filters separately for every VF as far as I know.
Filters received on a VF are programmed for that VF only. Am assuming all
hardware do this. Atleast our hardware does this.
What I was referring to was a single VF shared between guests using macvtap
(could be bridge mode for example). All guests sharing the VF will register
 filters with the VF via macvlan. Hw makes sure what ever the VF asked for
is received at the VF. VF in hw does not know that it is shared by guests.
Only at macvlan we might need to re-filter the pkts received on the VF and
steer pkts to the individual guests based on what they asked for.


>> the nic does not really know about the guests,
>> so I was thinking we do the same thing as we do for the passthru case (ie
>> send all the address filters from macvlan to the physical nic). So at the
>> hardware, filtering is done for all guests sharing the nic. But if we want
>> each virtio-net nic or guest to get exactly what it asked for
>> macvlan/macvtap needs to maintain a copy of each guest filter and do a
>> lookup and send only the requested traffic to the guest. Here is the
>> performance hit that I was seeing. Please see my next comment for further
>> details. 
> 
> It won't be any slower than attaching a non-passthrough macvlan
> to a device, will it?
> 
Am not sure. The filter lookup in macvlan is the one I am concerned about.
Will need to try it out.

>> 
>>>> I chose to support PASSTHRU Mode only at first because its simpler and all
>>>> code additions are in control path only.
>>> 
>>> I agree. It would be a bit silly to have a dedicated interface
>>> for passthough and a completely separate one for
>>> non passthrough.
>>> 
>> Agree. The reason I did not focus on non-passthru case in the initial
>> version was because I was thinking things to do in the non-passthru case
>> will be just add-ons to the passthru case. But true Better to flush out the
>> non-pasthru case details.
>> 
>> After dwelling on this a bit more how about the below:
>> 
>> Phase 1: Goal: Enable hardware filtering for all macvlan modes
>>     - In macvlan passthru mode the single guest virtio-nic connected will
>>       receive traffic that he requested for
>>     - In macvlan non-passthru mode all guest virtio-nics sharing the
>>       physical nic will see all other guest traffic
>>       but the filtering at guest virtio-nic
> 
> I don't think guests currently filter anything.
> 
I was referring to Qemu-kvm virtio-net in
virtion_net_receive->receive_filter. I think It only passes pkts that the
guest OS is interested. It uses the filter table that I am passing to
macvtap in this patch.


>>       will make sure each guest
>>       eventually sees traffic he asked for. This is still better than
>>       putting the physical nic in promiscuous mode.
>> 
>> (This is mainly what my patch does...but will need to remove the passthru
>> check and see if there are any thing else needed for non-passthru case)
> 
> I'm fine with sticking with passthrough, make non passthrough
> a separate phase.
> 
Ok.

>> 
>> Phase 2: Goal: Enable filtering at macvlan so that each guest virtio-nic
>> receives only what he requested for.
>>     - In this case, in addition to pushing the filters down to the physical
>>       nic we will have to maintain the same filter in macvlan and do a filter
>>       lookup before forwarding the traffic to a virtio-nic.
>> 
>> But I am thinking phase 2 might be redundant given virtio-nic already does
>> filtering for the guest.
> 
> It does? Do you mean the filter that qemu does in userspace?
> 
Yes I meant the filter that qemu does in userspace qemu-kvm/hw/virtio-net.c
receive_filter(). 

>> In which case we might not need phase 2 at all. I
>> might have been over complicating things.
>> 
>> Please comment. And please correct if I missed something.
>>  
>>  
>>>>> 
>>>>> A way to limit what the guest can do would also be useful.
>>>>> How can this be done? selinux?
>>>> 
>>>> I vaguely remember a thread on the same context.. had a suggestion to
>>>> maintain pre-approved address lists and allow guest filter registration of
>>>> only those addresses for security. This seemed reasonable. Plus the ability
>>>> to support additional address registration from guest could be made
>>>> configurable (One of your ideas again from prior work).
>>>> 
>>>> I am not an selinux expert, but I am thinking we can use it to only allow
>>>> or
>>>> disallow access or operations to the macvtap device. (?). I will check more
>>>> on this.
>>> 
>>> We'd have to have a way to revoke that as well.
>>> 
>> Yes true.
>> 
>> 
>>>>> 
>>>>> Any thoughts on spoofing filtering?
>>>> 
>>>> I can only think of checking addresses against an allowed address list.
>>>> Don't know of any other ways. Any hints ?
>>> 
>>> Hardware (esp SRIOV) often has ways to do this check, too.
>>> 
>> Yes correct. Hw sriov and even switch in 802.1Qbh has anti-spoofing feature.
>> In which case I am thinking having It at the macvtap layer is not an
>> absolute must (?).
> 
> Exactly. But let's figure out *how* it will be programmed.
> If anti-spoofing is programmed with netlink, maybe that's
> a better interface for rx filter too, for consistency.
> 
I will check sriov vfs. But I know our hw does not have an interface exposed
from the driver. We do it through the management s/w at the switch port
(this is 802.1Qbh). I will check other sriov nics. I don't think intel has a
netlink interface either. But I will double check.


>>>> 
>>>> In any case I am assuming all the protection/security measures should be
>>>> taken at the layer calling the TUNSETTXFILTER ie..In macvtap virtualization
>>>> use case its libvirt or qemu-kvm. No ?
>>> 
>>> Ideally we'd have a way to separate these capabilities, so that libvirt
>>> can override qemu.
>>> 
>>>>> 
>>>>> Would it be possible to make the filtering programmable
>>>>> using netlink, e.g. ethtool, ip, or some such?
>>>> 
>>>> Should be possible via ethtool or ip calling ioctl TUNSETTXFILTER. Are you
>>>> thinking of macvlan having a netlink interface to set filter and not ioctl
>>>> ?. Sure.
>>> 
>>> Yes.
>>> 
>>>> But I was thinking the point of implementing TUNSETTXFILTER was to
>>>> maintain compatibility with the generic tap interface that does the same
>>>> thing. 
>>> 
>>> Yes. OTOH I don't think anyone uses that ATM so it might not
>>> be important if it's not a good fit.
>>> E.g. we could notify libvirt and have it use netlink for us
>>> if we like that better.
>>> 
>> Ok thanks for clarifying that. One more reason to use TUNSETTXFILTER
>> interface was for qemu-kvm who uses the same tap interface for macvtap and
>> regular tap. So if we use netlink we have to do different things for macvtap
>> and tap filters in qemu. And qemu-kvm does not distinguish between macvtap
>> and tap as far as I know. No ?
> 
> It's not a question of simplifying qemu as much as trying to
> make the kernel interface abstract device differences
> away from users. Using same interface for tun and macvtap
> gave us some confidence that the interface is a good one.
> 
> But this does not seem to have worked with TUNSETTXFILTER -
> at least qemu doesn't use it yet, and it's been upstream
> a while. So there's no proof it's a good interface.

All qemu-kvm patches (not upstream yet) that have been floating around and
which I have been testing with use TUNSETTXFILTER. And tun kernel driver
does support filtering using TUNSETTXFILTER. It has code that does filtering
based on filter received using TUNSETTXFILTER.


But true we don't have to stick with TUNSETTXFILTER. I will explore the
netlink interface and research pros and cons and sketch the interface and
get back. 

> 
> So if we decide netlink is a better interface we can add it for tun too.
> We need to be backwards compatible and figure out what happens if someone
> tries to use both methods: probably apply both or ignore TUNSETTXFILTER ...
> 
>> 
>> Thanks you for your review and comments.
>> 
>> 
>>>> And having both the netlink op and ioctl interface might not be clean ?.
>>> 
>>> No idea.
>>> 
>>>> Sorry if I misunderstood your question.
>>>> 
>>>>> That would make this useful for bridged setups besides
>>>>> macvtap/virtualization.
>>>>> 
>>>> 
>>>> Thanks for the comments.
> 
> Overall good progress, and don't let the interface discussions
> block you. You want to push in two directions - stabilize code in one
> branch, and play with interfaces in another one. By the time there's a
> concensus on the interfaces you have the main logic all ready,
> then you merge. 

ok thanks michael! 

- Roopa


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux