Hi Johannes,
On 8/30/19 3:53 AM, Johannes Berg wrote:
> On Wed, 2019-08-28 at 16:11 -0500, Denis Kenzior wrote:
>> Currently frame registrations are not purged, even when changing the
>> interface type. This can lead to potentially weird / dangerous
>> situations where frames possibly not relevant to a given interface
>> type remain registered and mgmt_frame_register is not called for the
>> no-longer-relevant frame types.
>
> I'd argue really just "weird and non-working", hardly dangerous. Even in
> the mac80211 design where we want to not let you intercept e.g. AUTH
> frames in client mode - if you did, then you'd just end up with a non-
> working interface. Not sure I see any "dangerous situation". Not really
> an all that important distinction though.
Fair enough, I'm happy to drop / reword this language. It seemed fishy
to me since the unregistration operation was not called at all, and the
driver does go to some lengths to set up the valid frame registration
types.
> Depending on the design, it may also just be that those registrations
> are *ignored*, because e.g. firmware intercepts the AUTH frame already,
> which would just (maybe) confuse userspace - but that seems unlikely
> since it switched interface type and has no real need for those frames
> then.
There might be corner cases where userspace gets confused and doesn't
update the frame registrations properly. For example, wpa_s/hostap does
not listen to SET_INTERFACE events as far as I can tell. So if some external
app sets the mode (particularly on a 'live' interface) then all kinds of
unexpected things might happen. This is one of the motivations for
restricting certain NL80211 commands to interface SOCKET_OWNER.
So really this patch is intended more as a hot-fix / backport to stable
to make sure the older kernels can deal with some of these situations.
>> The kernel currently relies on userspace apps to actually purge the
>> registrations themselves, e.g. by closing the nl80211 socket associated
>> with those frames. However, this requires multiple nl80211 sockets to
>> be open by the userspace app, and for userspace to be aware of all state
>> changes. This is not something that the kernel should rely on.
>
> I tend to agree that the kernel shouldn't rely on it.
>
>> This commit adds a call to cfg80211_mlme_purge_registrations() to
>> forcefully remove any registrations left over prior to switching the
>> iftype.
>
> However, I do wonder if we should make this more transactional, and hang
> on to them if the type switching fails. We're not notifying userspace
> that the registrations have disappeared, so if type switching fails and
> it continues to work with the old type rather than throwing its hands up
> and quitting or something, it'd make a possibly bigger mess to just
> silently have removed them already.
I do like that idea, not sure how to go about implementing it though?
The failure case is a bit hard to deal with. Something like
NL80211_EXT_FEATURE_LIVE_IFTYPE_CHANGE would help, particularly if
nl80211/cfg80211 actually checked it prior to doing anything (e.g.
disconnecting, etc). That would then take care of the majority of the
'typical' failure paths. I didn't add such checking in the other patch
set since I felt you might find it overly intrusive on userspace. But
maybe we really should do this?
So playing devil's advocate, another argument might be that by the time
we got here, we've already torn down a bunch of state. E.g.
disconnected the station, stopped AP, etc. So we've already
side-effected state in a bunch of ways, what's one more?
> I *think* it should be safe to just move this after the switching
> succeeds, since the switching can pretty much only be done at a point
> where nothing is happening on the interface anyway, though that might
> confuse the driver when the remove happens.
I would concur as that is what happens today. But should it?
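For the sake of discussion, the "purge only once the switch has succeeded"
ordering can be modeled as below. This is a toy userspace sketch, not kernel
code: every toy_* name is made up for illustration, and the callback merely
stands in for whatever the driver does when asked to change the type.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are real cfg80211 symbols. */
enum toy_iftype { TOY_STATION, TOY_AP, TOY_NUM_IFTYPES };

#define TOY_MAX_REGS 8

struct toy_wdev {
	enum toy_iftype iftype;
	unsigned int frame_types[TOY_MAX_REGS];	/* registered frame types */
	size_t n_regs;
};

/* Stand-in for the driver's change-type op; returns 0 or a negative error. */
typedef int (*toy_change_fn)(struct toy_wdev *wdev, enum toy_iftype new_type);

static void toy_purge_registrations(struct toy_wdev *wdev)
{
	wdev->n_regs = 0;
}

/*
 * Switch first, purge second: if the driver rejects the switch, the old
 * registrations are left untouched, so userspace that keeps using the
 * old type does not silently lose them.
 */
static int toy_change_iftype(struct toy_wdev *wdev, enum toy_iftype new_type,
			     toy_change_fn change)
{
	int err;

	assert(new_type < TOY_NUM_IFTYPES);

	err = change(wdev, new_type);
	if (err)
		return err;	/* registrations survive the failed switch */

	wdev->iftype = new_type;
	toy_purge_registrations(wdev);
	return 0;
}

/* Two fake drivers: one accepts every switch, one always refuses. */
static int toy_driver_ok(struct toy_wdev *wdev, enum toy_iftype new_type)
{
	(void)wdev;
	(void)new_type;
	return 0;
}

static int toy_driver_fail(struct toy_wdev *wdev, enum toy_iftype new_type)
{
	(void)wdev;
	(void)new_type;
	return -16;	/* arbitrary negative value standing in for an errno */
}
```

With toy_driver_fail the registration count is unchanged after the call; with
toy_driver_ok the type changes and the list is emptied. The model does not
answer the open question of what the driver sees in between.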
> Also, perhaps it'd be better to actually hang on to those registrations
> that *are* still possible afterwards? But to not confuse the driver I
> guess that might require unregister/re-register to happen, all of which
> requires hanging on to the list and going through it after the type
> switch completed?
Yes, I had those exact thoughts as well.
It isn't currently clear to me if there are any guarantees on the driver
operation call sequence that cfg80211 provides. E.g. can the driver
expect rdev_change_virtual_intf to be called only once all the old
registrations are purged and the new registrations are performed after
the fact? Or should it expect things to just happen in any order?
> What do you think?
A big part of me thinks that just wiping the slate clean and having
userspace set it up from scratch isn't that much to ask and it might
want to do that anyway. It might (a big maybe?) also make the driver's
life easier if it can rely on certain guarantees from cfg80211. E.g.
that all invalid registrations are purged.
I have seen wpa_s perform a bunch of register commands which bounce off
with an -EALREADY. So it may already be erring on the side of caution
and assuming that it needs to reset the state fully? Not sure.
But if the kernel wants to be nice and spends some cycles figuring out
which frame registrations to keep and re-register them, that is also
fine with me.
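If we went the "keep what is still valid" route, the filtering step might look
roughly like the following. Again a toy userspace sketch under assumptions: the
sketch_* names and the per-type policy table are hypothetical, only the
management frame subtype numbers (probe request 4, authentication 11, action
13) come from 802.11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the cfg80211 data structures. */
enum sketch_iftype { SKETCH_STATION, SKETCH_AP, SKETCH_NUM_IFTYPES };

/* 802.11 management frame subtype numbers. */
#define SKETCH_STYPE_PROBE_REQ	4
#define SKETCH_STYPE_AUTH	11
#define SKETCH_STYPE_ACTION	13

/* Assumed policy: one bit per subtype that may be registered for a type. */
static const uint16_t sketch_rx_allowed[SKETCH_NUM_IFTYPES] = {
	[SKETCH_STATION] = (1u << SKETCH_STYPE_ACTION) |
			   (1u << SKETCH_STYPE_PROBE_REQ),
	[SKETCH_AP]	 = (1u << SKETCH_STYPE_ACTION) |
			   (1u << SKETCH_STYPE_PROBE_REQ) |
			   (1u << SKETCH_STYPE_AUTH),
};

struct sketch_reg {
	uint8_t stype;	/* management frame subtype, 0..15 */
};

/*
 * Compact the list in place, keeping only the registrations that the new
 * interface type still allows. Returns the number of entries kept; the
 * relative order of the survivors is preserved.
 */
static size_t sketch_filter_regs(struct sketch_reg *regs, size_t n,
				 enum sketch_iftype new_type)
{
	size_t kept = 0;

	assert(new_type < SKETCH_NUM_IFTYPES);

	for (size_t i = 0; i < n; i++) {
		assert(regs[i].stype < 16);
		if (sketch_rx_allowed[new_type] & (1u << regs[i].stype))
			regs[kept++] = regs[i];
	}
	return kept;
}
```

Going from AP to station with AUTH, ACTION and PROBE_REQ registered would keep
the latter two and drop AUTH. The unregister/re-register calls towards the
driver are deliberately left out, since their ordering is exactly what is
unclear.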
Regards,
-Denis