Re: Networkmanager service is shutdown too early

Dan Williams <dcbw@xxxxxxxxxx> · Sun, 01 Jun 2008 08:52:06 -0400

On Fri, 2008-05-30 at 15:38 -0400, seth vidal wrote:
> On Fri, 2008-05-30 at 15:33 -0400, Colin Walters wrote:
> > On Fri, May 30, 2008 at 3:29 PM, Simo Sorce <ssorce@xxxxxxxxxx> wrote:
> >         I am very glad the debian guys got "all cranky" on this issue.
> >         Terminating network connections just because a component needs
> >         to restart is plain silly.
> > 
> > DBus is not the same as any other random software because it is
> > explicitly designed to provide reliable communication *between*
> > components, much like the kernel.  If you restart it at random times
> > that reliability guarantee is destroyed.
> > 
> > http://mail.gnome.org/archives/networkmanager-list/2005-March/thread.html#00027 
> > 
> 
> Simo's point still holds though. What you've described above is a better
> reason to have not designed nm around dbus, not a reason why we should be
> okay with our network services going away when we restart dbus.

The reason we don't handle messagebus restarts (yet) is that it's a
boatload of complex state to synchronize on re-init, for a case that
should never ever be happening.  It's not clear that the benefits of
doing this actually equal or surpass the cost of writing and maintaining
that code in NM and all of the other listed services for the one or two
times a year that the bus _might_ crash if a cosmic ray hits your RAM
and ECC doesn't fix it.

1) HAL - NM needs to synchronize its device list with HAL, adding new
devices, discarding no-longer-known devices, and ensuring that the
details of existing devices (vendor, device id, driver name, etc) are
still the same.  NM has code to do this when HAL goes away but doesn't
check device properties yet.

2) wpa_supplicant - assuming wpa_supplicant handles D-Bus restarts
correctly, NM then needs to query wpa_supplicant for the interfaces the
supplicant currently controls, add new devices that may have been found
in step 1, remove devices that step 1 found to be no longer present, and
re-read the current interface configuration and ensure that it hasn't
changed since dbus stopped.  If it has changed, NM needs to tear the
connection down, and re-initialize the connection entirely, because the
connection isn't consistent with the pre-dbus-restart state.  And this
means the wifi connection will drop, potentially leading to another DHCP
transaction, etc.  In cases like this, assuming incorrect state is
actually _worse_ than tearing down the connection and restarting it.

3) pppd - need to check the state of the PPP connection, and pppd
doesn't provide plugins a good way of asking about internal state, since
plugins are just callouts.  If the PPP connection got dropped during the
dbus dropout, then NM needs to restart the PPP connection, because
events from pppd are obviously lost while dbus is gone.

4) VPN - need to check the state of VPN connections for the same reason
as (3).  If the VPN connection got dropped during the dbus dropout, then
NM needs to restart the VPN connection, because events from the VPN
daemons are obviously lost while dbus is gone.

5) DHCP - if some interface was in the middle of doing a DHCP operation
when dbus went away, NM needs to stop the ongoing DHCP transaction and
restart it, because events and options are delivered back to NM via
D-Bus.

6) System settings daemon - your system connections are provided by a
dbus service.  If the bus goes away, then NM would have to re-read all
the system connections from the system settings daemon when the bus
comes back, discard ones that no longer exist (and tear down that
connection if it's active), add new ones, and verify that existing ones
haven't changed (and if they have, restart that connection).  All to
cover the case where you touch an ifcfg file while dbus is gone.  I do
have some vague plans to merge NM and the system settings daemon back
together after 0.7 since that would make some things a lot easier (and
render this point moot) but we'll see about that.
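
As a rough illustration of the pattern that items 1, 2 and 6 share (this
is not NM's actual code, just a minimal sketch), re-syncing after a bus
restart boils down to diffing the state NM remembered against what the
service reports now.  The DeviceInfo type, the reconcile() helper and the
sample devices below are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    # Hypothetical stand-in for the properties NM would compare.
    vendor: str
    device_id: str
    driver: str


def reconcile(known, current):
    """Diff pre-restart state against what the service reports now.

    Both arguments map a name to a DeviceInfo.  Returns three sorted
    lists of names: (added, removed, changed).
    """
    added = sorted(k for k in current if k not in known)
    removed = sorted(k for k in known if k not in current)
    changed = sorted(
        k for k in known if k in current and known[k] != current[k]
    )
    return added, removed, changed


# Made-up example data: state before and after the bus came back.
known = {
    "eth0": DeviceInfo("Intel", "0x100e", "e1000"),
    "wlan0": DeviceInfo("Atheros", "0x001c", "ath5k"),
    "usb0": DeviceInfo("Generic", "0x0001", "cdc_ether"),
}
current = {
    "eth0": DeviceInfo("Intel", "0x100e", "e1000"),
    "wlan0": DeviceInfo("Atheros", "0x001c", "ath9k"),
    "ppp0": DeviceInfo("", "", "ppp"),
}

added, removed, changed = reconcile(known, current)
print("added:", added, "removed:", removed, "changed:", changed)
```

The diff itself is the easy part.  Anything that lands in "removed" or
"changed" still means tearing down and restarting whatever connection
depended on it, and every one of the services above needs its own
version of this.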

Patches accepted.  There are some complex dependencies here that each
have their own state.  And to do it right, you need to ensure that the
entire state of the system (NM, pppd, wpa_supplicant, hald, vpnc,
openvpn, nm-system-settings, etc) is consistent with what the mothership
(NM) expects.  That's not simple, and all for a case that almost never
happens.

Dan

[1] on a different topic, having NM handle restarts of itself is another
thing; we could guarantee that statically addressed wired ethernet
devices that don't use 802.1x (and _only_ those) are left untouched
across NM restarts, since they don't require an external controlling
daemon like wpa_supplicant or BlueZ or whatever.  And then, only if on restart
there's a system connection (provided by an ifcfg file on Fedora) that
matches the current settings of the device.  That's also a pile of code
(need to match multiple IP addresses/netmasks/gateways, routes, ensure
resolv.conf is correct, ensure MTU is correct, ensure MAC address for
the interface is the same as before, etc) but probably something that
should be done eventually.
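
A minimal sketch of what that matching might look like, assuming the
live device settings and the stored system connection have already been
read into plain dicts.  The dict keys and the can_leave_untouched()
helper are hypothetical, and routes and resolv.conf are left out to keep
it short:

```python
from ipaddress import ip_interface


def _addresses(cfg):
    # Parse "address/prefix" or "address/netmask" strings so that
    # equivalent spellings of the same address compare equal.
    return {ip_interface(a) for a in cfg["addresses"]}


def can_leave_untouched(live, stored):
    """True only if the live device matches the stored connection."""
    return (
        live["mac"].lower() == stored["mac"].lower()
        and live["mtu"] == stored["mtu"]
        and _addresses(live) == _addresses(stored)
        and live["gateway"] == stored["gateway"]
    )


# Made-up example values (documentation address range).
stored = {
    "mac": "00:16:3E:00:00:01",
    "mtu": 1500,
    "addresses": ["192.0.2.10/24"],
    "gateway": "192.0.2.1",
}
live = {
    "mac": "00:16:3e:00:00:01",
    "mtu": 1500,
    "addresses": ["192.0.2.10/255.255.255.0"],
    "gateway": "192.0.2.1",
}

print(can_leave_untouched(live, stored))
```

If any single field differs, the safe answer is to treat the device as
unmanaged state and bring the connection up from scratch.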
