On Sun, 2008-06-01 at 09:14 -0400, Dan Williams wrote:
> On Fri, 2008-05-30 at 16:49 -0400, Alan Cox wrote:
> > On Fri, May 30, 2008 at 03:33:37PM -0400, Colin Walters wrote:
> > > DBus is not the same as any other random software because it is explicitly
> > > designed to provide reliable communication *between* components, much like
> > > the kernel. If you restart it at random times that reliability guarantee is
> > > destroyed.
> >
> > So the questions you should ask are
> > - Why does restarting dbus have to be unreliable
>
> It's a communication pipe; restarting D-Bus itself is reliable because
> it's just like TCP. It's the transport. But making what gets
> _transported_ reliable is the kicker.
>
> It's exactly like all those Cingular/AT&T dropped call commercials from
> a while ago:
>
> http://youtube.com/watch?v=DR26BZUo3Dk
> http://youtube.com/watch?v=GEd3pS1jXJ4
> http://www.spike.com/video/2839248 (spoof)

Except that in the case of NM, it isn't a one-to-one conversation like
these. NM is more like the foreman of a construction site who is on the
phone with the bulldozer, the crane, and the structural welders. If the
cell company has an outage at the base station the foreman's phone is
patched through, and every one of those calls drops, the foreman can't
just assume, once the outage is over, that whatever the bulldozer,
crane, and welders did in the meantime was exactly what needed to
happen. S/he has to go and verify that everything is exactly as
expected, except s/he can't yell "All stop!" first, but has to check and
verify everything while the work keeps going. Not simple.

Dan

> Suddenly all the state dependent on a D-Bus service is suspect, because
> you have no idea what's going on while the bus is down. You have to
> re-synchronize your state after the bus comes back, and that's not a
> simple task.
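To make the re-synchronization problem concrete, here is a minimal Python sketch under stated assumptions. Every name in it (`Bus`, `Daemon`, `Manager`, `resync`, the `__restarted__` marker) is hypothetical and is not the real D-Bus or NetworkManager API. It models a bus that silently drops signals while it is down and, on restart, sends every client a "duh whoops" notification along the lines Alan suggests below; each client then discards its cached view and re-queries the daemons it depends on.

```python
from dataclasses import dataclass, field


@dataclass
class Daemon:
    """Hypothetical remote service (e.g. a DHCP client) whose state changes at any time."""
    name: str
    state: dict = field(default_factory=dict)

    def update(self, bus: "Bus", **changes) -> None:
        # The daemon changes state whether or not anyone hears about it.
        self.state.update(changes)
        bus.emit(self.name, dict(changes))

    def get_state(self) -> dict:
        # Hypothetical full-state query: returns a snapshot of the current state.
        return dict(self.state)


class Bus:
    """Toy message bus: delivers signals to subscribers only while it is up."""

    def __init__(self) -> None:
        self.up = True
        self.subscribers = []
        self.dropped = 0

    def emit(self, sender: str, changes: dict) -> None:
        if not self.up:
            self.dropped += 1  # signal silently lost while the bus is down
            return
        for callback in self.subscribers:
            callback(sender, changes)

    def restart(self) -> None:
        self.up = True
        # Hypothetical "duh whoops" event: tell every client it must resync.
        for callback in self.subscribers:
            callback("bus", {"__restarted__": True})


class Manager:
    """Keeps a cached view of each daemon's state, built from signals."""

    def __init__(self, bus: Bus, daemons: dict) -> None:
        self.daemons = daemons
        self.cache = {name: d.get_state() for name, d in daemons.items()}
        bus.subscribers.append(self.on_signal)

    def on_signal(self, sender: str, changes: dict) -> None:
        if changes.get("__restarted__"):
            self.resync()
        else:
            self.cache[sender].update(changes)

    def resync(self) -> None:
        # Signals may have been lost, so no cached entry can be trusted:
        # re-query every daemon for a full snapshot.
        for name, daemon in self.daemons.items():
            self.cache[name] = daemon.get_state()


bus = Bus()
dhcp = Daemon("dhclient", {"dns": "10.0.0.1"})
nm = Manager(bus, {"dhclient": dhcp})

bus.up = False
dhcp.update(bus, dns="10.0.0.53")   # lease re-bound; the signal is lost
stale = nm.cache["dhclient"]["dns"]  # cache still holds the old server
bus.restart()
print(stale, nm.cache["dhclient"]["dns"], bus.dropped)  # prints "10.0.0.1 10.0.0.53 1"
```

The sketch distrusts the whole cache on restart rather than trying to replay lost signals, since the bus has no record of what it dropped. It also assumes every daemon can answer a full-state query and that nothing changes while `resync()` runs; as the thread points out, dhclient cannot answer such questions without exec()ing scripts, and real services cannot stop their work while the check happens.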
>
> > - Why isn't there a recovery mechanism
>
> The recovery mechanism would be in each service, because the service
> knows whether it needs recovery, and would know how to
> merge/synchronize its state with the services that it depends on. Some
> don't need to. But ones with state dependent on other D-Bus services
> would.
>
> > - Why does network manager have to do the work itself not the support code
>
> Like above, because NM has specific state, and when D-Bus goes away,
> its communication channels with the daemons that affect that
> NM-specific state are gone, and NM can't make any assumptions about
> what's happening in any other daemon while the bus is gone. Maybe your
> VPN just came up for rekeying, but the signal got lost because D-Bus
> isn't around. So when the bus comes back, your VPN connection is
> already dropped.
>
> Or DHCP re-bound while the bus was down, and your sysadmin changed DNS
> servers on you, and the signal from dhclient got lost (because the bus
> was down). Unless you re-do the entire DHCP transaction (or teach
> dhclient about D-Bus properly so it can answer questions without having
> to exec() stupid scripts that then re-emit state back over D-Bus), NM
> would have no idea that the returned DHCP options had changed. And thus
> your DNS is dead.
>
> > And more fundamentally
> >
> > Why the ... are people still writing software which doesn't try and tolerate
> > faults that are recoverable to a useful extent. Yes dbus might have to lose
> > a few messages and send everyone a "duh whoops" event so they can recover but
> > "oh dear it broke everyone reboot" is not good engineering.
>
> In some cases, it's a cost/benefit analysis. Is the cost of writing and
> maintaining a pile of code that handles a D-Bus restart, which shouldn't
> ever happen, worth the benefit? In some cases, definitely. In other
> cases, probably not. That isn't an excuse to write crappy software, but
> it's certainly not as simple a problem as you present it.
>
> Dan
>
> > So I'm likewise pleased the Debian people raised a sensible point.
> >
> > Alan

--
fedora-devel-list mailing list
fedora-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-devel-list