On Fri, 2008-05-30 at 16:49 -0400, Alan Cox wrote:
> On Fri, May 30, 2008 at 03:33:37PM -0400, Colin Walters wrote:
> > DBus is not the same as any other random software because it is explicitly
> > designed to provide reliable communication *between* components, much like
> > the kernel. If you restart it at random times that reliability guarantee is
> > destroyed.
>
> So the questions you should ask are
> - Why does restarting dbus have to be unreliable

It's a communication pipe; restarting D-Bus itself is reliable because
it's just like TCP. It's the transport. But making what gets
_transported_ reliable is the kicker. It's exactly like all those
Cingular/AT&T dropped call commercials from a while ago:

http://youtube.com/watch?v=DR26BZUo3Dk
http://youtube.com/watch?v=GEd3pS1jXJ4
http://www.spike.com/video/2839248 (spoof)

Suddenly all the state dependent on a D-Bus service is suspect, because
you have no idea what went on while the bus was down. You have to
re-synchronize your state after the bus comes back, and that's not a
simple task.

> - Why isn't there a recovery mechanism

The recovery mechanism would be in each service, because the service
knows whether it needs recovery, and would know how to merge/synchronize
its state with the services that it depends on. Some don't need to. But
ones with state dependent on other D-Bus services would.

> - Why does network manager have to do the work itself not the support code

Like above, because NM has specific state, and when D-Bus goes away, its
communication channels with the daemons that affect that NM-specific
state are gone, and NM can't make any assumptions about what's happening
in any other daemon while the bus is gone. Maybe your VPN just came up
for rekeying, but the signal got lost because D-Bus isn't around. So
when the bus comes back, your VPN connection is already dropped.
Or DHCP re-bound while the bus was down, your sysadmin changed DNS
servers on you, and the signal from dhclient got lost. Unless you re-do
the entire DHCP transaction (or teach dhclient about D-Bus properly so
it can answer questions without having to exec() stupid scripts that
then re-emit state back over D-Bus), NM would have no idea that the
returned DHCP options had changed. And thus your DNS is dead.

> And more fundamentally
>
> Why the ... are people still writing software which doesn't try and tolerate
> faults that are recoverable to a useful extent. Yes dbus might have to lose
> a few messages and send everyone a "duh whoops" event so they can recover but
> "oh dear it broke everyone reboot" is not good engineering.

In some cases, it's a cost/benefit analysis. Is the cost of writing and
maintaining a pile of code that handles a D-Bus restart, which shouldn't
ever happen, worth the benefit? In some cases, definitely. In other
cases, probably not. That isn't an excuse to write crappy software, but
it's certainly not as simple a problem as you present it.

Dan

> So I'm likewise pleased the Debian people raised a sensible point.
>
> Alan
>

--
fedora-devel-list mailing list
fedora-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-devel-list