Autostart challenges

After thinking about autostart for a bit (a week or two), I have two
primary concerns about it:

 1. What is it for?

 2. What are its failure modes?

1: I agreed to implementing autostart largely because Jim had trouble
configuring tabled for iwhd and I took it as evidence that there was
a space for simplification and/or automation there. However, let's
look at it from the perspective of a Deltacloud hacker who just wants iwhd
for testing. Surely the filesystem back-end would serve them well enough?

What I missed is that running tabled is only interesting when you debug
iwhd's S3 backend, hack on iwhd's fail-over, or want to measure some
performance in a real private cloud. Unfortunately, autostart cannot
do any of that (even if it knew how to do it, it would require assumptions
about the minimal cloud, and then it becomes not very auto). And for
the Deltacloud hacker's testing a simple iwhd-1node-example.json
would be sufficient.

2: IMHO the fully automated and under-the-hood operation is only
valuable if it's completely bulletproof. Unfortunately, that is a challenge.
What if something fails to start, fails to stop, or otherwise breaks?
The patch that I posted deals with restarts by assuming that if
something is running and processes requests, it is a healthy service.
Thus it does not need to kill anything ever, in theory. In practice,
however, a partial failure is possible that requires the user to
kill processes, remove some directories, or both. So to an
extent the promise of autostart is a lie. I'm not comfortable with it.
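
To make the restart rule concrete, here is a minimal sketch of the
"if it answers, reuse it; otherwise spawn it" decision. It is not the
code from the patch. The function names, the service table, and the
choice of a bare TCP connect as the health probe are all assumptions
made for illustration; a connect is weaker than "processes requests".

```python
import socket


def is_serving(host, port, timeout=1.0):
    """Return True if something accepts a TCP connection on (host, port).

    Assumption: accepting a connection stands in for "healthy service".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def plan_autostart(services, probe=is_serving):
    """Split services into those to reuse and those to spawn.

    services maps a (hypothetical) service name to a (host, port) pair.
    Nothing is ever killed: a service that answers is left alone.
    """
    reuse, spawn = [], []
    for name, (host, port) in services.items():
        if probe(host, port):
            reuse.append(name)
        else:
            spawn.append(name)
    return reuse, spawn
```

Note what the sketch cannot see: a process that is alive but wedged,
or stale directories left by a dead one. Those are exactly the partial
failures that still need manual cleanup.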

Now, a few specific comments on this patch:
 http://marc.info/?l=hail-devel&m=128622349329834&w=2

#1: We use an explicit -a switch to launch autostart (the plan is to
do the same in iwhd). This is done to avoid confusing printouts and
spawning anything if a user runs the program without arguments
or with -h.

Same logic applies to -c and -a together.

#2: There is no kill handler (iwhd has none, tabled does DB shutdown
on signal). It is left out mostly out of concern about reliability.

#3: The credentials for accessing services are not established yet.
Everything works without them because of implementation deficiencies.

#4: Current directory is used for all the back-end data (tabled
inherits it from iwhd, which is nicely automagic and needs no
configuration). One warning though - do not cd /tmp.

#5: There are a couple of things that need fixing in Hail:

 - We do not delete the state that Hail (e.g. CLD and tabled)
   accumulates, so CLD session locks persist across restarts.
   CLD has to be fixed to clear obsolete locks right away
   instead of letting them expire in a minute.

 - All the constant verbosity that tabled produces even when idle
   is rather annoying, even after being redirected to a log file.
   It must go.
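
Going back to comment #1, the switch handling could be sketched roughly
as follows. This is an illustration in Python, not the option parsing
from the patch. The program name and helper names are hypothetical, and
treating -c together with -a as "spawn nothing" is only one reading of
that comment.

```python
import argparse


def parse_args(argv):
    """Parse a hypothetical daemon command line with -a and -c."""
    parser = argparse.ArgumentParser(prog="daemon-sketch")
    parser.add_argument("-a", dest="autostart", action="store_true",
                        help="autostart back-end services")
    parser.add_argument("-c", dest="config", metavar="FILE",
                        help="use an explicit configuration file")
    return parser.parse_args(argv)


def should_autostart(args):
    """Spawn helpers only on an explicit -a.

    Assumption: when -c is also given, the explicit configuration wins
    and nothing is spawned.
    """
    return args.autostart and args.config is None
```

With no arguments nothing is spawned, and -h is handled by the parser
before any spawning decision is reached.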

BTW, the patch itself offers one potential advantage to tabled:
removal of magic delays from "make check" scripts. So we may yet
get it committed, but even so I question whether iwhd needs
autostart.
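
For what "removal of magic delays" could look like, here is a small
sketch: instead of sleeping a fixed number of seconds and hoping the
daemon is up, poll a readiness check until it passes or a deadline
expires. The helper name and its defaults are assumptions; the real
"make check" scripts are shell and are not shown here.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns true or timeout seconds pass.

    Returns True as soon as the predicate holds, False on timeout.
    The predicate is always tried at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test would pass in whatever "is the service answering" check it has
and fail fast on a False return, so a slow machine waits longer and a
fast one does not wait at all.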

-- Pete