Verily I say unto thee, that Jeff Spaleta spake thusly:
> In an effort to chart a new course of constructive discussion... is
> it worth brainstorming a bit about how to make rescue mode better or
> more accessible?

The current rescue mode is certainly sufficient for experienced admins; however, it would be a good idea to implement some helper scripts, and possibly even a minimal Fluxbox environment. The latter would be especially useful for administering LVM via system-config-lvm, as I must admit the lvm command syntax is still a mystery to me.

The logical procedure should be: identify (as far as possible) what *can* go wrong, think about how *you* would fix it, see if there's any way to (semi)automate that process with helper scripts, and compare that with what's currently available in the rescue environment.

Off the top of my head, I'd suggest:

1) Enable installing an immutable rescue partition, and add it as a GRUB entry.
2) Add a minimal graphical environment.
3) Add a "Rescue Install" to Anaconda.
4) Add the various system-config-* helpers.
5) Have a dedicated RPM rescue tool, since this is a special case, i.e. is rpm + all deps correctly installed, are there stale locks, is the database sane, etc.
6) Anaconda suggests a backup partition, or asks for a network backup location, and sets up a cron job (SafeKeep?), i.e. push hard to make backups mandatory(ish).

I'd also suggest that Disk Druid, etc., push LVM *and* a snapshot partition, which is IMHO essential.

You could do some checks to see if the default root system is bootable, etc., and then automatically fall back to rescue mode if not (GRUB patch?), rather than letting init proceed and then fail. This is essential on a headless server, where it's "stuck" and you can't ssh in to see why.

If the idea of a GUI doesn't appeal to you (and for network admins it probably doesn't), I'd suggest implementing an ncurses interface for some of the helper tools (long term).
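As a rough illustration of item 5, the first pass of an RPM rescue tool could be a set of read-only checks like the ones below. This is a hypothetical sketch, not an existing tool: the function names are made up, and it assumes the target system is mounted at /mnt/sysimage (where rescue mode normally puts it) and that the RPM database uses the Berkeley DB layout of this era, i.e. a Packages file plus __db.* environment/lock files in /var/lib/rpm. It only reports problems and suggests commands; it never runs them.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: BDB-era RPM layout. The package database lives in
# <root>/var/lib/rpm as "Packages", with Berkeley DB environment/lock
# files named __db.* alongside it.
RPM_DB_SUBDIR = Path("var/lib/rpm")
# Assumption: the target's rpm binary is in one of these locations.
RPM_BINARIES = (Path("bin/rpm"), Path("usr/bin/rpm"))
DB_PROBLEM_PREFIXES = ("missing database", "empty database")


def find_stale_locks(root: Path) -> list[Path]:
    """Return leftover __db.* files under the mounted system at `root`.

    In rescue mode nothing on the target system is running, so any such
    files are most likely left over from an interrupted rpm/yum run.
    """
    db_dir = root / RPM_DB_SUBDIR
    if not db_dir.is_dir():
        return []
    return sorted(db_dir.glob("__db.*"))


def check_rpm(root: Path) -> list[str]:
    """Collect human-readable problems; nothing on disk is modified."""
    problems: list[str] = []
    if not any((root / rel).is_file() for rel in RPM_BINARIES):
        problems.append("rpm binary not found")
    packages = root / RPM_DB_SUBDIR / "Packages"
    if not packages.is_file():
        problems.append(f"missing database: {packages}")
    elif packages.stat().st_size == 0:
        problems.append(f"empty database: {packages}")
    for lock in find_stale_locks(root):
        problems.append(f"possible stale lock: {lock}")
    return problems


def suggest_fixes(root: Path) -> list[str]:
    """Suggest (but never run) shell commands for the problems found.

    The commands use `rpm --root`, so that an rpm binary from the rescue
    environment (assumed to exist) can work on the broken system even
    when the target's own rpm is gone (the "RPM without RPM" case).
    """
    problems = check_rpm(root)
    suggestions: list[str] = []
    if any(p.startswith(DB_PROBLEM_PREFIXES) for p in problems):
        # --rebuilddb rebuilds from Packages, so it cannot help here.
        suggestions.append(
            f"restore {root / RPM_DB_SUBDIR} from a backup "
            "(--rebuilddb cannot recreate Packages)"
        )
    elif find_stale_locks(root):
        suggestions.append(f"rm -f {root / RPM_DB_SUBDIR}/__db.*")
        suggestions.append(f"rpm --root {root} --rebuilddb")
    if "rpm binary not found" in problems:
        suggestions.append(
            f"rpm --root {root} -Uvh --replacepkgs "
            "<rpm package from the install media>"
        )
    return suggestions
```

From the rescue shell (assuming a Python interpreter is available there, which is itself an assumption), printing check_rpm(Path("/mnt/sysimage")) followed by suggest_fixes(Path("/mnt/sysimage")) would give a short report. Keeping it report-only is deliberate: deleting __db.* files is only safe when no rpm process holds them, which is reasonable to assume in rescue mode but not on a running system.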
As a side note, though not directly related to "rescue", I advocate that yum should be patched to allow partial failure, i.e. "update as much as possible, notify root of failures". I understand it is not a popular theory, but broken deps/repos break automatic updates completely rather than partially. That can be a real problem, e.g. on a large network (like mine), where an essential security update (and every other update) is not deployed, simply because of *one* broken, non-essential package. This just doesn't make any logical sense, and could be an issue for those relying on automated mass system updates.

Anyway, back on topic: let's say *I* ask *you*, my sysadmin, to fix the following. What would you (i.e. the script) need to do to (semi)automate this? Not all of these *have* solutions that can be implemented in software, but even the *hardware* issues could be given more verbose notifications/suggestions:

1) swapon ... won't activate, because the swap drive is dead, but this is a low-memory system set to boot automatically into X.
2) Root filesystem mount failure.
3) Missing/corrupt initrd/bzImage.
4) Missing/non-functional SCSI/IDE drivers in an *updated* kernel, so the root filesystem cannot be mounted (but the previous kernel works).
5) service <foobar> segfaults and halts init.
6) service <foobar> has (missing files | some other problem) and waits forever (does not detach as a daemon).
(Hint for 5 and 6: watchdog timer.)
7) Initscripts are b0rked: a typo, a non-fatal error, etc. (I recently caught one, still unresolved: an nfs mountd problem.) Why is this needed for rescue mode? Because not all startup errors are noticed by the (unobservant | people who blink a lot). :) A way of running through ($chroot)/init.d in rescue mode, looking for non-zero return codes and suggesting updates/workarounds etc., would be handy. But maybe this is stretching "rescue mode" a little too far.
8) RPM is b0rked. How do I reinstall RPM ... without RPM??? Cyclic dependency error 101: Arrrrggghh!
9) Again, maybe stretching "rescue" too far, but how about fslint in rescue mode, to clean up all those "#PRELINK", "foobar~", and other junk? Especially on a monolithic install (all under /) where /tmp is full.
10) The only other thing I can think of is SMART disk health checks; however, according to Google's recent report (they ran a massive study), SMART is next to useless at actually predicting failure.

That's it. I'm sure 99% of the above is useless, but hey ... that's why they call it brainstorming :)

-- 
K.
http://slated.org - Slated, Rated & Blogged
.----
| "Future archaeologists will be able to identify a 'Vista Upgrade
| Layer' when they go through our landfill sites" - Sian Berry, the
| Green Party.
`----
Fedora Core release 5 (Bordeaux) on sky, running kernel 2.6.19-1.2288.fc5
21:32:25 up 3 days, 8:57, 2 users, load average: 0.26, 0.31, 0.27