On Thu, Dec 10, 2020 at 1:07 PM Benjamin Berg <bberg@xxxxxxxxxx> wrote:
>
> Hi,
>
> On Thu, 2020-12-10 at 12:20 -0700, Chris Murphy wrote:
> > On Thu, Dec 10, 2020 at 5:40 AM Benjamin Berg <bberg@xxxxxxxxxx>
> > wrote:
> > > Hi,
> > >
> > > so, the other day we had a major regression in the PAM stack[1]
> > > that, unfortunately, ended up hitting rawhide and the Fedora 33
> > > testing (not stable) repository before being unpushed.
> > >
> > > In this case it was easy to work around as SSH was still working
> > > fine. But, it seems that rescue mode requires having a root
> > > password set, which we do not always do during the Fedora install.
> > >
> > >
> > > So, I think we should have an obvious way for users to enter
> > > recovery mode even with a locked root account.
> > >
> > > Currently rescue.service is executing "systemd-sulogin-shell"
> > > which in turn runs "sulogin" (part of util-linux). A workaround is
> > > to set SYSTEMD_SULOGIN_FORCE=1 in rescue.service, but that just
> > > disables authentication entirely.
> > >
> > > I suppose to improve this, we would need a kind of "sudologin"
> > > that accepts any user in the "wheel" group. Or maybe some other
> > > more rigid requirement like configuring the first admin user that
> > > was created.
> > >
> > > Anyone has a good idea on how to solve this?
> >
> > I solve it with early debug shell using boot param
> > systemd.debug-shell=1 but that presents a root login on tty9 without
> > needing a password.
>
> Yeah, if you are able to modify the command line and have the
> background, then it is really simple to bypass the authentication.
>
> > I'm under the impression authentication services aren't even available
> > for emergency or rescue targets (?). I also wonder what happens if we
> > move to systemd-homed and whether that can start sooner and provide
> > the ability to use rescue target?
> > Or if it starts late enough that it can't be used for rescue and then
> > also what that means for non-root use of rescue because with
> > systemd-home, there are no (human) users in /etc at all.
>
> True, systemd-homed could be a problem.
>
> Maybe at the end of the day this is a lost cause?
>
> I mean, if you need to drop into rescue mode, you already need to have
> quite in-depth knowledge. So it could be better to focus on having more
> versatile solutions. Like being able to revert back to a known good
> state of the OS instead of providing a rescue shell.

There is also the problem of sysroot failing to mount. That leaves us
in the initramfs, which is an even more limited environment. Falling
over at boot or during startup is rare, but whatever the cause, it
often induces panic in even experienced users, in part because it's
rare.

rpm-ostree has a way to mostly solve the problem if the startup failure
is isolated to a particular deployment. But it could still hit the rare
case where it falls over in the initramfs. So that's a hole that would
be nice to fix, because it's something all Fedora editions and spins
could fall into.

There's a wish list item / idea for a recovery partition from which a
system could be booted. Maybe it's a limited "netinstall" kind of
environment, to keep it space efficient. (While it's in the Fedora
Btrfs tracker, it doesn't mean system root must be Btrfs.)
https://pagure.io/fedora-btrfs/project/issue/23

And also a couple of Btrfs-specific snapshot-rollback ideas:
https://pagure.io/fedora-btrfs/project/issue/18
https://pagure.io/fedora-btrfs/project/issue/31

More tangentially related: can we make it easy and cheap for folks to
back up consistently, so that a reset is less painful? This is neat but
probably a hard sell to actually depend on most users opting into,
however good of an idea it is to back up regularly.
https://pagure.io/fedora-btrfs/project/issue/12

Boot and startup can fail in ways other than a regression in a package;
we kinda need to look at all of them and see if it's possible to take a
holistic approach that solves a large chunk of them at once. It's one
reason why I'm not pushing hard for /boot on Btrfs, because we don't
need another option just to have another option. There are actually
good reasons to put /boot on Btrfs no matter what the sysroot file
system is, so if there's a way to "standardize" on that regardless of
the sysroot file system, we're better off. But if not /boot on Btrfs,
we need some other way to deal with the disconnect on rollback between
the kernels on /boot and the possibly older modules on an older sysroot
snapshot.

Personally, I'm gravitating toward the idea of not updating the
currently running OS (sometimes called transactional system updates).
If we had a way to test the out-of-band updated OS, like in a container
or VM, then only if it passes would we make it the next active system
at reboot time. There are some complexities there, but rpm-ostree has
already learned a lot of those lessons, so maybe we wouldn't have to
relearn them. This might make it possible to avoid the need for a
rollback: if the update fails or fails to work, just throw away that
system root.

-- 
Chris Murphy
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx