On Thu, 16 Feb 2023 09:52:27 +0100
Michal Prívozník <mprivozn@xxxxxxxxxx> wrote:

> On 2/15/23 19:30, Stefano Brivio wrote:
> > On Wed, 15 Feb 2023 18:04:56 +0100
> > Michal Prívozník <mprivozn@xxxxxxxxxx> wrote:
> >
> >> On 2/15/23 08:50, Laine Stump wrote:
> >>> On 2/14/23 8:02 AM, Stefano Brivio wrote:
> >>>> On Tue, 14 Feb 2023 12:51:22 +0100
> >>>> Michal Privoznik <mprivozn@xxxxxxxxxx> wrote:
> >>>>
> >>>>> When passt starts it tries to do some security measures to
> >>>>> restrict itself. For instance, it creates its own namespaces,
> >>>>> umounts basically everything, drops capabilities, forks off to
> >>>>> further restrict itself (the child is where all interesting work
> >>>>> takes place now). This is sound, except it's causing two
> >>>>> problems:
> >>>>>
> >>>>> 1) the PID file FD, which we leak into the passt process, gets
> >>>>> closed (and thus our virPidFile*() helpers see unlocked PID
> >>>>> file, which makes them think the process is gone),
> >>>>
> >>>> I didn't realise this was the case, but giving passt write (unless I'm
> >>>> missing something) access to a file created by libvirtd doesn't look
> >>>> desirable to me.
> >>>
> >>>>
> >>>>> 2) the PID file no longer reflects true PID of the process.
> >>>>>
> >>>>> Worse, the child calls setsid() so we can't even kill the whole
> >>>>> process group. I mean, we can but it won't be any good.
> >>>
> >>> I think that (incorrect PID in the pidfile) is happening because Michal
> >>> is using the original version of my patches that were pushed - I had
> >>> mimicked the behavior of slirp, where libvirt deamonizes the new
> >>> process. If that process then daemonizes itself, we have some sort of
> >>> "double daemon"; libvirt has saved off the pid of what it thinks is
> >>> going to be the final process, but then that process further forks and
> >>> exits from the process whose pid libvirt saved.
> >>> But because passt was
> >>> cleaning up after itself I hadn't noticed the discrepancy in pids when
> >>> testing.
> >>>
> >>> Without going into all the details of the pidfile and locking and etc, I
> >>> just want to say that if we can fork/exec dnsmasq and let it daemonize
> >>> itself and create its own pidfile, then certainly we can do the same
> >>> thing for passt. (and if there's a fundamental problem, then it's a
> >>> fundamental problem for dnsmasq as well).
> >>
> >> Alright. I think I have a solution that would please everybody involved.
> >> I'll post it tomorrow though. I need to test it thoroughly. We would be
> >> able to get passt's PID (which is needed not only for killing it, but
> >> also for CGroup placement), NOT use --foreground and still pass errors
> >> from it to users (that is unless logfile was specified, because
> >> unfortunately, --log-file and --stderr are mutually exclusive).
> >
> > That doesn't need to be the case (--log-file and --stderr being
> > mutually exclusive)... if you have a use case for it, let's change that
> > in passt. I just wanted to keep it simple for users ("give a log file,
> > and be sure it won't spam").
> >
> > Also mind that Laine's series:
> > https://archives.passt.top/passt-dev/20230215082437.110151-1-laine@xxxxxxxxxx/
>
> Thanks, this looks exactly like what we need. So for now I can just pass
> --stderr if there's no --log-file, to deal with those "releases" that
> don't have those patches merged yet.

I wouldn't even bother with that: the user base (especially with
libvirt) is small enough that we can be quite confident that 100% of
the users will upgrade as soon as a new release (most likely coming
this week) includes them. I'd suggest keeping it simple, because you
really can.

Those are actual releases. Their naming just doesn't follow "semantic
versioning".

> > *should* already cover all the cases where libvirt is interested in
> > relaying "early" errors back to the user.
> >
> > By the way, the one below is pretty much the patch I would have proposed
> > for libvirt. I prepared it earlier today and didn't have a chance to
> > test it yet, it's compile-tested only, and doesn't take cgroups into
> > account (which, it seems, is needed no matter the lifecycle).
> >
> > So I'm sharing it here as reference (that's how simple I wanted it to
> > be -- minus cgroups), or if it's convenient for you to copy and paste
> > something.
>
> This effectively disables placing passt into the CGroup set up for
> emulator thread. And I don't think we want that. Firstly, it makes
> statistics gathering report incorrect values. Secondly, these helper
> processes are "implementation detail" - I mean, users don't really care
> (from accounting POV) whether a task runs in emulator thread inside of
> QEMU or in a separate process. It's still an emulation and as such
> should be accounted for. And also, on NUMA machines we definitely want
> to place passt as close to the emulator as possible (i.e. if emulator
> thread is pinned than helper processes should be pinned too).

Yes, definitely, I see now -- I thought, earlier, that cgroups were
just used to handle lifecycles at the moment.

> [...]

-- 
Stefano
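
As an illustration of the "double daemon" problem described in the quoted
text above, here is a minimal Python sketch. It is not libvirt or passt
code, and the function name is hypothetical. A launcher records the PID
returned by fork(), but the helper forks again, calls setsid() in its child
and exits, so the recorded PID no longer matches the process that survives.

```python
import os


def spawn_self_daemonizing_helper():
    """Fork a helper that forks again and exits, like a self-daemonizing process.

    Hypothetical illustration only. Returns a tuple of
    (pid_recorded_by_launcher, pid_of_surviving_process).
    """
    read_fd, write_fd = os.pipe()
    first_pid = os.fork()
    if first_pid == 0:
        # First child: this is the PID the launcher would save in a pidfile.
        os.close(read_fd)
        if os.fork() == 0:
            # Grandchild: the process that would keep running as the daemon.
            os.setsid()  # new session, so it also leaves the old process group
            os.write(write_fd, str(os.getpid()).encode())
            # A real daemon would keep running here; the sketch just exits.
            os._exit(0)
        # The first child exits right away, so the recorded PID goes stale.
        os._exit(0)

    # Launcher: reap the first child, then learn the real PID over the pipe.
    os.close(write_fd)
    os.waitpid(first_pid, 0)
    with os.fdopen(read_fd) as pipe:
        surviving_pid = int(pipe.read())
    return first_pid, surviving_pid


if __name__ == "__main__":
    recorded, actual = spawn_self_daemonizing_helper()
    print(f"launcher recorded PID {recorded}, surviving process had PID {actual}")
```

The two PIDs returned always differ, which is the discrepancy a
launcher-written pidfile would carry. Letting the helper create its own
pidfile after its final fork, as suggested above for dnsmasq, avoids it.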