Re: how to let systemd hibernate start/stop the swap area?

Lennart Poettering <lennart@xxxxxxxxxxxxxx> · Fri, 31 Mar 2023 14:55:52 +0200

On Fri, 31.03.23 21:54, Michael Chapman (mike@xxxxxxxxxxxxxxxxx) wrote:

> > because otherwise you just remove the latencies from anonymous memory
> > but you amplify the latencies on file-backed memory. Which is overall
> > worse, not better.
>
> The host isn't doing much IO. Just a bit of logging really.

IO is not just writing stuff. When you run an OS, far more IO is
generated by ELF binaries being mapped into memory and paged in as
they execute than by writing a few log entries.

By saying "hey, never page out anonymous memory!" to the kernel (by
not having swap), you basically say "but please page out file-backed
memory even more, please please, go ahead, now".
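To see how big the two pools actually are on a given host, you can look
at /proc/meminfo. The sketch below is only a rough illustration, not a
precise model of kernel reclaim. It assumes that Active(file) +
Inactive(file) approximates the file-backed pool and AnonPages the
anonymous pool, and that anonymous memory can only be reclaimed up to
the free swap space (SwapFree). The helpers parse_meminfo() and
reclaim_candidates() are hypothetical names made up for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value} (values mostly in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])
    return values


def reclaim_candidates(info):
    """Roughly split what could be reclaimed under memory pressure (kB).

    Simplifying assumptions (hypothetical, not the kernel's logic):
    all file LRU pages count as evictable, and anonymous pages count
    only up to the amount of free swap space.
    """
    file_backed = info.get("Active(file)", 0) + info.get("Inactive(file)", 0)
    # With no swap (SwapFree == 0) anonymous memory cannot be paged out,
    # so the whole reclaim burden falls on file-backed pages.
    anon = min(info.get("AnonPages", 0), info.get("SwapFree", 0))
    return {"file_backed": file_backed, "anon": anon, "total": file_backed + anon}


if __name__ == "__main__":
    import os

    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(reclaim_candidates(parse_meminfo(f.read())))
```

On a swapless host the "anon" entry is always 0, which is exactly the
"please page out file-backed memory even more" situation described above.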

> How would the
> existence of swap affect that? Is it really so much better to be able to
> log messages just that little bit faster, but you've now got to wait for
> `sshd` to swap back in whenever you SSH to the system?

Presumably your system mmaps ELF binaries, VM images, and similar
stuff into memory. If you don't allow anonymous memory to be backed
out onto swap, then you are basically telling the kernel "please page
out my program code instead". Which is typically a lot worse.

That's why I am saying that yeah, if you want zero IO then that's OK,
but in that case you want *neither* anonymous memory backed by disk
swap *nor* file-backed memory backed by disk file systems. But you
made the strange choice of saying "IO by file-backed memory is good"
but "IO by anonymous memory is bad", and then allowed the former and
forbade the latter.

Hence my question: do you run your OS from an in-memory file system of
some kind? Because if not, you just shift around what gets paged out,
and, because you make the pool of reclaimable memory smaller, you
increase IO.
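A toy model makes the "smaller pool, more IO" point concrete. This is a
deliberately simplified sketch and not how Linux reclaim actually works
(the real kernel keeps separate active/inactive lists for anonymous and
file pages and weighs them via vm.swappiness). It assumes one global LRU
list of fixed size, a few anonymous pages that are touched once and then
sit idle, and a set of hot file-backed pages (think program code)
accessed in a loop. The function simulate() and its parameters are
hypothetical.

```python
from collections import OrderedDict


def simulate(frames, anon_pages, file_pages, rounds, have_swap):
    """Toy LRU reclaim model. Returns (file_page_ins, swap_outs).

    Hypothetical illustration only: one global LRU list, anonymous pages
    touched once and never again, hot file pages accessed cyclically.
    """
    lru = OrderedDict()  # page -> "anon" or "file", least recently used first
    file_page_ins = 0
    swap_outs = 0

    def touch(page, kind):
        nonlocal file_page_ins, swap_outs
        if page in lru:
            lru.move_to_end(page)  # hit: mark as most recently used
            return
        if kind == "file":
            file_page_ins += 1  # miss on a file page: read it back from disk
        if len(lru) >= frames:
            # Pick the least recently used page that may be evicted.
            # Without swap, anonymous pages have nowhere to go.
            victim = next(
                (p for p, k in lru.items() if k == "file" or have_swap), None
            )
            if victim is None:
                raise MemoryError("nothing reclaimable")
            if lru[victim] == "anon":
                swap_outs += 1  # anonymous page written out to swap
            # Clean file pages are simply dropped: no write needed.
            del lru[victim]
        lru[page] = kind

    for i in range(anon_pages):
        touch(("anon", i), "anon")  # fresh anonymous memory: no IO
    for _ in range(rounds):
        for i in range(file_pages):
            touch(("file", i), "file")
    return file_page_ins, swap_outs


if __name__ == "__main__":
    for swap in (False, True):
        ins, outs = simulate(frames=10, anon_pages=4, file_pages=8,
                             rounds=5, have_swap=swap)
        print(f"swap={swap}: {ins} file page-ins, {outs} swap-outs")
```

With 10 frames, 4 idle anonymous pages and 8 hot file pages over 5
rounds, the no-swap case re-reads every file page on every access (40
page-ins, 0 swap-outs), while the swap case writes the two coldest
anonymous pages out once and then serves everything from memory (8
page-ins, 2 swap-outs).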

> > > I know this works because I have literally done it on many, many
> > > hypervisors for over a decade.
> >
> > I mean, you have a point: if you run on idle machines where hardware
> > is so massively oversized for the job you are doing, you can operate
> > really nicely without swap. No doubt. But that's kinda
> > wasteful. Resource-management through oversized hw is certainly a way to
> > solve problems, no doubt.
>
> The alternative would be to _overprovision_ the server -- i.e. put more
> VMs on it than it can support. That would just be stupid.

Well, in larger environments the goal is typically to saturate all
hosts but not overload them, i.e. to maximize your ROI. No need to
fall from one extreme into the other. Today's Linux can actually
achieve something like this if you use it properly. Swap is part of
using it "properly".

Oversized hw is typically a bad investment, particularly in today's
cloud world, where costs multiply with every node you run.

Lennart

--
Lennart Poettering, Berlin