How to scale journald correctly?

Hi there,

For some of our servers we wanted to keep more log data than the default
allows (CentOS 7.5, so sadly not the newest version of systemd).

However, during those experiments we ran into some quite unexpected
behaviour of journald. Some of it might be bugs, but some of it might
also just be things that are not documented well enough.

I'd obviously love some feedback here.

But first, our goal: We have some machines that are pretty much
failover-ready, but only one of them is the master and runs a bunch of
cron jobs that produce quite a bit of log output, which we would like to
retain a bit longer to ease debugging. Still, we would like to have the
same journald config on all of them.

The problem is that there seems to be no documentation (that we could
find) on how to achieve this, or on what limits journald is designed
for.

We started with SystemMaxUse=200G. This worked, but after some time we
noticed that `systemctl status $something` became _really_ slow (i.e. it
took hours to produce output). I'm not entirely sure, but I suspect that
this is the behaviour described here:
<https://github.com/systemd/systemd/issues/7963>
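
To check whether such a slowdown goes along with a large number of
journal files on disk, a small Python 3.6+ sketch like the one below may
help. It assumes the persistent journal lives under /var/log/journal
(the usual location with persistent storage), that journal files end in
`.journal` or `.journal~`, and that it runs with enough privileges to
read that directory. `journal_file_stats` is a hypothetical helper, not
part of systemd; `journalctl --disk-usage` reports the total size, but
not the number of files.

```python
# Hypothetical helper, not part of systemd: count journal files and their
# total size below a journal directory (default path is an assumption).
import sys
from pathlib import Path
from typing import Tuple


def journal_file_stats(root: str = "/var/log/journal") -> Tuple[int, int]:
    """Return (file_count, total_bytes) for journal files below root.

    Matches active/archived files ("*.journal") and files journald set
    aside, e.g. after an unclean shutdown ("*.journal~").
    """
    count = 0
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.name.endswith((".journal", ".journal~")):
            count += 1
            total += path.stat().st_size
    return count, total


if __name__ == "__main__":
    # Usage: python3 journal_stats.py [journal-directory]
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/log/journal"
    files, size = journal_file_stats(root)
    print(f"{files} journal files, {size / 1024**3:.1f} GiB under {root}")
```

Running this periodically on the master would show whether the number
of files keeps growing along with the retained log volume.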

To work around this, we then set SystemMaxFileSize=20G to limit the
number of files journald would produce. This, however, led to the
unexpected behaviour of journald exhausting the IO bandwidth of the
server it runs on, documented here:
<https://bugzilla.redhat.com/show_bug.cgi?id=1599658> Interestingly, this
setting seemed to work well for some hours, but then journald suddenly
started using up all IO resources of the system.
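
The reasoning behind raising SystemMaxFileSize (fewer, larger files) can
be put into rough numbers with the Python sketch below. It assumes that,
when SystemMaxFileSize= is unset, journald uses one eighth of
SystemMaxUse= per file, capped at 128 MiB (our reading of current
systemd; the version on CentOS 7.5 may behave differently), and it
ignores SystemKeepFree= and the currently active file. `parse_size` and
`estimate_file_count` are hypothetical helpers, and only integer sizes
are handled.

```python
# Hypothetical helpers, not part of systemd: rough estimate of how many
# journal files fit into SystemMaxUse= for a given per-file size.
from typing import Optional

# journald.conf sizes use base-1024 suffixes (K, M, G, T).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Assumption: upper bound journald applies to the *default* per-file size.
DEFAULT_FILE_SIZE_CAP = 128 * 1024**2


def parse_size(value: str) -> int:
    """Parse an integer size such as '200G' or '134217728' into bytes."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def estimate_file_count(max_use: str, max_file_size: Optional[str] = None) -> int:
    use = parse_size(max_use)
    if max_file_size is None:
        # Assumed default: one eighth of SystemMaxUse=, capped at 128 MiB.
        file_size = max(1, min(use // 8, DEFAULT_FILE_SIZE_CAP))
    else:
        file_size = parse_size(max_file_size)
    return -(-use // file_size)  # ceiling division


if __name__ == "__main__":
    print(estimate_file_count("200G"))         # default per-file size: 1600
    print(estimate_file_count("200G", "20G"))  # SystemMaxFileSize=20G: 10
```

Under those assumptions, SystemMaxUse=200G alone allows roughly 1600
files, while adding SystemMaxFileSize=20G brings that down to about 10
much larger files.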

Which raises the question: What are the limits on file size, total log
size, log intake rate, etc. that journald is engineered to work well
with? What are reasonable values for these settings? Were there any
changes in recent releases that could affect this (not that I
particularly want to upgrade a core component of my CentOS systems)?

I'd love to find answers to these questions, and I'd also love for them
to make it into the documentation, so that later users have an easier
time scaling their journald deployments, should they want to retain more
log messages.

Many thanks for your answers,
Martin Häcker

