Re: F21 Self Contained Change: Remote Journal Logging

Zbigniew Jędrzejewski-Szmek <zbyszek@xxxxxxxxx> · Wed, 23 Apr 2014 04:54:07 +0200



On Tue, Apr 22, 2014 at 06:34:48AM +0200, Lennart Poettering wrote:
> On Wed, 16.04.14 12:46, Bill Nottingham (notting@xxxxxxxx) wrote:
> 
> > Zbigniew Jędrzejewski-Szmek (zbyszek@xxxxxxxxx) said: 
> > > On Mon, Apr 14, 2014 at 04:20:16PM -0400, Bill Nottingham wrote:
> > > > Jaroslav Reznik (jreznik@xxxxxxxxxx) said: 
> > > > > = Proposed Self Contained Change: Remote Journal Logging = 
> > > > > https://fedoraproject.org/wiki/Changes/Remote_Journal_Logging
> > > > > 
> > > > > Change owner(s): Zbigniew Jędrzejewski-Szmek <zbyszek@xxxxxxxxx>
> > > > > 
> > > > > Systemd journal can be configured to forward events to a remote server. 
> > > > > Entries are forwarded including full metadata, and are stored in normal 
> > > > > journal files, identically to locally generated logs. 
> > > > 
> > > > What's the future of gatewayd if this becomes more widely used?
> > >
> > > gatewayd works in pull mode. Here I'm proposing a push model, where the
> > > "client" (i.e. machine generating the logs) pushes logs to the server
> > > at the time of its own choosing. gatewayd is probably better for some use
> > > cases, this for others.
> > 
> > I understand the pull vs push distinction ... I'm just not clear why pull
> > would ever be a model you'd want to use. (vs something like a local cockpit
> > agent.)
> 
> Pull is the only model that scales, since the centralized log infrastructure can
> schedule when it pulls from where and thus do this according to
> available resources. The push model is prone to logging bursts
> overwhelming log servers if you scale your network up.
How many clients would need to connect simultaneously to overwhelm the
server? "Overwhelm" here would have to mean something like overflowing
the incoming connection queue. The receiver binary doesn't have to actually
read the data from the connections immediately, and things should function
just fine if it takes a minute or two to process data. A typical "overwhelm"
scenario that we might be talking about would be a massive machine restart
after a power failure. The typical amount of log messages generated during
boot is rather small: less than 1MB on F21. The receiver should be able to
process data at around disk speed, so it should be able to handle *hundreds*
of booting machines without actually developing a delay of more than a few
seconds. In addition, it would be great to add jitter to the start of the
uploader, which would lessen the load on the server anyway.
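
As a rough back-of-the-envelope sketch of that estimate (the 100 MB/s
disk throughput, the 30 s jitter window, and the function names are
assumptions made up for illustration, not measurements or anything
from the uploader itself):

```python
import random


def drain_seconds(machines, boot_log_mb=1.0, disk_mb_per_s=100.0):
    """Seconds the receiver needs to write one boot's worth of logs
    from every machine, if it is limited only by disk throughput.

    boot_log_mb: upper bound on boot-time log volume per machine
                 (the "less than 1MB" figure above).
    disk_mb_per_s: assumed sustained write speed of the receiver's disk.
    """
    return machines * boot_log_mb / disk_mb_per_s


def jittered_start_delay(max_jitter_s, rng=random):
    """Random delay before an uploader starts, so that machines booting
    together do not all connect in the same instant."""
    return rng.uniform(0.0, max_jitter_s)


if __name__ == "__main__":
    for n in (100, 300, 500):
        print(f"{n} machines: about {drain_seconds(n):.1f} s of backlog")
    print(f"example start delay: {jittered_start_delay(30.0):.1f} s")
```

Under those assumptions a few hundred machines rebooting together
amount to a few seconds of work for the receiver, and a jitter window
longer than that spreads the connections out further still.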

> I am pretty sure that a pull model should be the default for everything
> we do, and push only be done where realtimish behaviour is desired to do
> live debugging or suchlike.
My biggest gripe with the pull model is the configuration issue
mentioned by mattdm elsewhere in the thread. If I have a few machines
on my home network, some VMs, a notebook or two, it is much easier to
keep the configuration of the receiver stable and configure all hosts
identically to push to it than the other way around. Especially since
both network addresses and host names change, it's really hard even
to tell the receiver where to pull from. A list of hosts to pull
from residing on the server is bound to become out of date.

> I am pretty sure the push model concept is one of the major weaknesses
> of the BSD syslog protocol.
It's a problem, but mostly because there's very little buffering and
things are mostly synchronous. But anyway, let's get both models
working... I wouldn't be surprised if both find their niches.

Zbyszek