Re: Logging framework in Gluster 4.0


 



On 11/10/2015 01:36 AM, Kaushal M wrote:
On Tue, Nov 10, 2015 at 12:05 PM, Kaushal M <kshlmster@xxxxxxxxx> wrote:
On Fri, Nov 6, 2015 at 10:45 PM, Atin Mukherjee
<atin.mukherjee83@xxxxxxxxx> wrote:
-Atin
Sent from one plus one


On Nov 6, 2015 7:50 PM, "Shyam" <srangana@xxxxxxxxxx> wrote:

On 11/06/2015 06:58 AM, Atin Mukherjee wrote:



On 11/06/2015 01:30 PM, Aravinda wrote:


regards
Aravinda
http://aravindavk.in

On 11/06/2015 12:28 PM, Avra Sengupta wrote:

Hi,

As almost all the components targeted for Gluster 4.0 have moved from
the design phase to the implementation phase at some level or another,
I feel it's time to get some consensus on the logging framework we are
going to use. Are we going to stick with the message ID formatted
logging framework in use today, or shall we move on to a better
solution?


I would prefer to think that we extend the log messages with what we want.
Message IDs are there for a purpose as outlined in earlier mails on the same
topic, so we would, and should, stick with message IDs and add more;
that is how I think of the problem.

If we take inspiration from other frameworks, I would say we improve what
we log.

Where we log is made pluggable in the current framework, although as we
add more logging frameworks (like rsyslog or lumberjack etc.) the
abstraction provided for plugging in these could improve. This is a
contained change in the logging.c file though.
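To make the "pluggable destinations" point concrete, here is a minimal self-contained sketch in Go of such an abstraction. All names (Backend, MemBackend, Logger) are illustrative, not the actual logging.c interfaces:

```go
package main

import "fmt"

// Backend is the plug point: each destination (file, syslog,
// lumberjack, ...) implements this interface.
type Backend interface {
	Write(level, msg string)
}

// MemBackend collects formatted lines in memory; a real backend would
// write to a file descriptor or a syslog socket instead.
type MemBackend struct{ Lines []string }

func (m *MemBackend) Write(level, msg string) {
	m.Lines = append(m.Lines, fmt.Sprintf("[%s] %s", level, msg))
}

// Logger routes every message through whichever backend was plugged
// in, so swapping destinations is a contained change at construction.
type Logger struct{ B Backend }

func (l Logger) Info(msg string) { l.B.Write("INFO", msg) }

func main() {
	mem := &MemBackend{}
	logger := Logger{B: mem}
	logger.Info("readv on socket failed")
	fmt.Println(mem.Lines[0]) // [INFO] readv on socket failed
}
```

Adding a new framework then means adding one Backend implementation, rather than touching every call site.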



Some good to have features I would like to have in the logging
framework are:
1. A specific distinction between the logs of various transactions (or
operations); that would make it a lot easier to make sense of them
than looking at some serialized log dump.

Externally grouping Message IDs may help in this situation.

A transaction id is definitely a *must have* requirement when 4.0's
theme is scalability and we are talking about configurations with
thousands of servers; without it, analysis of an issue will be a
nightmare. GlusterD 2.0 is definitely going to have logs with txn
ids.


+1

I thought this was discussed in another thread on the lists; I'm not
able to find it, so a link would help.

The question is, should this be automated by the logging framework, or
something each xlator should do?

I prefer automating this, but how? (frame pointers? at least on the IO
xlator stack) What is glusterd doing for the same?
As of now we have started using the logrus pkg in golang, which logs
every message with its txn id. How it does that is what I need to find
out by looking at the pkg code (if you are looking to borrow a similar
idea).


The logrus[1] package provides structured logging and the ability to
build incremental log contexts. This makes it possible to do things
like message-ids, but IMO the advantage of using structured logging is
that it would largely remove the need for message-ids.

In structured logging, it is encouraged to keep the log string itself
constant and to attach the variable portions as metadata, generally as
key-value pairs. This allows us to implement things like message-ids,
though they are not strictly required. As the log string is kept
constant, it becomes easier to search for individual logs and to
document them. For example (not really the best example), currently we
log socket failures in the following format

"readv on some.socket failed (No data available)"

This is hard to search for or parse, as the log string itself changes
when the socket or error changes. If using structured logging, this
log would be in the format

"readv on socket failed socket=some.socket errno=ENODATA"

This is much easier to search and parse.

Using log-contexts allows attaching certain metadata to all logs in
a context, for example, attaching transaction-ids to all logs of a
transaction. If we were to use log-contexts in the above example, we
could attach the transaction-id to the log,

"readv on socket failed socket=some.socket errno=ENODATA
txn-id=<uuid-string-here>"

allowing us to easily track during which transaction the read failed.


(Missed the link)
[1]: https://github.com/sirupsen/logrus
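The constant-message-plus-fields idea above can be sketched in a few lines of self-contained Go. This is an illustrative stand-in, not logrus itself (logrus provides the equivalent via WithFields/WithField and Entry):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Fields holds the variable portions of a log message as key-value pairs.
type Fields map[string]string

// Entry is a log context: fields set on it ride along on every message.
type Entry struct{ fields Fields }

// WithFields returns a new Entry carrying the given fields in addition
// to the receiver's own (later keys win on conflict).
func (e Entry) WithFields(f Fields) Entry {
	merged := Fields{}
	for k, v := range e.fields {
		merged[k] = v
	}
	for k, v := range f {
		merged[k] = v
	}
	return Entry{fields: merged}
}

// Log renders the constant message followed by sorted key=value metadata.
func (e Entry) Log(msg string) string {
	keys := make([]string, 0, len(e.fields))
	for k := range e.fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := []string{msg}
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%s", k, e.fields[k]))
	}
	return strings.Join(parts, " ")
}

func main() {
	// A transaction-scoped context: txn-id is attached to every log
	// made through it, as in the example above.
	txn := Entry{}.WithFields(Fields{"txn-id": "9f3c0b2a"})
	line := txn.WithFields(Fields{"socket": "some.socket", "errno": "ENODATA"}).
		Log("readv on socket failed")
	fmt.Println(line)
	// → readv on socket failed errno=ENODATA socket=some.socket txn-id=9f3c0b2a
}
```

Because the message string never varies, grepping for "readv on socket failed" finds every occurrence regardless of socket, errno, or transaction.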

The intention of message IDs was to prevent any confusion arising from message string changes. That is, when the message string changes between releases, mapping it back to the originally provided troubleshooting metadata requires manual intervention, which the message ID takes away.

Now, mandating that the message string be constant is another way to achieve the same thing. Actually, initially the message ID for the catalog was to be generated by hashing the message strings, but we were not able to ensure that the message strings would not change, and we found duplicate message strings (quite a few; I had metrics on this at some point but not now).

Also, in gluster there are log calls with the same message string but, at times, a different context for the message; in these situations again the message ID helps.

IMHO the message ID is invariant while the message is not (as we improve it from time to time), and hence a catalog keyed by message ID is very easy to maintain and update.

As a side note, the addition of errno to the log API was also meant to address the example seen above (the socket error): the message string stays constant, and errno alone identifies the actual error.

Further, I agree with the <key,value> approach. The intention of shaping the API (in this case gf_msg) as a series of in-args rather than key-value pairs was to internally generate any KV pairs as needed. For example, when doing centralized logging, adding host and daemon details as KV pairs (say for lumberjack, which I think uses a JSON log format) is possible and is handled by the framework rather than by each individual call to the logging API.

We can change the API to take KV pairs, or make it take an additional KV pair list carrying extra contextual information (say a transaction ID/#) that is opaque to the logging framework, as the framework does not need to interpret it.





2. We often encounter issues and then find ourselves wishing the
process had been running in debug mode so that we could have gotten
some debug logs. Any solution that gives me traceability of the
process's flow, without compromising the space constraints of the
logs, would be great to have.

I think after each debugging session, we should revise the log
messages to answer the following questions:
1. Is something missing from the log which could have reduced the time
spent getting to the root cause?
2. Is logging present but with incomplete details? (For example, on
volume set, the debug message has the key and value but no volume
name.)
3. Is something redundant or not required? (For example, Geo-replication
workers log whether the sync engine is in rsync or tar mode. If we
have multiple workers then every worker prints this message. If we
instead moved this log to the monitor process we could save some space
in the log files.)

With this practice, we can improve our logs over time.

I do agree with Aravinda here; we can do a far better job even at INFO
log level than what we have been doing so far.


This is kind of an open-floor question I am putting across to the
community, with no specific solution in mind at this point in time,
but with a concern that we should decide on something sooner in the
development cycle rather than later.

Regards,
Avra
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel



-Atin
Sent from one plus one




