> Debugging will involve getting far more/bigger files from customers
> unless we have a script (?) to grep out only those messages pertaining
> to the volume in question. IIUC, this would just be grepping for the
> volname and then determining which brick each message pertains to
> based on the brick id, correct?

Correct. There would also be some possibly-interesting messages that
aren't specifically tied to any one brick, e.g. in protocol/server or
various parts of libglusterfs, so we'd probably always want those no
matter what brick(s) we're interested in. (A rough sketch of such a
filter script is at the end of this mail.)

> Would brick ids remain constant across add/remove brick operations? An
> easy way would probably be just to use the client xlator number as the
> brick id which would make it easy to map the brick to client
> connection.

Brick IDs should be constant across add/remove operations, which I
suppose means they'll need to be more globally unique than they are now
(client translator indices can clash across volumes).

> With several brick processes all writing to the same log file, can
> there be problems with interleaving messages?

AFAICT the "atom" of logging is a line, so there shouldn't be problems
of interleaving within a line ("foo" + "bar" won't become "fboaro").
However, when code tries to log multiple lines together - e.g. DHT
layouts or AFR self-heal info - those lines could end up interleaved
with lines from another brick doing the same. They'd still be
distinguishable by brick ID, but an unfiltered log could look a bit
confusing.

> Logrotate might kick in faster as well causing us to lose debugging
> data if only a limited number of files are saved, as all those files
> would now hold less log data per volume. The logrotate config options
> would need to be changed to keep more files.

True.

> Having all messages for the bricks of the same volume in a single file
> would definitely be helpful. Still thinking through logging all
> messages for all bricks in a single file. :)

Something to keep in mind is that multiplexing everything into a single
process is only temporary. Very soon we'll be multiplexing into
multiple processes, with the number of processes proportional to the
number of cores in the system. So, for a node with 200 bricks and 24
cores, we might have 24 processes each containing ~8 bricks. In that
case it would make sense to keep bricks of the same volume in separate
processes as much as possible: a process is a failure domain, and
having multiple related bricks in the same failure domain is
undesirable (though unavoidable in some cases). The consequence for
logging is that you'd still have to look at multiple files.
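For concreteness, here's a toy illustration of that placement idea in
Python (not actual glusterd code; the function and argument names are
made up): deal each volume's bricks out to the brick processes
round-robin, so two bricks of the same volume end up in the same
process only when a volume has more bricks than there are processes.

    # Toy sketch only - not glusterd logic.
    from collections import defaultdict
    from itertools import cycle

    def place_bricks(bricks_by_volume, num_procs):
        """Map {volname: [brick, ...]} onto num_procs brick processes."""
        placement = defaultdict(list)   # proc index -> bricks it hosts
        procs = cycle(range(num_procs))
        for vol in sorted(bricks_by_volume):
            for brick in bricks_by_volume[vol]:
                placement[next(procs)].append(brick)
        return placement

    # e.g. 200 bricks onto 24 processes -> ~8 bricks per process, with a
    # volume's bricks landing in different processes (i.e. different
    # failure domains) whenever the volume has <= 24 bricks.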
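Coming back to the filtering question at the top, the per-volume filter
script could be as simple as something like this. The "[volname-N]"
brick ID tag used here is made up for the sake of the example; whatever
format we actually settle on, the structure stays the same: keep lines
whose brick ID belongs to the requested volume, plus any lines that
carry no brick ID at all (protocol/server, libglusterfs, etc.).

    #!/usr/bin/env python
    # Sketch of a per-volume filter for a multiplexed brick log.
    # Assumes brick-specific lines are tagged "[<volname>-<index>]";
    # that tag format is hypothetical.
    import re
    import sys

    BRICK_ID = re.compile(r'\[(?P<vol>[\w-]+)-(?P<idx>\d+)\]')

    def wanted(line, volname):
        m = BRICK_ID.search(line)
        # Untagged lines are global, so always keep them.
        return m is None or m.group('vol') == volname

    if __name__ == '__main__':
        volname = sys.argv[1]
        for line in sys.stdin:
            if wanted(line, volname):
                sys.stdout.write(line)

Usage would be along the lines of
"python volume-log-filter.py myvol < bricks.log > myvol.log".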