Re: Suicide

Zenon Panoussis <oracle@xxxxxxxxxxxxxxx> · Sat, 16 Apr 2011 11:53:27 +0200

On 04/16/2011 02:00 AM, Gregory Farnum wrote:

> You'll need to add debug output to your MDS config. At a minimum we will 
> need "debug journaler = 20". You should also add "debug ms = 1" and probably 
> "debug mds = 10". 

http://ceph.newdream.net/wiki/Debugging puts "debug journal" in the osd
section and "debug journaler" in the userspace clients section, but I don't
have any userspace clients; only the kernel modules. Just to make 100% sure
that I get it right, which debug levels should I put in which section(s)?
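
For what it's worth, my reading of "your MDS config" is that all three
settings go in the [mds] section of ceph.conf, roughly as sketched below.
The section placement is my assumption, not something the wiki page
confirms, so please correct it if it is wrong:

```ini
; ceph.conf fragment (sketch only; putting these under [mds] is an assumption)
[mds]
        debug journaler = 20
        debug ms = 1
        debug mds = 10
```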

> Be warned that this will use a LOT of disk space, though. If you ran out before 
> you're going to do so again and we will really need the logs that generated the 
> journal and the logs that were replaying the journal to figure out what happened, 
> so you'll need to come up with some way of handling them (writing to a big NFS 
> disk -- though that'll impact networking, different disk, log rotation, etc). 
> Then try and reproduce your previous conditions as exactly as possible...

The two most probable causes of death were (a) unresponsiveness because of
the load and (b) the log partition filling up.

Now, if it was the logs filling up and I now put them where they won't
fill up, then the error won't be reproduced. Then again, if I put them where
they will fill up, we won't have any node02 logs from after that point, which
are precisely the ones we need.

On the other hand, if it was unresponsiveness that caused it, more logging
will lead to more I/O, more thrashing of that poor 2.5" drive and yet higher
load, so we might run into the error before the logs fill up.

Z

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html