On 04/16/2011 02:00 AM, Gregory Farnum wrote:
> You'll need to add debug output to your MDS config. At a minimum we will
> need "debug journaler = 20". You should also add "debug ms = 1" and probably
> "debug mds = 10".

http://ceph.newdream.net/wiki/Debugging puts "debug journal" in the osd
section and "debug journaler" in the userspace clients section, but I don't
have any userspace clients, only the kernel modules. Just to make 100% sure
that I get it right, which debug levels should I put in which section(s)?

> Be warned that this will use a LOT of disk space, though. If you ran out before
> you're going to do so again and we will really need the logs that generated the
> journal and the logs that were replaying the journal to figure out what happened,
> so you'll need to come up with some way of handling them (writing to a big NFS
> disk -- though that'll impact networking, different disk, log rotation, etc).
> Then try and reproduce your previous conditions as exactly as possible...

The two most probable causes of death were (a) unresponsiveness because of
the load and (b) the log partition filling up.

If it was the logs filling up and I now put them where they won't fill up,
the error won't be reproduced. Then again, if I put them where they will fill
up, we won't have any node02 logs from after that point, and those are
precisely the ones we need.

On the other hand, if it was unresponsiveness that caused it, more logging
will lead to more I/O, more thrashing of that poor 2.5" drive and an even
higher load, so we might run into the error before the logs fill up.

Z
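
P.S. To make the question concrete, this is the fragment I would try,
assuming all three settings belong in the [mds] section. That placement is
only my reading of "your MDS config"; the wiki page does not confirm it, so
please correct me if any of them belongs elsewhere.

```ini
; Sketch only: section placement is an assumption, levels are the ones quoted above
[mds]
        debug journaler = 20
        debug ms = 1
        debug mds = 10
```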