One problem that keeps popping up is corruption in the PG logs. This usually manifests itself as a crash when the OSD restarts and is unable to parse the log.

There are a couple of things to do here.

First, we need to figure out where the corruption is coming from. Dumps of the corrupt pglog files will help. Are they zeroed out? Entirely? Is there valid data at the end of the file? etc.

Second, we need to come up with a reasonable way to start up even when some PGs are corrupt. Deleting them is one option, but we want to avoid doing so unless we're sure we have a good copy elsewhere. Another option would be to make a 'corrupt' subdirectory on the OSD and move the log there. Without the log, the OSD will need to rebuild the object list to recover/resync with other PG copies, but at least it will start and (eventually) recover.

http://tracker.newdream.net/issues/169

Thoughts?
sage
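
For the triage questions above (zeroed out? entirely? data at the end of the file?) and the 'corrupt' subdirectory idea, a minimal Python sketch might look like the following. It is an illustration only, not Ceph code: the function names `zero_summary` and `quarantine`, the 4096-byte block size, and the directory layout are all assumptions, and the script knows nothing about the actual pglog encoding, so "non-zero" does not mean "valid".

```python
import os
import shutil

BLOCK = 4096  # assumed reporting granularity; not taken from Ceph


def zero_summary(path, block=BLOCK):
    """Report how much of a file is zero-filled, block by block.

    Returns the file size, the number of blocks read, how many of them
    were all zeros, and whether the final block holds non-zero bytes.
    Non-zero bytes are not checked for validity.
    """
    size = os.path.getsize(path)
    total = zero = 0
    last_nonzero = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            total += 1
            is_zero = chunk.count(0) == len(chunk)
            zero += is_zero
            last_nonzero = not is_zero
    return {
        "size": size,
        "blocks": total,
        "zero_blocks": zero,
        "entirely_zero": total > 0 and zero == total,
        "tail_has_data": last_nonzero,
    }


def quarantine(log_path, osd_dir):
    """Move a log into <osd_dir>/corrupt rather than deleting it."""
    dest_dir = os.path.join(osd_dir, "corrupt")
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(log_path))
    if os.path.exists(dest):
        # Refuse to overwrite an earlier quarantined copy.
        raise FileExistsError(dest)
    shutil.move(log_path, dest)
    return dest
```

Running `zero_summary` over each collected dump would answer the zeroed-out questions quickly before anyone looks at the bytes by hand. Moving rather than deleting keeps the corrupt file around for later analysis, which matches the preference for not throwing data away unless a good copy is known to exist elsewhere.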