On Thu, 23 Sep 2010, Leander Yu wrote:
> Hi Sage,
> I have some high level idea about this but I haven't fully traced the
> code so please forgive me if it's too naive
>
> On Thu, Sep 23, 2010 at 11:05 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > One problem that keeps popping up is corruption in the PG logs. This
> > usually manifests itself as a crash when the OSD restarts and is unable to
> > parse the log. There are a couple of things to do here.
> >
> > First, we need to figure out where the corruption is coming from. Dumps
> > of the corrupt pglog files will help. Are they zeroed out? Entirely? Is
> > there valid data at the end of the file? etc.
> >
>
> I am thinking to have a checksum for each log entry, so that when the osd
> restarts and parses the log it will be able to detect if the data is corrupt.

Ah, yeah, that would be better. The encoding format will have to change a
bit so that there is an entry length preceding the entry, probably.
Something like

  <length><entry payload><crc32c>

so that we can validate before trying to decode the entry itself.

> > Second, we need to come up with a reasonable way to start up even when
> > some PGs are corrupt. Deleting them is one option, but we want to avoid
> > doing so unless we're sure we have a good copy elsewhere.
> >
>
> By implementing the checksum for each log entry, we will be able to
> ignore the corrupted log and hopefully it can be rebuilt.
> This is the place I am not certain, we have deleted the single PGinfo
> that caused the error manually and see that the osd starts successfully, but
> due to the limited knowledge about current implementation we are not
> sure if everything is rebuilt properly.

You can scrub the specific PG to check for consistency between replicas
with

 $ ceph pg scrub <pgid>    # e.g. 'ceph pg scrub 0.1f3'

The results will show up in the system log (ceph -w to watch).

sage
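
For illustration only, here is a rough Python sketch of the <length><entry payload><crc32c> framing mentioned above. It is not the actual OSD encoding. The little-endian 32-bit length and checksum fields, the MAX_ENTRY cap, the rejection of zero-length entries, and the names frame_entry and scan_log are all assumptions made for this example. The checksum is the standard CRC-32C (Castagnoli), written out in plain Python because the standard library only provides the other CRC-32.

```python
import struct

_POLY = 0x82F63B78  # reflected CRC-32C (Castagnoli) polynomial


def _build_table():
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data: bytes) -> int:
    """Standard CRC-32C: init 0xFFFFFFFF, reflected, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


_U32 = struct.Struct("<I")  # assumed field encoding: little-endian u32
MAX_ENTRY = 1 << 20         # hypothetical sanity cap on a single entry


def frame_entry(payload: bytes) -> bytes:
    """Encode one entry as <length><payload><crc32c of payload>."""
    return _U32.pack(len(payload)) + payload + _U32.pack(crc32c(payload))


def scan_log(buf: bytes):
    """Return (valid_payloads, offset), where offset is the end of the
    last entry that validated. Scanning stops at the first bad entry."""
    entries, pos = [], 0
    while pos + _U32.size <= len(buf):
        (length,) = _U32.unpack_from(buf, pos)
        end = pos + _U32.size + length + _U32.size
        # A zeroed region decodes as length 0 with CRC 0, which would
        # otherwise look valid, so empty entries are rejected here.
        if length == 0 or length > MAX_ENTRY or end > len(buf):
            break
        payload = buf[pos + _U32.size:end - _U32.size]
        (stored,) = _U32.unpack_from(buf, end - _U32.size)
        if stored != crc32c(payload):
            break
        entries.append(payload)
        pos = end
    return entries, pos
```

Because the length and checksum are checked before the payload is touched, a reader built this way never hands a damaged entry to the decoder. If the returned offset is short of the buffer length, the tail is suspect, and the entries before it are still usable.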