Re: corrupt pg logs

Sage Weil <sage@xxxxxxxxxxxx> · Thu, 23 Sep 2010 09:03:21 -0700 (PDT)

On Thu, 23 Sep 2010, Leander Yu wrote:

> Hi Sage,
> I have a high-level idea about this, but I haven't fully traced the
> code, so please forgive me if it's too naive.
> 
> On Thu, Sep 23, 2010 at 11:05 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > One problem that keeps popping up is corruption in the PG logs.  This
> > usually manifests itself as a crash when the OSD restarts and is unable to
> > parse the log.  There are a couple of things to do here.
> >
> > First, we need to figure out where the corruption is coming from.  Dumps
> > of the corrupt pglog files will help.  Are they zeroed out?  Entirely?  Is
> > there valid data at the end of the file?  etc.
> >
> 
> I am thinking of adding a checksum to each log entry; when the OSD restarts
> and parses the log, it will be able to detect whether the data is corrupt.

Ah, yeah, that would be better.  The encoding format will probably have to 
change a bit so that an entry length precedes each entry.  
Something like

 <length><entry payload><crc32c>

so that we can validate before trying to decode the entry itself.
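A minimal C++ sketch of that framing follows. It is not Ceph's actual encoder, and it rests on assumptions that aren't in this mail: a 32-bit little-endian length, the standard CRC-32C (Castagnoli, initial value and final XOR of 0xFFFFFFFF) computed over the length field and payload together, and hypothetical helper names (append_entry, scan_log, ScanResult). The scan accepts entries until the first frame that fails validation and reports where the valid prefix ends.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bitwise CRC-32C (reflected polynomial 0x82F63B78). Slow but
// dependency-free; a real implementation would use a lookup table or
// the SSE4.2 crc32 instruction. Passing a previous result as `crc`
// continues the checksum across buffers.
uint32_t crc32c(const uint8_t* data, size_t len, uint32_t crc = 0) {
  crc = ~crc;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int k = 0; k < 8; ++k)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return ~crc;
}

// Assumed on-disk integer layout: 32-bit little-endian.
void put_u32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

uint32_t get_u32(const uint8_t* p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
         uint32_t(p[3]) << 24;
}

// Hypothetical encoder: appends <length><payload><crc32c>, where the
// CRC covers both the length field and the payload.
void append_entry(std::vector<uint8_t>& log, const std::string& payload) {
  const size_t start = log.size();
  put_u32(log, static_cast<uint32_t>(payload.size()));
  log.insert(log.end(), payload.begin(), payload.end());
  const uint32_t crc = crc32c(log.data() + start, log.size() - start);
  put_u32(log, crc);
}

struct ScanResult {
  std::vector<std::string> entries;  // payloads that passed validation
  size_t good_bytes;                 // offset where the valid prefix ends
};

// Hypothetical decoder: validates each frame before touching its
// payload and stops at the first bad one (torn write, zeroed region,
// bit flip), since a corrupt length leaves no reliable next boundary.
ScanResult scan_log(const std::vector<uint8_t>& log) {
  ScanResult r{{}, 0};
  size_t off = 0;
  while (log.size() - off >= 8) {  // room for length + crc
    const uint8_t* frame = log.data() + off;
    const uint32_t len = get_u32(frame);
    if (len > log.size() - off - 8) break;  // runs past end of buffer
    const uint32_t stored = get_u32(frame + 4 + len);
    if (crc32c(frame, 4 + size_t(len)) != stored) break;  // checksum mismatch
    r.entries.emplace_back(reinterpret_cast<const char*>(frame + 4), len);
    off += 8 + size_t(len);
    r.good_bytes = off;
  }
  return r;
}
```

Stopping at the first bad frame rather than trying to skip it is a deliberate choice in this sketch. Once a length field can't be trusted, there is no dependable way to locate the next entry, and everything from good_bytes onward would have to be recovered from another replica.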

> > Second, we need to come up with a reasonable way to start up even when
> > some PGs are corrupt.  Deleting them is one option, but we want to avoid
> > doing so unless we're sure we have a good copy elsewhere.
> >
> 
> By implementing a checksum for each log entry, we will be able to
> ignore corrupted log entries, and hopefully they can be rebuilt.
> This is the part I am not certain about: we manually deleted the single
> PGinfo that caused the error and saw the OSD start successfully, but
> due to our limited knowledge of the current implementation we are not
> sure whether everything was rebuilt properly.

You can scrub the specific PG to check for consistency between replicas 
with

$ ceph pg scrub <pgid>     # e.g. 'ceph pg scrub 0.1f3'

The results will show up in the system log (ceph -w to watch).

sage