Re: How to recover from active+clean+inconsistent+failed_repair?

Hi Sagara,

I'm not sure my hypothesis can be correct. Ceph acknowledges a write only after all copies are on disk. In other words, if PGs end up on different versions after a power outage, one always needs to roll back. Since you have two healthy OSDs in the PG and the PG is active (successfully peered), it might just be a broken disk with read/write errors. I would focus on that.

Another question: do you have write caches enabled (disk cache and controller cache)? This is known to cause problems on power outages and also degraded performance with Ceph. You should check and disable any caches if necessary.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 01 November 2020 14:37:41
To: Sagara Wijetunga; ceph-users@xxxxxxx
Subject:  Re: How to recover from active+clean+inconsistent+failed_repair?

sorry: *badblocks* can force remappings of broken sectors (non-destructive read-write check)

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 01 November 2020 14:35:35
To: Sagara Wijetunga; ceph-users@xxxxxxx
Subject:  Re: How to recover from active+clean+inconsistent+failed_repair?

Hi Sagara,

It looks like your situation is more complex. Before doing anything potentially destructive, you need to investigate some more. A possible interpretation (numbering just for the example):

OSD 0 PG at version 1
OSD 1 PG at version 2
OSD 2 PG has scrub error

Depending on the version of the PG on OSD 2, either OSD 0 needs to roll forward (OSD 2 PG at version 2), or OSD 1 needs to roll back (OSD 2 PG at version 1). Part of the relevant information on OSD 2 seems to be unreadable, therefore pg repair bails out.
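
The roll-forward/roll-back reasoning above can be sketched in a few lines of Python. This is only an illustration of the example in this mail, not Ceph's actual peering logic: the function name, the use of plain integers as versions, and the "majority of readable copies wins" rule are all assumptions made for the sketch.

```python
from collections import Counter

def resolve_pg_version(versions):
    """Illustrative only: pick the version held by most readable copies.

    versions maps an OSD id to the PG version on that OSD, or to None
    when the copy is unreadable (e.g. a scrub error hides the version).
    Returns (target_version, actions); target_version is None when no
    decision is possible.
    """
    readable = {osd: v for osd, v in versions.items() if v is not None}
    counts = Counter(readable.values()).most_common(2)
    # No readable copy at all, or a tie between two versions: undecidable.
    if not counts or (len(counts) > 1 and counts[0][1] == counts[1][1]):
        return None, {}
    target = counts[0][0]
    actions = {}
    for osd, v in readable.items():
        if v < target:
            actions[osd] = "roll forward"
        elif v > target:
            actions[osd] = "roll back"
    return target, actions

# OSD 2 turns out to be at version 2: OSD 0 must roll forward.
print(resolve_pg_version({0: 1, 1: 2, 2: 2}))
# OSD 2 turns out to be at version 1: OSD 1 must roll back.
print(resolve_pg_version({0: 1, 1: 2, 2: 1}))
# OSD 2 is unreadable: the two remaining copies disagree, no decision.
print(resolve_pg_version({0: 1, 1: 2, 2: None}))
```

The third call corresponds to the situation described here: with the version on OSD 2 unreadable, the two remaining copies cannot settle the question on their own.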

You need to find out if you are in this situation or some other case. If you are, you need to find out somehow if you need to roll back or forward. I'm afraid in your current situation, even taking the OSD with the scrub errors down will not rebuild the PG.

I would probably try:

- find out with smartctl if the OSD with scrub errors is in a pre-fail state (has remapped sectors)
- if it is:
  * take it down and try to make a full copy with ddrescue
  * if ddrescue manages to copy everything, copy back to a new disk and add to ceph
  * if ddrescue fails to copy everything, you could try whether badblocks manages to get the disk back; badblocks can force remappings of broken sectors (non-destructive read-write check) and it can happen that data becomes readable again. Exchange the disk as soon as possible thereafter
- if the disk is healthy:
  * try to find out if you can deduce the state of the copies on every OSD
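
For the first step in the list above, a minimal Python sketch of reading the remapped-sector count from SMART attribute output. It assumes the classic ATA attribute table printed by `smartctl -A`, where the raw value is the tenth column; SAS and NVMe drives report this differently. The sample text is made up for illustration and is not output from this cluster, and the function name is hypothetical.

```python
import re

# Illustrative sample in the usual ATA attribute-table layout (not real data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21345
"""

def reallocated_sectors(smart_attr_text):
    """Return the raw Reallocated_Sector_Ct value, or None if not found."""
    for line in smart_attr_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            # The raw value may carry trailing annotations; keep leading digits.
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None

count = reallocated_sectors(SAMPLE)
print("remapped sectors:", count)
```

In practice the text would come from running `smartctl -A` against the OSD's device. A non-zero count is a hint that the disk is degrading, not proof that it caused the scrub errors.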

The tool for low-level operations is ceph-bluestore-tool. I have never used it, so you need to look at the documentation.

If everything fails, I guess your last option is to decide on one of the copies, export it from one OSD and inject it into another one (but not any of 0,1,2!). This will establish 2 identical copies and the third one will be changed to this one automatically. Note that this may lead to data loss on objects that were in the undefined state. As far as I can see, it's only 1 object and probably possible to recover from (backup, snapshot).
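
A hedged sketch of what the export/inject step could look like, written as Python helpers that only build the command lines and run nothing. PG export and import are done with ceph-objectstore-tool, and the OSD has to be stopped while the tool works on its data path. The default data path pattern, the OSD ids and the export file name below are assumptions for illustration; options can differ between releases, so check the tool's help output before running anything.

```python
def export_pg_cmd(osd_id, pgid, export_file):
    """Build the command that exports one PG from a stopped OSD."""
    return [
        "ceph-objectstore-tool",
        "--data-path", f"/var/lib/ceph/osd/ceph-{osd_id}",  # assumed default path
        "--pgid", pgid,
        "--op", "export",
        "--file", export_file,
    ]

def import_pg_cmd(osd_id, export_file):
    """Build the command that imports the exported PG into another stopped OSD."""
    return [
        "ceph-objectstore-tool",
        "--data-path", f"/var/lib/ceph/osd/ceph-{osd_id}",  # assumed default path
        "--op", "import",
        "--file", export_file,
    ]

# Hypothetical choice: keep the copy on osd.1, inject it into osd.3
# (an OSD outside the PG's current set 0,1,2).
print(" ".join(export_pg_cmd(1, "3.b", "/tmp/pg3.b.export")))
print(" ".join(import_pg_cmd(3, "/tmp/pg3.b.export")))
```

Printing the commands first and reviewing them before execution is deliberate: this is the destructive last resort, and choosing the wrong copy loses the other one.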

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Sagara Wijetunga <sagarawmw@xxxxxxxxx>
Sent: 01 November 2020 14:05:36
To: ceph-users@xxxxxxx
Subject:  Re: How to recover from active+clean+inconsistent+failed_repair?

Hi Frank

Thanks for the reply.

> I think this happens when a PG has 3 different copies and cannot decide which one is correct. You might have hit a very rare case. You should start with the scrub errors, check which PGs and which copies (OSDs) are affected. It sounds almost like all 3 scrub errors are on the same PG.
Yes, all 3 errors are for the same PG and on the same OSD:
2020-11-01 18:25:09.333339 osd.0 [ERR] 3.b shard 2 soid 3:d577e975:::1000023675e.00000000:head : candidate had a missing snapset key, candidate had a missing info key
2020-11-01 18:25:09.333342 osd.0 [ERR] 3.b soid 3:d577e975:::1000023675e.00000000:head : failed to pick suitable object info
2020-11-01 18:26:33.496255 osd.0 [ERR] 3.b repair 3 errors, 0 fixed
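
To make it easier to see which PG, shard and object the errors refer to, the three log lines above can be pulled apart with a short Python sketch. The regular expressions are written against exactly these three lines and are an assumption about the layout, not a documented Ceph log format; the function name is hypothetical.

```python
import re

LOG = """\
2020-11-01 18:25:09.333339 osd.0 [ERR] 3.b shard 2 soid 3:d577e975:::1000023675e.00000000:head : candidate had a missing snapset key, candidate had a missing info key
2020-11-01 18:25:09.333342 osd.0 [ERR] 3.b soid 3:d577e975:::1000023675e.00000000:head : failed to pick suitable object info
2020-11-01 18:26:33.496255 osd.0 [ERR] 3.b repair 3 errors, 0 fixed
"""

LINE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<osd>osd\.\d+) \[(?P<level>\w+)\] "
    r"(?P<pg>\d+\.[0-9a-f]+) (?P<msg>.*)$"
)
OBJ = re.compile(r"^(?:shard (?P<shard>\d+) )?soid (?P<soid>\S+) : (?P<errors>.*)$")
SUMMARY = re.compile(r"^repair (?P<errors>\d+) errors, (?P<fixed>\d+) fixed$")

def parse_scrub_line(line):
    """Split one scrub/repair log line into its parts; None if it does not match."""
    m = LINE.match(line)
    if not m:
        return None
    rec = {"reporter": m["osd"], "pg": m["pg"], "level": m["level"]}
    obj = OBJ.match(m["msg"])
    summary = SUMMARY.match(m["msg"])
    if obj:
        # "shard N" is the OSD holding the bad copy; absent on per-object lines.
        rec["shard"] = int(obj["shard"]) if obj["shard"] else None
        rec["soid"] = obj["soid"]
        rec["errors"] = obj["errors"].split(", ")
    elif summary:
        rec["repair_errors"] = int(summary["errors"])
        rec["repair_fixed"] = int(summary["fixed"])
    return rec

for record in map(parse_scrub_line, LOG.splitlines()):
    print(record)
```

Read this way, osd.0 is the OSD reporting the errors, while the damaged copy is on shard 2, and both object-level errors concern the same object.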

> You might have had a combination of crash and OSD fail, your situation is probably not covered by "single point of failure".
Yes it was a complex crash, all went down.

> In case you have a PG with scrub errors on 2 copies, you should be able to reconstruct the PG from the third with PG export/PG import commands.
I have not done a PG export/import before. Could you send the instructions or a link for it?

Thanks
Sagara
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



