Re: Power outages!!! help!

Yay! Finally, after almost exactly one month, I am able to mount the drive!  Now it's time to see how my data is doing. =P  Doesn't look too bad though.

Got to love open source. =)  I downloaded the Ceph source code, built it, tried to run the ceph-objectstore-tool export on that osd.4, and started debugging it.  Obviously I don't have any idea what everything does... but I was able to trace my way to the error message.  The corruption appears to be in the mount region.  When it tries to decode a buffer, most buffers showed very regular access patterns (judging by the printfs I put in), but a few of them had huge numbers.  Oh, and that "1" that didn't make sense came from the corruption: the struct_v portion of the data had been changed to the ASCII value of 1, which happily printed as 1. =P  Since it was the mount portion... and hoping it doesn't impact the data much... I went ahead and let those corrupted values through.  I was able to export osd.4 with the journal!
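For anyone following along, the export was done with ceph-objectstore-tool, roughly like this (the PG id, paths, and output file name here are my assumptions, adjust to your cluster):

systemctl stop ceph-osd@4
ceph-objectstore-tool --op export --pgid 1.28 \
    --data-path /var/lib/ceph/osd/ceph-4 \
    --journal-path /var/lib/ceph/osd/ceph-4/journal \
    --file ~/pg.1.28.export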

Then I imported that PG... but the OSDs wouldn't take it, as they had decided to create an empty PG 1.28 and mark it active.  So, just as the "Incomplete PGs Oh My!" page suggested, I pulled those OSDs down, removed those empty heads, and started them back up (roughly as sketched below).  At that point, no more incomplete PGs!
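The remove/import steps were along these lines; which OSDs held the empty copies is specific to my cluster, so the id below is a placeholder (check with "ceph pg map 1.28"):

systemctl stop ceph-osd@<id-with-empty-copy>
ceph-objectstore-tool --op remove --pgid 1.28 \
    --data-path /var/lib/ceph/osd/ceph-<id-with-empty-copy> \
    --journal-path /var/lib/ceph/osd/ceph-<id-with-empty-copy>/journal
ceph-objectstore-tool --op import --file ~/pg.1.28.export \
    --data-path /var/lib/ceph/osd/ceph-<id-with-empty-copy> \
    --journal-path /var/lib/ceph/osd/ceph-<id-with-empty-copy>/journal
systemctl start ceph-osd@<id-with-empty-copy>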

Then I worked on the inconsistent PGs.  It looks like this behaves somewhat differently in the 10.2.x releases.  I was able to fix them with rados get/put and a deep-scrub, roughly as sketched below.
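For each bad object it was basically a read/rewrite round trip followed by a scrub; the pool, object name, and temp file here are just an example (the object name is the one from the log further down):

rados -p rbd get rb.0.145d.2ae8944a.0000000000bb /tmp/obj
rados -p rbd put rb.0.145d.2ae8944a.0000000000bb /tmp/obj
ceph pg deep-scrub 2.7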

At this point everything was active+clean, but the MDS wasn't happy.  It seems to suggest the journal is broken.
HEALTH_ERR mds rank 0 is damaged; mds cluster is degraded; no legacy OSD present but 'sortbitwise' flag is not set

Found this.  Did everything down to cephfs-table-tool all reset session; roughly the sequence below.
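Presumably that is the standard CephFS journal recovery sequence (the backup file name is mine):

cephfs-journal-tool journal export ~/mds-journal-backup.bin
cephfs-journal-tool event recover_dentries summary
cephfs-journal-tool journal reset
cephfs-table-tool all reset session
ceph mds repaired 0    # only if rank 0 is still marked damaged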

Restarted MDS.  
HEALTH_WARN no legacy OSD present but 'sortbitwise' flag is not set
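(If I understand that flag correctly, the remaining warning should just be a matter of

ceph osd set sortbitwise

since there are no legacy OSDs left.)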

Mounted!  Thank you everyone for the help!  Learned a lot!

Regards,
Hong



On Friday, September 22, 2017 1:01 AM, hjcho616 <hjcho616@xxxxxxxxx> wrote:


Ronny,

Could you help me with this log?  I got it with debug osd = 20, filestore = 20, ms = 20 while running "ceph pg repair 2.7".  This is one of the smaller PGs, so the log was smaller; the others have similar errors.  I can see the lines with ERR, but other than that, is there something I should be paying attention to?
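(For reference, the debug levels can be bumped at runtime with something along these lines, or by setting them in ceph.conf and restarting the OSDs.)

ceph tell osd.* injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 20'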

The error messages look like this:
2017-09-21 23:53:31.545510 7f51682df700 -1 log_channel(cluster) log [ERR] : 2.7 shard 2: soid 2:e17dbaf6:::rb.0.145d.2ae8944a.0000000000bb:head data_digest 0x62b74a1f != data_digest 0x43d61c5d from auth oi 2:e17dbaf6:::rb.0.145d.2ae8944a.0000000000bb:head(12962'694 osd.2.0:90545 dirty|data_digest|omap_digest s 4194304 uv 484 dd 43d61c5d od ffffffff alloc_hint [0 0])
2017-09-21 23:53:31.545520 7f51682df700 -1 log_channel(cluster) log [ERR] : 2.7 shard 7: soid 2:e17dbaf6:::rb.0.145d.2ae8944a.0000000000bb:head data_digest 0x62b74a1f != data_digest 0x43d61c5d from auth oi 2:e17dbaf6:::rb.0.145d.2ae8944a.0000000000bb:head(12962'694 osd.2.0:90545 dirty|data_digest|omap_digest s 4194304 uv 484 dd 43d61c5d od ffffffff alloc_hint [0 0])
2017-09-21 23:53:31.545531 7f51682df700 -1 log_channel(cluster) log [ERR] : 2.7 soid 2:e17dbaf6:::rb.0.145d.2ae8944a.0000000000bb:head: failed to pick suitable auth object

I did try to move that object to a different location, as suggested by this page.

This is what I ran.
systemctl stop ceph-osd@7
ceph-osd -i 7 --flush-journal
cd /var/lib/ceph/osd/ceph-7
cd current/2.7_head/
mv rb.0.145d.2ae8944a.0000000000bb__head_6F5DBE87__2 ~/
ceph osd tree
systemctl start ceph-osd@7
ceph pg repair 2.7

Then I just get this:
2017-09-22 00:41:06.495399 7f22ac3bd700 -1 log_channel(cluster) log [ERR] : 2.7 shard 2: soid 2:e17dbaf6:::rb.0.145d.2ae8944a.0000000000bb:head data_digest 0x62b74a1f != data_digest 0x43d61c5d from auth oi 2:e17dbaf6:::rb.0.145d.2ae8944a.0000000000bb:head(12962'694 osd.2.0:90545 dirty|data_digest|omap_digest s 4194304 uv 484 dd 43d61c5d od ffffffff alloc_hint [0 0])
2017-09-22 00:41:06.495417 7f22ac3bd700 -1 log_channel(cluster) log [ERR] : 2.7 shard 7 missing 2:e17dbaf6:::rb.0.145d.2ae8944a.0000000000bb:head
2017-09-22 00:41:06.495424 7f22ac3bd700 -1 log_channel(cluster) log [ERR] : 2.7 soid 2:e17dbaf6:::rb.0.145d.2ae8944a.0000000000bb:head: failed to pick suitable auth object

Moving it from osd.2 instead results in a similar error message; it just says "missing" on the top line instead. =P

I was hoping this time would give me a different result, since I let one more OSD copy the object from OSD1 by taking osd.7 down with noout set.  But it doesn't appear to care about that extra copy.  Maybe that only works when size is 3?  Basically, since I had most of the OSDs alive on OSD1, I was trying to favor the data from OSD1. =P

What can I do in this case?  According to http://ceph.com/geen-categorie/incomplete-pgs-oh-my/, inconsistent data can be expected with --skip-journal-replay, and I had to use it since the export crashed without it. =P  But it doesn't say much about what to do in that case.
If all went well, then your cluster is now back to 100% active+clean / HEALTH_OK state. Note that you may still have inconsistent or stale data stored inside the PG. This is because the state of the data on the OSD that failed is a bit unknown, especially if you had to use the '--skip-journal-replay' option on the export. For RBD data, the client which utilizes the RBD should run a filesystem check against the RBD.
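For the RBD data, I guess that filesystem check would be something like this from the client, with the image unmounted (the image name and ext4 are just examples):

rbd map rbd/<image>       # prints the device, e.g. /dev/rbd0
fsck.ext4 -f /dev/rbd0
rbd unmap /dev/rbd0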

Regards,
Hong


On Thursday, September 21, 2017 1:46 AM, Ronny Aasen <ronny+ceph-users@xxxxxxxx> wrote:


On 21. sep. 2017 00:35, hjcho616 wrote:
> # rados list-inconsistent-pg data
> ["0.0","0.5","0.a","0.e","0.1c","0.29","0.2c"]
> # rados list-inconsistent-pg metadata
> ["1.d","1.3d"]
> # rados list-inconsistent-pg rbd
> ["2.7"]
> # rados list-inconsistent-obj 0.0 --format=json-pretty
> {
>      "epoch": 23112,
>      "inconsistents": []
> }
> # rados list-inconsistent-obj 0.5 --format=json-pretty
> {
>      "epoch": 23078,
>      "inconsistents": []
> }
> # rados list-inconsistent-obj 0.a --format=json-pretty
> {
>      "epoch": 22954,
>      "inconsistents": []
> }
> # rados list-inconsistent-obj 0.e --format=json-pretty
> {
>      "epoch": 23068,
>      "inconsistents": []
> }
> # rados list-inconsistent-obj 0.1c --format=json-pretty
> {
>      "epoch": 22954,
>      "inconsistents": []
> }
> # rados list-inconsistent-obj 0.29 --format=json-pretty
> {
>      "epoch": 22974,
>      "inconsistents": []
> }
> # rados list-inconsistent-obj 0.2c --format=json-pretty
> {
>      "epoch": 23194,
>      "inconsistents": []
> }
> # rados list-inconsistent-obj 1.d --format=json-pretty
> {
>      "epoch": 23072,
>      "inconsistents": []
> }
> # rados list-inconsistent-obj 1.3d --format=json-pretty
> {
>      "epoch": 23221,
>      "inconsistents": []
> }
> # rados list-inconsistent-obj 2.7 --format=json-pretty
> {
>      "epoch": 23032,
>      "inconsistents": []
> }
>
> Looks like there is not much information there.  Could you elaborate on the
> items you mentioned under "find the object"?  How do I check the metadata?
> What are we looking for in the md5sum?
>
> - find the object  :: manually check the objects, check the object
> metadata, run md5sum on them all and compare. Check the objects on the
> non-running OSDs and compare there as well. Anything to try to determine
> which object is OK and which is bad.
>
> I tried the "Ceph: manually repair object"
> <http://ceph.com/geen-categorie/ceph-manually-repair-object/> method on
> PG 2.7 before.  I tried the 3-replica case, which resulted in a missing
> shard regardless of which copy I moved, and the 2-replica case... hmm, I
> guess I don't know how long "wait a bit" is; I just turned it back on
> after a minute or so and it went right back to the same inconsistent
> message.. =P  Are we expecting the entire stopped OSD to remap to a
> different OSD and get a third replica before starting the stopped OSD again?
>
> Regards,
> Hong


Since your list-inconsistent-obj output is empty, you need to turn up
debugging on all OSDs and grep the logs to find the objects with issues;
this is explained in the link.  "ceph pg map [pg]" tells you which OSDs to
look at, and the log will have hints about the reason for the error.  Keep
in mind that it may have been a while since the scrub errored out, so you
may need to look at older logs, or trigger a scrub and wait for it to
finish so you can check the current log, for example:
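(using the PG from your mail as the example; the log path assumes the default location)

ceph pg map 2.7                          # which OSDs hold the PG
ceph pg deep-scrub 2.7                   # trigger a fresh scrub
grep ERR /var/log/ceph/ceph-osd.*.log    # on the hosts from the map output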

Once you have the object names, you can locate the files on disk with the find command, for example:
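(using the object name from your earlier log)

find /var/lib/ceph/osd/ceph-*/current/2.7_head/ -name 'rb.0.145d.2ae8944a.0000000000bb*'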

After removing/fixing the broken object and restarting the OSD, you issue
the repair and wait for the repair and scrub of that PG to finish.  You
can probably follow along by tailing the log.


good luck




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
