Re: Power outages!!! help!

hjcho616 <hjcho616@xxxxxxxxx> · Wed, 20 Sep 2017 22:35:49 +0000 (UTC)

# rados list-inconsistent-pg data
["0.0","0.5","0.a","0.e","0.1c","0.29","0.2c"]
# rados list-inconsistent-pg metadata
["1.d","1.3d"]
# rados list-inconsistent-pg rbd
["2.7"]
# rados list-inconsistent-obj 0.0 --format=json-pretty
{
    "epoch": 23112,
    "inconsistents": []
}
# rados list-inconsistent-obj 0.5 --format=json-pretty
{
    "epoch": 23078,
    "inconsistents": []
}
# rados list-inconsistent-obj 0.a --format=json-pretty
{
    "epoch": 22954,
    "inconsistents": []
}
# rados list-inconsistent-obj 0.e --format=json-pretty
{
    "epoch": 23068,
    "inconsistents": []
}
# rados list-inconsistent-obj 0.1c --format=json-pretty
{
    "epoch": 22954,
    "inconsistents": []
}
# rados list-inconsistent-obj 0.29 --format=json-pretty
{
    "epoch": 22974,
    "inconsistents": []
}
# rados list-inconsistent-obj 0.2c --format=json-pretty
{
    "epoch": 23194,
    "inconsistents": []
}
# rados list-inconsistent-obj 1.d --format=json-pretty
{
    "epoch": 23072,
    "inconsistents": []
}
# rados list-inconsistent-obj 1.3d --format=json-pretty
{
    "epoch": 23221,
    "inconsistents": []
}
# rados list-inconsistent-obj 2.7 --format=json-pretty
{
    "epoch": 23032,
    "inconsistents": []
}
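
The per-PG commands above can also be run as one loop. This is only a sketch: it assumes `rados list-inconsistent-pg` prints a one-line JSON array exactly as shown above, and the function name `list_all_inconsistent` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every pool given, list its inconsistent PGs and
# dump the inconsistent-object report for each of them.
list_all_inconsistent() {
    local pool pg
    for pool in "$@"; do
        # Turn ["0.0","0.5"] into one PG id per line.
        rados list-inconsistent-pg "$pool" | tr -d '"[]' | tr ',' '\n' |
        while read -r pg; do
            [ -n "$pg" ] || continue
            echo "== $pool $pg"
            # </dev/null keeps the inner command from consuming the PG list.
            rados list-inconsistent-obj "$pg" --format=json-pretty </dev/null
        done
    done
}

# Example with the pool names used in this thread:
#   list_all_inconsistent data metadata rbd
```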

It looks like there is not much information there.  Could you elaborate on the items you mentioned under "find the object"?  How do I check the metadata?  What are we looking for with md5sum?

- find the object  :: manually check the objects, check the object metadata, run md5sum on them all and compare. check objects on the nonrunning osd's and compare there as well. anything to try to determine what object is ok and what is bad. 

I tried the "Ceph: manually repair object" method on PG 2.7 before.  I tried the 3-replica case, which resulted in a missing shard regardless of which copy I moved.  For the 2-replica case, hmm... I guess I don't know how long "wait a bit" is.  I just turned it back on after a minute or so, and it just went back to the same inconsistent message.. =P  Are we expecting the PGs of the stopped OSD to map to a different OSD, so that we get 3 replicas when the stopped OSD is started again?

Regards,
Hong

    On Wednesday, September 20, 2017 4:47 PM, hjcho616 <hjcho616@xxxxxxxxx> wrote:

 Thanks Ronny.  I'll try that on the inconsistent PGs soon.

I think the OSD drive that PG 1.28 is sitting on is still ok... the file corruption just happened during the power outage.. =P
As you suggested, 
cd /var/lib/ceph/osd/ceph-4/current/
tar --xattrs --preserve-permissions -zcvf osd.4.tar.gz 1.28_*
cd /var/lib/ceph/osd/ceph-10/tmposd
mkdir current
chown ceph.ceph current/
cd current/
tar --xattrs --preserve-permissions -zxvf /var/lib/ceph/osd/ceph-4/current/osd.4.tar.gz
systemctl start ceph-osd@8

I created a temp OSD like I did at import time, then set its crush reweight to 0.  I noticed the current directory was missing. =P So I created a current directory and copied the content there.

Starting the OSD doesn't appear to show any activity.  Are there any other files I need to copy over besides the 1.28_head and 1.28_tail directories?

Regards,
Hong

    On Wednesday, September 20, 2017 4:04 PM, Ronny Aasen <ronny+ceph-users@xxxxxxxx> wrote:

    i would only tar the pg you have
      missing objects from. trying to inject older objects when the pg
      is correct can not be good.

      scrub errors are kind of the issue with only 2 replicas: when you
      have 2 different objects, how do you know which one is correct and
      which one is bad..

      and as you have read on
      http://ceph.com/geen-categorie/ceph-manually-repair-object/  and
      on
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
      you need to

      - find the pg      ::  rados list-inconsistent-pg [pool]

      - find the problem ::  rados list-inconsistent-obj
      0.6 --format=json-pretty ; gives you the object name. look for
      hints to which object is the bad one

        - find the object  :: manually check the objects,
        check the object metadata, run md5sum on them all and compare.
        check objects on the nonrunning osd's and compare there as well.
        anything to try to determine what object is ok and what is bad.

      - fix the problem  :: assuming you find the bad object,
        stop the affected osd with the bad object, remove the object
        manually, restart osd. issue repair command.

      if the rados commands do not give you the info, you need to do it
      all manually as on
      http://ceph.com/geen-categorie/ceph-manually-repair-object/
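
A minimal sketch of the "run md5sum on them all and compare" step. It assumes FileStore OSDs where each PG sits under `<osd mount>/current/<pgid>_head`, the layout used elsewhere in this thread. The function names and the example paths for osd.6 and osd.7 are hypothetical. Object files on disk carry extra suffixes after the object name, so the sketch matches on a name fragment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the md5sum of every file whose name contains
# the given object-name fragment, under each PG directory passed in.
checksum_replicas() {
    local fragment="$1"; shift
    local dir
    for dir in "$@"; do
        find "$dir" -type f -name "*${fragment}*" -exec md5sum {} +
    done
}

# Succeeds only if at least one copy was found and all checksums are equal.
replicas_agree() {
    [ "$(checksum_replicas "$@" | awk '{print $1}' | sort -u | wc -l)" -eq 1 ]
}

# Assumed example (run with the OSDs stopped; paths are illustrative):
#   checksum_replicas 200014ce4c3.0000028f \
#       /var/lib/ceph/osd/ceph-6/current/0.29_head \
#       /var/lib/ceph/osd/ceph-7/current/0.29_head
```

Matching checksums only show that the copies agree with each other. They do not show which version is the correct one, so the object metadata still has to be checked as well.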

      good luck 

      Ronny Aasen

      On 20.09.2017 22:17, hjcho616 wrote:

        Thanks
            Ronny.

        I decided to try
            to tar everything under the current directory.  Is this the
            correct command for it?  Is there any directory we do not want
            on the new drive?  commit_op_seq, meta, nosnap, omap?

        tar --xattrs
          --preserve-permissions -zcvf osd.4.tar.gz .

        As far
          as inconsistent PGs... I am running into these errors.  I
          tried moving one copy of the pg to another location, but it just
          says the moved shard is missing.  Tried setting 'noout' and turning
          one of them down; it seems to work on something but then goes back to
          the same error.  Currently trying to move to a different osd...
          making sure the drive is not faulty, got a few of them.. but
          still persisting..  I've been kicking off ceph pg repair PG#,
          hoping it would fix them. =P  Any other suggestions?

        2017-09-20
          09:39:48.481400 7f163c5fa700  0 log_channel(cluster) log [INF]
          : 0.29 repair starts
        2017-09-20
          09:47:37.384921 7f163c5fa700 -1 log_channel(cluster) log [ERR]
          : 0.29 shard 6: soid 0:97126ead:::200014ce4c3.0000028f:head
          data_digest 0x8f679a50 != data_digest 0x979f2ed4 from auth oi
          0:97126ead:::200014ce4c3.0000028f:head(19366'539375
          client.535319.1:2361163 dirty|data_digest|omap_digest s
          4194304 uv 539375 dd 979f2ed4 od ffffffff alloc_hint [0 0])
        2017-09-20
          09:47:37.384931 7f163c5fa700 -1 log_channel(cluster) log [ERR]
          : 0.29 shard 7: soid 0:97126ead:::200014ce4c3.0000028f:head
          data_digest 0x8f679a50 != data_digest 0x979f2ed4 from auth oi
          0:97126ead:::200014ce4c3.0000028f:head(19366'539375
          client.535319.1:2361163 dirty|data_digest|omap_digest s
          4194304 uv 539375 dd 979f2ed4 od ffffffff alloc_hint [0 0])
        2017-09-20
          09:47:37.384936 7f163c5fa700 -1 log_channel(cluster) log [ERR]
          : 0.29 soid 0:97126ead:::200014ce4c3.0000028f:head: failed to
          pick suitable auth object
        2017-09-20
          09:48:11.138566 7f1639df5700 -1 log_channel(cluster) log [ERR]
          : 0.29 shard 6: soid 0:97d5c15a:::100000101b4.00006892:head
          data_digest 0xd65b4014 != data_digest 0xf41cfab8 from auth oi
          0:97d5c15a:::100000101b4.00006892:head(12962'65557
          osd.4.0:42234 dirty|data_digest|omap_digest s 4194304 uv 776
          dd f41cfab8 od ffffffff alloc_hint [0 0])
        2017-09-20
          09:48:11.138575 7f1639df5700 -1 log_channel(cluster) log [ERR]
          : 0.29 shard 7: soid 0:97d5c15a:::100000101b4.00006892:head
          data_digest 0xd65b4014 != data_digest 0xf41cfab8 from auth oi
          0:97d5c15a:::100000101b4.00006892:head(12962'65557
          osd.4.0:42234 dirty|data_digest|omap_digest s 4194304 uv 776
          dd f41cfab8 od ffffffff alloc_hint [0 0])
        2017-09-20
          09:48:11.138581 7f1639df5700 -1 log_channel(cluster) log [ERR]
          : 0.29 soid 0:97d5c15a:::100000101b4.00006892:head: failed to
          pick suitable auth object
        2017-09-20
          09:48:55.584022 7f1639df5700 -1 log_channel(cluster) log [ERR]
          : 0.29 repair 4 errors, 0 fixed
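
To see the digest mismatches at a glance, the repair log can be reduced to one short row per shard. This is a sketch only: it assumes each entry is a single line in the real log file (the wrapping above comes from the mail client), that the first digest is the one computed from the shard and the second is the one recorded in the object info, and that the cluster log is at `/var/log/ceph/ceph.log`. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read repair/scrub log lines on stdin and print
#   <pgid> <shard> <soid> <first digest> <second digest>
# for every "data_digest X != data_digest Y from auth oi" entry.
digest_mismatches() {
    sed -nE 's/.* ([0-9]+\.[0-9a-f]+) shard ([0-9]+): soid ([^ ]+) data_digest (0x[0-9a-f]+) != data_digest (0x[0-9a-f]+) from auth oi.*/\1 \2 \3 \4 \5/p'
}

# Assumed log location; adjust to wherever the cluster log lives:
#   digest_mismatches < /var/log/ceph/ceph.log
```

In the log above, shards 6 and 7 report the same first digest for each object (0x8f679a50 and 0xd65b4014), and it differs from the second one in both cases.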

        Latest
          health...

          HEALTH_ERR
            1 pgs are stuck inactive for more than 300 seconds; 1 pgs
            down; 1 pgs incomplete; 9 pgs inconsistent; 1 pgs repair; 1
            pgs stuck inactive; 1 pgs stuck unclean; 68 scrub errors;
            mds rank 0 has failed; mds cluster is degraded; no legacy
            OSD present but 'sortbitwise' flag is not set

          Regards,
          Hong

               On Wednesday,
                  September 20, 2017 11:53 AM, Ronny Aasen
                  <ronny+ceph-users@xxxxxxxx> wrote:

                    On
                      20.09.2017 16:49, hjcho616 wrote:

                        Anyone?  Can
                            this PG be saved?  If not what are my
                            options?

                        Regards,
                        Hong

                                  On Saturday, September 16, 2017 1:55
                                  AM, hjcho616 <hjcho616@xxxxxxxxx>
                                  wrote:

                                      Looking
                                        better... working on scrubbing..
                                      HEALTH_ERR
                                        1 pgs are stuck inactive for
                                        more than 300 seconds; 1 pgs
                                        incomplete; 12 pgs inconsistent;
                                        2 pgs repair; 1 pgs stuck
                                        inactive; 1 pgs stuck unclean;
                                        109 scrub errors; too few PGs
                                        per OSD (29 < min 30); mds
                                        rank 0 has failed; mds cluster
                                        is degraded; noout flag(s) set;
                                        no legacy OSD present but
                                        'sortbitwise' flag is not set

                                      Now
                                        PG1.28.. looking at all old osds
                                        dead or alive.  Only one with
                                        DIR_* directory is in osd.4.  
                                        This appears to be metadata
                                        pool!  21M of metadata can be
                                        quite a bit of stuff.. so I
                                        would like to rescue this!  But
                                        I am not able to start this OSD.
                                         exporting through
                                        ceph-objectstore-tool appears to
                                        crash.  Even with
                                        --skip-journal-replay and
                                        --skip-mount-omap (different
                                        failure).  As I mentioned in
                                        earlier email, that exception
                                        thrown message is bogus...
                                      #
                                        ceph-objectstore-tool --op
                                        export --pgid 1.28  --data-path
                                        /var/lib/ceph/osd/ceph-4
                                        --journal-path
                                        /var/lib/ceph/osd/ceph-4/journal
                                        --file ~/1.28.export
                                      terminate
                                        called after throwing an
                                        instance of 'std::domain_error'

                    [SNIP]

                                        What
                                          can I do to save that PG1.28?
                                           Please let me know if you
                                          need more information.  So
                                          close!... =)

                                          Regards,
                                          Hong

                    12 inconsistent pgs and 109 scrub errors are
                      something you should fix first of all.

                    also you can consider using the paid services
                      of one of the many ceph support companies that
                      specialize in these kinds of situations.

                    --
                    that being said, here are some suggestions...

                    when it comes to lost object recovery you have
                      come about as far as i have ever experienced. so
                      everything after here is just assumptions and wild
                      guesswork as to what you can try.  I hope others
                      shout out if i tell you wildly wrong things.

                    if you have found data for pg1.28 on the broken
                      osd and have checked all other working and
                      nonworking drives for that pg, then you need to
                      try and extract the pg from the broken drive. As
                      always in recovery cases, take a dd clone of the
                      drive and work from the cloned image, to avoid
                      more damage to the drive and to allow you to try
                      multiple times.

                    you should add a temporary injection drive
                      large enough for that pg, and set its crush weight
                      to 0 so it always drains. make sure it is up and
                      registered properly in ceph. 

                    the idea is to copy the pg manually from
                      broken-osd to the injection drive, since the
                      export/import fails.. making sure you get all
                      xattrs included.  one can either copy the whole
                      pg, or just the "missing" objects.  if there are
                      few objects i would go for that, if there are many
                      i would take the whole pg. you won't get data from
                      leveldb. so i am not at all sure this would work.
                      but worth a shot.

                    - stop your injection osd, verify it is down
                      and the process is not running.

                      - from the mountpoint of your broken-osd go into
                      the current directory and tar up pg1.28. make
                      sure you use -p and --xattrs when you create the
                      archive.

                      - if tar errors out on unreadable files, just rm
                      those (since you are working on a copy of your
                      rescue image, you can always try again)

                      - copy the tar file to the injection drive and
                      extract while sitting in the current directory
                      (remember --xattrs)

                      - set debug options on the injection drive in
                      ceph.conf

                      - start the injection drive, and follow along in
                      the log file. hopefully it should scan, locate the
                      pg, and replicate the pg1.28 objects off to the
                      current primary drive for pg1.28. and since it
                      has crush weight 0 it should drain out.

                      - if that works, verify the injection drive is
                      drained, stop it and remove it from ceph.  zap the
                      drive. 
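
The tar steps in the list above could be sketched as two small functions. They only wrap the tar flags described in the steps (-p and --xattrs) and assume GNU tar. The function names and the destination path in the example are hypothetical, and, as said above, it is not certain that the osd will pick up a pg copied this way.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive all directories of one pg (e.g. 1.28_*) from
# an osd's "current" directory, keeping xattrs and permissions.
# The archive path must be absolute because the function changes directory.
pack_pg() {
    local current_dir="$1" pgid="$2" archive="$3"
    ( cd "$current_dir" && tar --xattrs --preserve-permissions -zcf "$archive" "${pgid}"_* )
}

# Hypothetical helper: extract the archive inside the injection osd's
# "current" directory, again restoring xattrs and permissions.
unpack_pg() {
    local archive="$1" current_dir="$2"
    ( cd "$current_dir" && tar --xattrs --preserve-permissions -zxf "$archive" )
}

# Assumed example, run as root with both osds stopped:
#   pack_pg   /var/lib/ceph/osd/ceph-4/current 1.28 /root/pg1.28.tar.gz
#   unpack_pg /root/pg1.28.tar.gz /path/to/injection-osd/current
```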

                    this is all as i said guesstimates so your
                      mileage may vary

                      good luck
                    Ronny Aasen

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
