Re: Power outages!!! help!


 



Just realized there is a file called superblock in the ceph directory.  ceph-1's and ceph-2's superblock files are identical, and ceph-6's and ceph-7's are identical, but the two groups differ from each other.  When I originally created the OSDs, I created ceph-0 through 5.  Can the superblock file be copied over from ceph-1 to ceph-0?
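A quick, read-only way to confirm which of those superblock files actually match (purely illustrative; paths taken from the mounts shown further down):

# md5sum /var/lib/ceph/osd/ceph-*/superblock
(identical checksums mean byte-identical files)
# hexdump -C /var/lib/ceph/osd/ceph-1/superblock | head
(peek at the contents of one of them)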

Hmm.. it appears to be doing something in the background even though osd.0 is down.  ceph health output is changing!
# ceph health
HEALTH_ERR 40 pgs are stuck inactive for more than 300 seconds; 14 pgs backfill_wait; 21 pgs degraded; 10 pgs down; 2 pgs inconsistent; 10 pgs peering; 3 pgs recovering; 2 pgs recovery_wait; 30 pgs stale; 21 pgs stuck degraded; 10 pgs stuck inactive; 30 pgs stuck stale; 45 pgs stuck unclean; 16 pgs stuck undersized; 16 pgs undersized; 2 requests are blocked > 32 sec; recovery 221826/2473662 objects degraded (8.968%); recovery 254711/2473662 objects misplaced (10.297%); recovery 103/2251966 unfound (0.005%); 7 scrub errors; mds cluster is degraded; no legacy OSD present but 'sortbitwise' flag is not set
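To watch that progress without re-running ceph health by hand, the usual commands are (illustrative, standard ceph CLI):

# ceph -w
(streams health and recovery events as they happen)
# ceph health detail
(lists the individual PGs behind each of the error counts above)
# ceph pg dump_stuck stale
(shows which PGs are stuck stale and which OSDs they were last seen on)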

Regards,
Hong


On Friday, September 1, 2017 10:37 PM, hjcho616 <hjcho616@xxxxxxxxx> wrote:


Tried connecting the recovered OSD.  It looks like some of the files in lost+found are superblocks.  Below is the log.  What can I do about this?

2017-09-01 22:27:27.634228 7f68837e5800  0 set uid:gid to 1001:1001 (ceph:ceph)
2017-09-01 22:27:27.634245 7f68837e5800  0 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0), process ceph-osd, pid 5432
2017-09-01 22:27:27.635456 7f68837e5800  0 pidfile_write: ignore empty --pid-file
2017-09-01 22:27:27.646849 7f68837e5800  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2017-09-01 22:27:27.647077 7f68837e5800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-09-01 22:27:27.647080 7f68837e5800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-09-01 22:27:27.647091 7f68837e5800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice is supported
2017-09-01 22:27:27.678937 7f68837e5800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2017-09-01 22:27:27.679044 7f68837e5800  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf
2017-09-01 22:27:27.680718 7f68837e5800  1 leveldb: Recovering log #28054
2017-09-01 22:27:27.804501 7f68837e5800  1 leveldb: Delete type=0 #28054

2017-09-01 22:27:27.804579 7f68837e5800  1 leveldb: Delete type=3 #28053

2017-09-01 22:27:35.586725 7f68837e5800  0 filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2017-09-01 22:27:35.587689 7f68837e5800  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 18: 9998729216 bytes, block size 4096 bytes, directio = 1, aio = 1
2017-09-01 22:27:35.589631 7f68837e5800  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 18: 9998729216 bytes, block size 4096 bytes, directio = 1, aio = 1
2017-09-01 22:27:35.590041 7f68837e5800  1 filestore(/var/lib/ceph/osd/ceph-0) upgrade
2017-09-01 22:27:35.590149 7f68837e5800 -1 filestore(/var/lib/ceph/osd/ceph-0) could not find #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or directory
2017-09-01 22:27:35.590158 7f68837e5800 -1 osd.0 0 OSD::init() : unable to read osd superblock
2017-09-01 22:27:35.590547 7f68837e5800  1 journal close /var/lib/ceph/osd/ceph-0/journal
2017-09-01 22:27:35.611595 7f68837e5800 -1 ** ERROR: osd init failed: (22) Invalid argument

Recovered drive is mounted on /var/lib/ceph/osd/ceph-0.
# df
Filesystem      1K-blocks      Used  Available Use% Mounted on
udev                10240         0      10240   0% /dev
tmpfs             1584780      9172    1575608   1% /run
/dev/sda1        15247760   9319048    5131120  65% /
tmpfs             3961940         0    3961940   0% /dev/shm
tmpfs                5120         0       5120   0% /run/lock
tmpfs             3961940         0    3961940   0% /sys/fs/cgroup
/dev/sdb1      1952559676 634913968 1317645708  33% /var/lib/ceph/osd/ceph-0
/dev/sde1      1952559676 640365952 1312193724  33% /var/lib/ceph/osd/ceph-6
/dev/sdd1      1952559676 712018768 1240540908  37% /var/lib/ceph/osd/ceph-2
/dev/sdc1      1952559676 755827440 1196732236  39% /var/lib/ceph/osd/ceph-1
/dev/sdf1       312417560  42538060  269879500  14% /var/lib/ceph/osd/ceph-7
tmpfs              792392         0     792392   0% /run/user/0
# cd /var/lib/ceph/osd/ceph-0
# ls
activate.monmap  current  journal_uuid  magic          superblock  whoami
active           fsid     keyring       ready          sysvinit
ceph_fsid        journal  lost+found    store_version  type
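If the osd_superblock object the log complains about got detached by xfs_repair, it may be sitting in lost+found; a hedged way to look for it (filestore keeps its meta objects under current/meta, and the on-disk names are escaped, so match loosely):

# find /var/lib/ceph/osd/ceph-0/current/meta -iname '*superblock*'
(if this returns nothing, the object really is gone from where the OSD expects it)
# file /var/lib/ceph/osd/ceph-0/lost+found/* | grep -v directory | head
(candidates for the displaced meta entries would show up here)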

Regards,
Hong


On Friday, September 1, 2017 2:59 PM, hjcho616 <hjcho616@xxxxxxxxx> wrote:


Found the partition, but wasn't able to mount it right away... Did an xfs_repair on that drive.

Got a bunch of messages like this... =(
entry "100000a89fd.00000000__head_AE319A25__0" in shortform directory 845908970 references non-existent inode 605294241               
junking entry "100000a89fd.00000000__head_AE319A25__0" in directory inode 845908970           

Was able to mount it.  lost+found has lots of files in there. =P  Running du seems to show OK files in the current directory.
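A couple of quick, non-destructive sanity checks at this point (illustrative; using the mount point from the later message):

# du -sh /var/lib/ceph/osd/ceph-0/current
(rough size of the object store that survived the repair)
# ls /var/lib/ceph/osd/ceph-0/lost+found | wc -l
(xfs_repair names orphaned files after their inode numbers, so the count hints at how much got detached)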

Will it be safe to attach this one back to the cluster?  Is there a way to specify to use this drive only where data is missing? =)  Or am I being paranoid?  Should I just plug it in? =)

Regards,
Hong


On Friday, September 1, 2017 9:01 AM, hjcho616 <hjcho616@xxxxxxxxx> wrote:


Looks like it has been rescued... only 1 error, as we saw before in the SMART log!
# ddrescue -f /dev/sda /dev/sdc ./rescue.log
GNU ddrescue 1.21
Press Ctrl-C to interrupt
     ipos:    1508 GB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:    1508 GB, non-scraped:        0 B,  average rate:  88985 kB/s
non-tried:        0 B,     errsize:     4096 B,      run time:  6h 14m 40s
  rescued:    2000 GB,      errors:        1,  remaining time:         n/a
percent rescued:  99.99%      time since last successful read:         39s
Finished                       

The partition is still missing on the new drive. =P  I found a utility called testdisk for broken partition tables.  Will try that tonight. =P
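Before letting testdisk write a new partition table, a few read-only checks on the destination disk are worth it (device name /dev/sdc assumed from the ddrescue command above):

# parted /dev/sdc unit s print
(shows whatever partition table, if any, came across with the copy)
# gdisk -l /dev/sdc
(reports whether an MBR or GPT is present)
# testdisk /dev/sdc
(interactive; Analyse -> Quick Search will usually find the lost XFS partition and offer to write the entry back)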

Regards,
Hong



On Wednesday, August 30, 2017 9:18 AM, Ronny Aasen <ronny+ceph-users@xxxxxxxx> wrote:


On 30.08.2017 15:32, Steve Taylor wrote:
I'm not familiar with dd_rescue, but I've just been reading about it. I'm not seeing any features that would be beneficial in this scenario that aren't also available in dd. What specific features give it "really a far better chance of restoring a copy of your disk" than dd? I'm always interested in learning about new recovery tools.

I see I wrote dd_rescue out of old habit, but the package one should use on Debian is gddrescue, also called GNU ddrescue.

This page has some details on the differences between dd and the ddrescue variants.
http://www.toad.com/gnu/sysadmin/index.html#ddrescue

kind regards
Ronny Aasen
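A typical gddrescue run, as a hedged sketch (device names are placeholders; the mapfile is what lets you stop and resume, and is the main practical difference from plain dd):

# apt-get install gddrescue
# ddrescue -f -n /dev/sdX /dev/sdY rescue.map
(first pass: copy everything it can quickly, skipping the slow scraping of bad areas)
# ddrescue -f -d -r3 /dev/sdX /dev/sdY rescue.map
(second pass: direct disc access, retrying the remaining bad sectors up to 3 times)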



Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |

If you are not the intended recipient of this message or received it erroneously, please notify the sender and delete it, together with any attachments, and be advised that any dissemination or copying of this message is prohibited.


On Tue, 2017-08-29 at 21:49 +0200, Willem Jan Withagen wrote:
On 29-8-2017 19:12, Steve Taylor wrote:
Hong,

Probably your best chance at recovering any data without special, expensive, forensic procedures is to perform a dd from /dev/sdb to somewhere else large enough to hold a full disk image and attempt to repair that. You'll want to use 'conv=noerror' with your dd command since your disk is failing. Then you could either re-attach the OSD from the new source or attempt to retrieve objects from the filestore on it.
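A minimal sketch of that dd approach, with placeholder paths (the loop-device step is an assumption about how you would get at the copied filesystem, not part of Steve's description):

# dd if=/dev/sdb of=/mnt/backup/sdb.img bs=64K conv=noerror,sync
(noerror keeps dd going past read errors; sync pads the failed blocks so offsets stay aligned)
# losetup -fP --show /mnt/backup/sdb.img
(exposes the image as e.g. /dev/loop0, with its partitions as /dev/loop0p1, ...)
# xfs_repair /dev/loop0p1
# mount /dev/loop0p1 /mnt/sdb-copy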
Like somebody else already pointed out: in problem cases like this disk, use dd_rescue. It has really a far better chance of restoring a copy of your disk. --WjW
I have actually done this before by creating an RBD that matches the disk size, performing the dd, running xfs_repair, and eventually adding it back to the cluster as an OSD. RBDs as OSDs is certainly a temporary arrangement for repair only, but I'm happy to report that it worked flawlessly in my case. I was able to weight the OSD to 0, offload all of its data, then remove it for a full recovery, at which point I just deleted the RBD. The possibilities afforded by Ceph inception are endless. ☺

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |
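A hedged sketch of that sequence with made-up names (the rescue pool, image size, device paths, and init commands are assumptions, not necessarily what Steve used):

# rbd create rescue/osd0-img --size 2097152
(2 TB image, size given in MB; pool 'rescue' assumed to exist)
# rbd map rescue/osd0-img
(maps to e.g. /dev/rbd0)
# ddrescue -f /dev/sdb1 /dev/rbd0 rbd-rescue.map
(or plain dd with conv=noerror,sync, copying the failing OSD's data partition)
# xfs_repair /dev/rbd0
# mount /dev/rbd0 /var/lib/ceph/osd/ceph-0
# /etc/init.d/ceph start osd.0
(sysvinit shown to match the OSD dirs above; systemctl start ceph-osd@0 on systemd hosts)
# ceph osd crush reweight osd.0 0
(weight it to 0 to drain its data off once the recovered objects are safe, then remove the OSD and delete the RBD)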
On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:

Rule of thumb with batteries:
- the more you run them at their proper temperature, the more life you get out of them
- the more the battery is overpowered for your application, the longer it will survive.
Get yourself an LSI 94** controller and use it as an HBA and you will be fine. But get MORE DRIVES!!!!! ...
On 28 Aug 2017, at 23:10, hjcho616 <hjcho616@xxxxxxxxx> wrote:

Thank you Tomasz and Ronny.  I'll have to order some HDDs soon and try these out.  The car battery idea is nice!  I may try that.. =)  Do they last longer?  The ones that fit the UPS's original battery spec didn't last very long... part of the reason why I gave up on them.. =P  My wife probably won't like the idea of a car battery hanging out though, ha!

The OSD1 motherboard (the one with mostly OK OSDs, except for that SMART failure) doesn't have any additional SATA connectors available.  Would it be safe to add another OSD host?

Regards,
Hong

On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz <tom.kusmierz@gmail.com> wrote:

Sorry for being brutal ... anyway:
1. Get the battery for the UPS (a car battery will do as well; I've modded a UPS in the past with a truck battery and it was working like a charm :D).
2. Get spare drives and put those in, because your cluster CANNOT get out of error due to lack of space.
3. Follow Ronny Aasen's advice on how to recover data from the hard drives.
4. Get cooling to the drives or you will lose more!
On 28 Aug 2017, at 22:39, hjcho616 <hjcho616@xxxxxxxxx> wrote:

Tomasz,

Those machines are behind a surge protector.  Doesn't appear to be a good one!  I do have a UPS... but it is my fault... no battery.  Power was pretty reliable for a while... and the UPS was just beeping every chance it had, disrupting some sleep.. =P  So I'm running on the surge protector only.  I am running this in a home environment.  So far, HDD failures have been very rare in this environment. =)  It just doesn't get loaded as much!  I am not sure what to expect; seeing that "unfound" and just the feeling that I might get the OSD back made me excited about it. =)

Thanks for letting me know what the priority should be.  I just lack experience and knowledge in this. =)  Please do continue to guide me through this.  Thank you for the decode of those SMART messages!  I do agree it looks like it is on its way out.  I would like to know how to get a good portion of it back if possible. =)

I think I just set the size and min_size to 1.
# ceph osd lspools
0 data,1 metadata,2 rbd,
# ceph osd pool set rbd size 1
set pool 2 size to 1
# ceph osd pool set rbd min_size 1
set pool 2 min_size to 1

Seems to be doing some backfilling work.
# ceph health
HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 2 pgs backfill_toofull; 74 pgs backfill_wait; 3 pgs backfilling; 108 pgs degraded; 6 pgs down; 6 pgs inconsistent; 6 pgs peering; 7 pgs recovery_wait; 16 pgs stale; 108 pgs stuck degraded; 6 pgs stuck inactive; 16 pgs stuck stale; 130 pgs stuck unclean; 101 pgs stuck undersized; 101 pgs undersized; 1 requests are blocked > 32 sec; recovery 1790657/4502340 objects degraded (39.772%); recovery 641906/4502340 objects misplaced (14.257%); recovery 147/2251990 unfound (0.007%); 50 scrub errors; mds cluster is degraded; no legacy OSD present but 'sortbitwise' flag is not set

Regards,
Hong

On Monday, August 28, 2017 4:18 PM, Tomasz Kusmierz <tom.kusmierz@gmail.com> wrote:

So to decode a few things about your disk:

  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       37
37 read errors and only one sector marked as pending - fun disk :/

181 Program_Fail_Cnt_Total  0x0022   099   099   000    Old_age   Always       -       35325174
So the firmware has quite a few bugs, that's nice.

191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       2855
The disk was thrown around while operational - even more nice.

194 Temperature_Celsius     0x0002   047   041   000    Old_age   Always       -       53 (Min/Max 15/59)
If your disk passes 50 you should not consider using it; high temperatures demagnetise the platter layer and you will see more errors in the very near future.

197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
As mentioned before :)

200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       4222
Your heads keep missing tracks ... bent?  I don't even know how to comment here.

Generally a fun drive you've got there ... rescue as much as you can and throw it away!!!
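The attribute table Tomasz is decoding is standard smartctl output; to re-check the two attributes that matter most here (device name assumed):

# smartctl -A /dev/sda | egrep 'Current_Pending_Sector|Temperature_Celsius'
# smartctl -H /dev/sda
(the overall health self-assessment, which can still say PASSED on a dying drive)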










_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
