Hi, this morning I have this situation:

   health HEALTH_WARN 1540 pgs backfill; 30 pgs backfill_toofull; 113 pgs
backfilling; 43 pgs degraded; 38 pgs peering; 5 pgs recovering; 484 pgs
recovery_wait; 38 pgs stuck inactive; 2180 pgs stuck unclean; recovery
2153828/21551430 degraded (9.994%); noup,nodown flag(s) set
   monmap e1: 3 mons at {m1=192.168.21.11:6789/0,m2=192.168.21.12:6789/0,m3=192.168.21.13:6789/0},
election epoch 50, quorum 0,1,2 m1,m2,m3
   osdmap e34624: 62 osds: 62 up, 62 in
   pgmap v1496556: 17280 pgs: 15098 active+clean, 1471 active+remapped+wait_backfill,
9 active+degraded+wait_backfill, 30 active+remapped+wait_backfill+backfill_toofull,
462 active+recovery_wait, 18 peering, 109 active+remapped+backfilling,
1 active+clean+scrubbing, 30 active+degraded+remapped+wait_backfill,
22 active+recovery_wait+remapped, 20 remapped+peering,
4 active+degraded+remapped+backfilling, 1 active+clean+scrubbing+deep,
5 active+recovering; 50432 GB data, 76489 GB used, 36942 GB / 110 TB avail;
2153828/21551430 degraded (9.994%)
   mdsmap e52: 1/1/1 up {0=m1=up:active}, 2 up:standby

No data movement.
The cephfs mount works, but many, many directories are inaccessible: the
clients hang on a simple "ls".
ceph -w keeps logging these lines: http://pastebin.com/AN01wgfV

What can I do to improve the situation?
Thanks for your help

--
Marco Aroldi

2013/4/22 Marco Aroldi <marco.aroldi@xxxxxxxxx>:
> Hey,
> Cephfs has become available!
> I didn't change the rules.
>
> Do you guys see something "lost" or "absolutely screwed" in these messages?
> Do I just have to wait?
> I see backfill_toofull: it sounds strange, because I have set the option
> "osd backfill tooful ratio = 0.91" in the conf and none of my osds is
> currently over that percentage.
>
> Thanks
>
> The health now is
> HEALTH_WARN 2038 pgs backfill; 43 pgs backfill_toofull; 134 pgs
> backfilling; 62 pgs degraded; 590 pgs recovery_wait; 2765 pgs stuck
> unclean; recovery 2780119/22308143 degraded (12.462%); recovering 42
> o/s, 197MB/s; 5 near full osd(s); noup,nodown flag(s) set
>
> 2013-04-22 11:33:31.288690 mon.0 [INF] pgmap v1459630: 17280 pgs:
> 14512 active+clean, 1945 active+remapped+wait_backfill, 14
> active+degraded+wait_backfill, 36
> active+remapped+wait_backfill+backfill_toofull, 565
> active+recovery_wait, 128 active+remapped+backfilling, 4
> active+remapped+backfill_toofull, 4 active+degraded+backfilling, 3
> active+clean+scrubbing, 37 active+degraded+remapped+wait_backfill, 25
> active+recovery_wait+remapped, 3
> active+degraded+remapped+wait_backfill+backfill_toofull, 4
> active+degraded+remapped+backfilling; 50432 GB data, 76416 GB used,
> 37015 GB / 110 TB avail; 2777977/22308143 degraded (12.453%);
> recovering 15 o/s, 68099KB/s
>
> 2013/4/22 Marco Aroldi <marco.aroldi@xxxxxxxxx>:
>> In the original design I changed the rules, since I wanted data placed
>> with replica 2 across 2 identical rooms (named p1 and p2).
>> Now that 1 room has 4 osds out of the cluster, do I have to change the
>> rules and use a "type host" rule instead of "type room"?
>> Could this help?
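For reference, a rough sketch of how a rule change like that is normally applied
(the file names here are placeholders, and the rules in question are quoted just below):

    ceph osd getcrushmap -o crushmap.bin        # export the compiled CRUSH map
    crushtool -d crushmap.bin -o crushmap.txt   # decompile it to text
    # edit crushmap.txt and, in each rule, change
    #   step chooseleaf firstn 0 type room
    # to
    #   step chooseleaf firstn 0 type host
    crushtool -c crushmap.txt -o crushmap.new   # recompile
    ceph osd setcrushmap -i crushmap.new        # inject the modified map

Keep in mind that injecting a new map triggers another round of data movement,
which is painful on a nearly full cluster; crushtool also has a --test mode that
can be used to check the new mappings offline first.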
>>
>> root default {
>>         id -1           # do not change unnecessarily
>>         # weight 122.500
>>         alg straw
>>         hash 0          # rjenkins1
>>         item p1 weight 57.500
>>         item p2 weight 65.000
>> }
>>
>> # rules
>> rule data {
>>         ruleset 0
>>         type replicated
>>         min_size 1
>>         max_size 10
>>         step take default
>>         step chooseleaf firstn 0 type room
>>         step emit
>> }
>> rule metadata {
>>         ruleset 1
>>         type replicated
>>         min_size 1
>>         max_size 10
>>         step take default
>>         step chooseleaf firstn 0 type room
>>         step emit
>> }
>> rule rbd {
>>         ruleset 2
>>         type replicated
>>         min_size 1
>>         max_size 10
>>         step take default
>>         step chooseleaf firstn 0 type room
>>         step emit
>> }
>>
>> # end crush map
>>
>>
>> ceph health:
>>
>> HEALTH_WARN 2072 pgs backfill; 43 pgs backfill_toofull; 131 pgs
>> backfilling; 68 pgs degraded; 594 pgs recovery_wait; 2802 pgs stuck
>> unclean; recovery 2811952/22351845 degraded (12.580%); recovering 35
>> o/s, 197MB/s; 4 near full osd(s); noup,nodown flag(s) set
>>
>>
>> 2013-04-22 10:53:26.800014 mon.0 [INF] pgmap v1457213: 17280 pgs:
>> 14474 active+clean, 1975 active+remapped+wait_backfill, 18
>> active+degraded+wait_backfill, 37
>> active+remapped+wait_backfill+backfill_toofull, 569
>> active+recovery_wait, 123 active+remapped+backfilling, 3
>> active+remapped+backfill_toofull, 3 active+degraded+backfilling, 6
>> active+clean+scrubbing, 39 active+degraded+remapped+wait_backfill, 25
>> active+recovery_wait+remapped, 3
>> active+degraded+remapped+wait_backfill+backfill_toofull, 5
>> active+degraded+remapped+backfilling; 50432 GB data, 76277 GB used,
>> 37154 GB / 110 TB avail; 2811241/22350671 degraded (12.578%);
>> recovering 29 o/s, 119MB/s
>>
>> 2013/4/22 Marco Aroldi <marco.aroldi@xxxxxxxxx>:
>>> The rebalance is still going
>>> and the mounts are still refused.
>>>
>>> I've re-set the nodown and noup flags because the osds are flapping
>>> continuously, and added "osd backfill tooful ratio = 0.91" to ceph.conf,
>>> trying to get rid of all that "backfill_toofull".
>>>
>>> What do I have to do now to regain access?
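A note on the "osd backfill tooful ratio" line quoted above: the option Ceph
actually recognizes is osd_backfill_full_ratio (default 0.85), so it is worth
confirming that the value really took effect on a running osd. A rough sketch,
using osd.8 and the default admin socket path as examples:

    ceph --admin-daemon /var/run/ceph/ceph-osd.8.asok config show | grep backfill_full

    # if it still shows the default, it can also be changed at runtime
    # (the injectargs syntax varies slightly between releases):
    ceph osd tell \* injectargs '--osd_backfill_full_ratio 0.91'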
>>>
>>> I can provide you any logs or whatever you need.
>>> Thanks for the support
>>>
>>> in ceph -w I see this:
>>> 2013-04-22 09:25:46.601721 osd.8 [WRN] 1 slow requests, 1 included
>>> below; oldest blocked for > 5404.500806 secs
>>> 2013-04-22 09:25:46.601727 osd.8 [WRN] slow request 5404.500806
>>> seconds old, received at 2013-04-22 07:55:42.100886:
>>> osd_op(mds.0.9:177037 10000025d80.000017b3 [stat] 0.300279a9 RETRY
>>> rwordered) v4 currently reached pg
>>>
>>> this is the ceph mds dump:
>>>
>>> dumped mdsmap epoch 52
>>> epoch   52
>>> flags   0
>>> created 2013-03-18 14:42:29.330548
>>> modified        2013-04-22 09:08:45.599613
>>> tableserver     0
>>> root    0
>>> session_timeout 60
>>> session_autoclose       300
>>> last_failure    49
>>> last_failure_osd_epoch  33152
>>> compat  compat={},rocompat={},incompat={1=base v0.20,2=client
>>> writeable ranges,3=default file layouts on dirs,4=dir inode in
>>> separate object}
>>> max_mds 1
>>> in      0
>>> up      {0=6957}
>>> failed
>>> stopped
>>> data_pools      [0]
>>> metadata_pool   1
>>> 6957:   192.168.21.11:6800/5844 'm1' mds.0.10 up:active seq 23
>>> 5945:   192.168.21.13:6800/12999 'm3' mds.-1.0 up:standby seq 1
>>> 5963:   192.168.21.12:6800/22454 'm2' mds.-1.0 up:standby seq 1
>>>
>>> ceph health:
>>>
>>> HEALTH_WARN 2133 pgs backfill; 47 pgs backfill_toofull; 136 pgs
>>> backfilling; 74 pgs degraded; 1 pgs recovering; 599 pgs recovery_wait;
>>> 2877 pgs stuck unclean; recovery 2910416/22449672 degraded (12.964%);
>>> recovering 10 o/s, 48850KB/s; 7 near full osd(s); noup,nodown flag(s)
>>> set
>>>
>>> 2013-04-22 09:34:11.436514 mon.0 [INF] pgmap v1452450: 17280 pgs:
>>> 14403 active+clean, 2032 active+remapped+wait_backfill, 19
>>> active+degraded+wait_backfill, 35
>>> active+remapped+wait_backfill+backfill_toofull, 574
>>> active+recovery_wait, 126 active+remapped+backfilling, 9
>>> active+remapped+backfill_toofull, 3 active+degraded+backfilling, 2
>>> active+clean+scrubbing, 41 active+degraded+remapped+wait_backfill, 25
>>> active+recovery_wait+remapped, 3
>>> active+degraded+remapped+wait_backfill+backfill_toofull, 8
>>> active+degraded+remapped+backfilling; 50432 GB data, 76229 GB used,
>>> 37202 GB / 110 TB avail; 2908837/22447349 degraded (12.958%);
>>> recovering 6 o/s, 20408KB/s
>>>
>>> 2013/4/21 Marco Aroldi <marco.aroldi@xxxxxxxxx>:
>>>> Greg, your supposition about the small amount of data to be written is
>>>> right, but the rebalance is writing an insane amount of data to the new
>>>> nodes and the mount is still not working.
>>>>
>>>> this is the node S203 (the OS is on /dev/sdl, not listed)
>>>>
>>>> /dev/sda1  1.9T  467G  1.4T  26%  /var/lib/ceph/osd/ceph-44
>>>> /dev/sdb1  1.9T  595G  1.3T  33%  /var/lib/ceph/osd/ceph-45
>>>> /dev/sdc1  1.9T  396G  1.5T  22%  /var/lib/ceph/osd/ceph-46
>>>> /dev/sdd1  1.9T  401G  1.5T  22%  /var/lib/ceph/osd/ceph-47
>>>> /dev/sde1  1.9T  337G  1.5T  19%  /var/lib/ceph/osd/ceph-48
>>>> /dev/sdf1  1.9T  441G  1.4T  24%  /var/lib/ceph/osd/ceph-49
>>>> /dev/sdg1  1.9T  338G  1.5T  19%  /var/lib/ceph/osd/ceph-50
>>>> /dev/sdh1  1.9T  359G  1.5T  20%  /var/lib/ceph/osd/ceph-51
>>>> /dev/sdi1  1.4T  281G  1.1T  21%  /var/lib/ceph/osd/ceph-52
>>>> /dev/sdj1  1.4T  423G  964G  31%  /var/lib/ceph/osd/ceph-53
>>>> /dev/sdk1  1.9T  421G  1.4T  23%  /var/lib/ceph/osd/ceph-54
>>>>
>>>> 2013/4/21 Marco Aroldi <marco.aroldi@xxxxxxxxx>:
>>>>> What can I try to do/delete to regain access?
>>>>> Those osds are crazy, flapping up and down.
>>>>> I think that the situation is out of control.
>>>>>
>>>>>
>>>>> HEALTH_WARN 2735 pgs backfill; 13 pgs backfill_toofull; 157 pgs
>>>>> backfilling; 188 pgs degraded; 251 pgs peering; 13 pgs recovering;
>>>>> 1159 pgs recovery_wait; 159 pgs stuck inactive; 4641 pgs stuck
>>>>> unclean; recovery 4007916/23007073 degraded (17.420%); recovering 4
>>>>> o/s, 31927KB/s; 19 near full osd(s)
>>>>>
>>>>> 2013-04-21 18:56:46.839851 mon.0 [INF] pgmap v1399007: 17280 pgs: 276
>>>>> active, 12791 active+clean, 2575 active+remapped+wait_backfill, 71
>>>>> active+degraded+wait_backfill, 6
>>>>> active+remapped+wait_backfill+backfill_toofull, 1121
>>>>> active+recovery_wait, 90 peering, 3 remapped, 1 active+remapped, 127
>>>>> active+remapped+backfilling, 1 active+degraded, 5
>>>>> active+remapped+backfill_toofull, 19 active+degraded+backfilling, 1
>>>>> active+clean+scrubbing, 79 active+degraded+remapped+wait_backfill, 36
>>>>> active+recovery_wait+remapped, 1
>>>>> active+degraded+remapped+wait_backfill+backfill_toofull, 46
>>>>> remapped+peering, 16 active+degraded+remapped+backfilling, 1
>>>>> active+recovery_wait+degraded+remapped, 14 active+recovering; 50435 GB
>>>>> data, 74790 GB used, 38642 GB / 110 TB avail; 4018849/23025448
>>>>> degraded (17.454%); recovering 14 o/s, 54732KB/s
>>>>>
>>>>> # id   weight  type name       up/down reweight
>>>>> -1     130     root default
>>>>> -9     65        room p1
>>>>> -3     44          rack r14
>>>>> -4     22            host s101
>>>>> 11     2               osd.11  up      1
>>>>> 12     2               osd.12  up      1
>>>>> 13     2               osd.13  up      1
>>>>> 14     2               osd.14  up      1
>>>>> 15     2               osd.15  up      1
>>>>> 16     2               osd.16  up      1
>>>>> 17     2               osd.17  up      1
>>>>> 18     2               osd.18  up      1
>>>>> 19     2               osd.19  up      1
>>>>> 20     2               osd.20  up      1
>>>>> 21     2               osd.21  up      1
>>>>> -6     22            host s102
>>>>> 33     2               osd.33  up      1
>>>>> 34     2               osd.34  up      1
>>>>> 35     2               osd.35  up      1
>>>>> 36     2               osd.36  up      1
>>>>> 37     2               osd.37  up      1
>>>>> 38     2               osd.38  up      1
>>>>> 39     2               osd.39  up      1
>>>>> 40     2               osd.40  up      1
>>>>> 41     2               osd.41  up      1
>>>>> 42     2               osd.42  up      1
>>>>> 43     2               osd.43  up      1
>>>>> -13    21          rack r10
>>>>> -12    21            host s103
>>>>> 55     2               osd.55  up      1
>>>>> 56     2               osd.56  up      1
>>>>> 57     2               osd.57  up      1
>>>>> 58     2               osd.58  up      1
>>>>> 59     2               osd.59  down    0
>>>>> 60     2               osd.60  down    0
>>>>> 61     2               osd.61  down    0
>>>>> 62     2               osd.62  up      1
>>>>> 63     2               osd.63  up      1
>>>>> 64     1.5             osd.64  up      1
>>>>> 65     1.5             osd.65  down    0
>>>>> -10    65        room p2
>>>>> -7     22          rack r20
>>>>> -5     22            host s202
>>>>> 22     2               osd.22  up      1
>>>>> 23     2               osd.23  up      1
>>>>> 24     2               osd.24  up      1
>>>>> 25     2               osd.25  up      1
>>>>> 26     2               osd.26  up      1
>>>>> 27     2               osd.27  up      1
>>>>> 28     2               osd.28  up      1
>>>>> 29     2               osd.29  up      1
>>>>> 30     2               osd.30  up      1
>>>>> 31     2               osd.31  up      1
>>>>> 32     2               osd.32  up      1
>>>>> -8     22          rack r22
>>>>> -2     22            host s201
>>>>> 0      2               osd.0   up      1
>>>>> 1      2               osd.1   up      1
>>>>> 2      2               osd.2   up      1
>>>>> 3      2               osd.3   up      1
>>>>> 4      2               osd.4   up      1
>>>>> 5      2               osd.5   up      1
>>>>> 6      2               osd.6   up      1
>>>>> 7      2               osd.7   up      1
>>>>> 8      2               osd.8   up      1
>>>>> 9      2               osd.9   up      1
>>>>> 10     2               osd.10  up      1
>>>>> -14    21          rack r21
>>>>> -11    21            host s203
>>>>> 44     2               osd.44  up      1
>>>>> 45     2               osd.45  up      1
>>>>> 46     2               osd.46  up      1
>>>>> 47     2               osd.47  up      1
>>>>> 48     2               osd.48  up      1
>>>>> 49     2               osd.49  up      1
>>>>> 50     2               osd.50  up      1
>>>>> 51     2               osd.51  up      1
>>>>> 52     1.5             osd.52  up      1
>>>>> 53     1.5             osd.53  up      1
>>>>> 54     2               osd.54  up      1
>>>>>
>>>>>
>>>>> 2013/4/21 Marco Aroldi <marco.aroldi@xxxxxxxxx>:
>>>>>> So, I've restarted as many of the new osds as possible and the cluster
>>>>>> started to move data to the 2 new nodes overnight.
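For reference, the noup/nodown flags and the full ratios that come up in this
thread map to commands along these lines (a sketch; exact syntax can differ
between Ceph releases, and ratios close to 1.0 leave very little safety margin):

    ceph osd set nodown               # keep flapping osds from being marked down
    ceph osd set noup                 # keep down osds from rejoining
    ceph osd unset nodown             # back to normal failure handling
    ceph osd unset noup
    ceph pg set_nearfull_ratio 0.90   # cluster-wide nearfull warning threshold
    ceph pg set_full_ratio 0.97       # cluster-wide full threshold
    ceph health detail                # lists exactly which osds are near full or full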
>>>>>> This morning there was no network traffic and the health was
>>>>>>
>>>>>> HEALTH_ERR 1323 pgs backfill; 150 pgs backfill_toofull; 100 pgs
>>>>>> backfilling; 114 pgs degraded; 3374 pgs peering; 36 pgs recovering;
>>>>>> 949 pgs recovery_wait; 3374 pgs stuck inactive; 6289 pgs stuck
>>>>>> unclean; recovery 2130652/20890113 degraded (10.199%); 58/8914654
>>>>>> unfound (0.001%); 1 full osd(s); 22 near full osd(s); full,noup,nodown
>>>>>> flag(s) set
>>>>>>
>>>>>> So I have unset the noup and nodown flags and the data started moving again.
>>>>>> I've increased the full ratio to 97%, so now there is no "official" full
>>>>>> osd and the HEALTH_ERR became HEALTH_WARN.
>>>>>>
>>>>>> However, there is still no access to the filesystem.
>>>>>>
>>>>>> HEALTH_WARN 1906 pgs backfill; 21 pgs backfill_toofull; 52 pgs
>>>>>> backfilling; 707 pgs degraded; 371 pgs down; 97 pgs incomplete; 3385
>>>>>> pgs peering; 35 pgs recovering; 1002 pgs recovery_wait; 4 pgs stale;
>>>>>> 683 pgs stuck inactive; 5898 pgs stuck unclean; recovery
>>>>>> 3081499/22208859 degraded (13.875%); 487/9433642 unfound (0.005%);
>>>>>> recovering 11722 o/s, 57040MB/s; 17 near full osd(s)
>>>>>>
>>>>>> The osds are flapping in/out again...
>>>>>>
>>>>>> I'm willing to start deleting some portion of the data.
>>>>>> What can I try to do now?
>>>>>>
>>>>>> 2013/4/21 Gregory Farnum <greg@xxxxxxxxxxx>:
>>>>>>> It's not entirely clear from your description and the output you've
>>>>>>> given us, but it looks like maybe you've managed to bring up all your
>>>>>>> OSDs correctly at this point? Or are they just not reporting down
>>>>>>> because you set the "no down" flag...
>>>>>>>
>>>>>>> In any case, CephFS isn't going to come up while the underlying RADOS
>>>>>>> cluster is this unhealthy, so you're going to need to get that going
>>>>>>> again. Since your OSDs have managed to get themselves so full it's
>>>>>>> going to be trickier than normal, but if all the rebalancing that's
>>>>>>> happening is only because you sort-of-didn't-really lose nodes, and
>>>>>>> you can bring them all back up, you should be able to sort it out by
>>>>>>> getting all the nodes back up, and then changing your full percentages
>>>>>>> (by a *very small* amount); since you haven't been doing any writes to
>>>>>>> the cluster it shouldn't take much data writes to get everything back
>>>>>>> where it was, although if this has been continuing to backfill in the
>>>>>>> meanwhile that will need to unwind.
>>>>>>> -Greg
>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Apr 20, 2013 at 12:21 PM, John Wilkins <john.wilkins@xxxxxxxxxxx> wrote:
>>>>>>>> I don't see anything related to lost objects in your output. I just see
>>>>>>>> waiting on backfill, backfill_toofull, remapped, and so forth. You can read
>>>>>>>> a bit about what is going on here:
>>>>>>>> http://ceph.com/docs/next/rados/operations/monitoring-osd-pg/
>>>>>>>>
>>>>>>>> Keep us posted as to the recovery, and let me know what I can do to improve
>>>>>>>> the docs for scenarios like this.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Apr 20, 2013 at 10:52 AM, Marco Aroldi <marco.aroldi@xxxxxxxxx>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> John,
>>>>>>>>> thanks for the quick reply.
>>>>>>>>> Below you can see my ceph osd tree.
>>>>>>>>> The problem is caused not by the failure itself, but by the "renamed"
>>>>>>>>> bunch of devices.
>>>>>>>>> It was like a deadly 15-puzzle.
>>>>>>>>> I think the solution would have been to mount the devices in fstab using
>>>>>>>>> UUID (/dev/disk/by-uuid) instead of /dev/sdX.
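As a minimal sketch of that by-UUID idea (the UUID, filesystem type and mount
options below are placeholders; the real UUID comes from blkid, and the journal
path is just one way to get a persistent device name):

    blkid /dev/sda1        # prints the filesystem UUID of an osd data partition

    # /etc/fstab -- mount each osd data partition by UUID instead of by /dev/sdX name
    UUID=<uuid-from-blkid>  /var/lib/ceph/osd/ceph-44  xfs  noatime  0 0

    # ceph.conf -- the journal device can also be referenced by a persistent name
    [osd.44]
        osd journal = /dev/disk/by-id/<disk-id>-part2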
>>>>>>>>>
>>>>>>>>> However, yes, I have an entry in my ceph.conf (devs = /dev/sdX1 --
>>>>>>>>> osd_journal = /dev/sdX2) *and* an entry in my fstab for each OSD.
>>>>>>>>>
>>>>>>>>> The node with the failed disk is s103 (osd.59).
>>>>>>>>>
>>>>>>>>> Now I have 5 osds from s203 up and in, to try to let ceph rebalance the
>>>>>>>>> data... but it is still a bloody mess.
>>>>>>>>> Look at the ceph -w output: it reports a total of 110TB, which is wrong...
>>>>>>>>> all drives are 2TB and I have 49 drives up and in -- 98TB total.
>>>>>>>>> I think that 110TB (55 osds) was the size before the cluster became
>>>>>>>>> inaccessible.
>>>>>>>>>
>>>>>>>>> # id   weight  type name       up/down reweight
>>>>>>>>> -1     130     root default
>>>>>>>>> -9     65        room p1
>>>>>>>>> -3     44          rack r14
>>>>>>>>> -4     22            host s101
>>>>>>>>> 11     2               osd.11  up      1
>>>>>>>>> 12     2               osd.12  up      1
>>>>>>>>> 13     2               osd.13  up      1
>>>>>>>>> 14     2               osd.14  up      1
>>>>>>>>> 15     2               osd.15  up      1
>>>>>>>>> 16     2               osd.16  up      1
>>>>>>>>> 17     2               osd.17  up      1
>>>>>>>>> 18     2               osd.18  up      1
>>>>>>>>> 19     2               osd.19  up      1
>>>>>>>>> 20     2               osd.20  up      1
>>>>>>>>> 21     2               osd.21  up      1
>>>>>>>>> -6     22            host s102
>>>>>>>>> 33     2               osd.33  up      1
>>>>>>>>> 34     2               osd.34  up      1
>>>>>>>>> 35     2               osd.35  up      1
>>>>>>>>> 36     2               osd.36  up      1
>>>>>>>>> 37     2               osd.37  up      1
>>>>>>>>> 38     2               osd.38  up      1
>>>>>>>>> 39     2               osd.39  up      1
>>>>>>>>> 40     2               osd.40  up      1
>>>>>>>>> 41     2               osd.41  up      1
>>>>>>>>> 42     2               osd.42  up      1
>>>>>>>>> 43     2               osd.43  up      1
>>>>>>>>> -13    21          rack r10
>>>>>>>>> -12    21            host s103
>>>>>>>>> 55     2               osd.55  up      0
>>>>>>>>> 56     2               osd.56  up      0
>>>>>>>>> 57     2               osd.57  up      0
>>>>>>>>> 58     2               osd.58  up      0
>>>>>>>>> 59     2               osd.59  down    0
>>>>>>>>> 60     2               osd.60  down    0
>>>>>>>>> 61     2               osd.61  down    0
>>>>>>>>> 62     2               osd.62  up      0
>>>>>>>>> 63     2               osd.63  up      0
>>>>>>>>> 64     1.5             osd.64  up      0
>>>>>>>>> 65     1.5             osd.65  down    0
>>>>>>>>> -10    65        room p2
>>>>>>>>> -7     22          rack r20
>>>>>>>>> -5     22            host s202
>>>>>>>>> 22     2               osd.22  up      1
>>>>>>>>> 23     2               osd.23  up      1
>>>>>>>>> 24     2               osd.24  up      1
>>>>>>>>> 25     2               osd.25  up      1
>>>>>>>>> 26     2               osd.26  up      1
>>>>>>>>> 27     2               osd.27  up      1
>>>>>>>>> 28     2               osd.28  up      1
>>>>>>>>> 29     2               osd.29  up      1
>>>>>>>>> 30     2               osd.30  up      1
>>>>>>>>> 31     2               osd.31  up      1
>>>>>>>>> 32     2               osd.32  up      1
>>>>>>>>> -8     22          rack r22
>>>>>>>>> -2     22            host s201
>>>>>>>>> 0      2               osd.0   up      1
>>>>>>>>> 1      2               osd.1   up      1
>>>>>>>>> 2      2               osd.2   up      1
>>>>>>>>> 3      2               osd.3   up      1
>>>>>>>>> 4      2               osd.4   up      1
>>>>>>>>> 5      2               osd.5   up      1
>>>>>>>>> 6      2               osd.6   up      1
>>>>>>>>> 7      2               osd.7   up      1
>>>>>>>>> 8      2               osd.8   up      1
>>>>>>>>> 9      2               osd.9   up      1
>>>>>>>>> 10     2               osd.10  up      1
>>>>>>>>> -14    21          rack r21
>>>>>>>>> -11    21            host s203
>>>>>>>>> 44     2               osd.44  up      1
>>>>>>>>> 45     2               osd.45  up      1
>>>>>>>>> 46     2               osd.46  up      1
>>>>>>>>> 47     2               osd.47  up      1
>>>>>>>>> 48     2               osd.48  up      1
>>>>>>>>> 49     2               osd.49  up      0
>>>>>>>>> 50     2               osd.50  up      0
>>>>>>>>> 51     2               osd.51  up      0
>>>>>>>>> 52     1.5             osd.52  up      0
>>>>>>>>> 53     1.5             osd.53  up      0
>>>>>>>>> 54     2               osd.54  up      0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ceph -w
>>>>>>>>>
>>>>>>>>> 2013-04-20 19:46:48.608988 mon.0 [INF] pgmap v1352767: 17280 pgs: 58
>>>>>>>>> active, 12581 active+clean, 1686 active+remapped+wait_backfill, 24
>>>>>>>>> active+degraded+wait_backfill, 224
>>>>>>>>> active+remapped+wait_backfill+backfill_toofull, 1061
>>>>>>>>> active+recovery_wait, 4
>>>>>>>>> active+degraded+wait_backfill+backfill_toofull, 629 peering, 626
>>>>>>>>> active+remapped, 72 active+remapped+backfilling, 89 active+degraded,
>>>>>>>>> 14 active+remapped+backfill_toofull, 1 active+clean+scrubbing, 8
>>>>>>>>> active+degraded+remapped+wait_backfill, 20
>>>>>>>>> active+recovery_wait+remapped, 5
>>>>>>>>> active+degraded+remapped+wait_backfill+backfill_toofull, 162
>>>>>>>>> remapped+peering, 1 active+degraded+remapped+backfilling, 2
>>>>>>>>> active+degraded+remapped+backfill_toofull, 13 active+recovering; 49777
>>>>>>>>> GB data, 72863 GB used, 40568 GB / 110 TB avail; 2965687/21848501
>>>>>>>>> degraded (13.574%); recovering 5 o/s, 16363B/s
>>>>>>>>>
>>>>>>>>> 2013/4/20 John Wilkins <john.wilkins@xxxxxxxxxxx>:
>>>>>>>>> > Marco,
>>>>>>>>> >
>>>>>>>>> > If you do a "ceph osd tree", can you see if your OSDs are all up? You
>>>>>>>>> > seem to have at least one problem related to the backfill OSDs being
>>>>>>>>> > too full, and some which are near full or full for the purposes of
>>>>>>>>> > storage. See the following in the documentation to see if this helps:
>>>>>>>>> >
>>>>>>>>> > http://ceph.com/docs/master/rados/configuration/mon-config-ref/#storage-capacity
>>>>>>>>> >
>>>>>>>>> > http://ceph.com/docs/master/rados/configuration/osd-config-ref/#backfilling
>>>>>>>>> >
>>>>>>>>> > http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#no-free-drive-space
>>>>>>>>> >
>>>>>>>>> > Before you start deleting data as a remedy, you'd want to at least try
>>>>>>>>> > to get the OSDs back up and running first.
>>>>>>>>> >
>>>>>>>>> > If rebooting changed the drive names, you might look here:
>>>>>>>>> >
>>>>>>>>> > http://ceph.com/docs/master/rados/configuration/osd-config-ref/#general-settings
>>>>>>>>> >
>>>>>>>>> > We have default settings for OSD and journal paths, which you could
>>>>>>>>> > override if you can locate the data and journal sources on the renamed
>>>>>>>>> > drives. If you mounted them, but didn't add them to the fstab, that
>>>>>>>>> > might be the source of the problem. I'd rather see you use the default
>>>>>>>>> > paths, as it would be easier to troubleshoot later. So did you mount
>>>>>>>>> > the drives, but not add the mount points to fstab?
>>>>>>>>> >
>>>>>>>>> > John
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Sat, Apr 20, 2013 at 8:46 AM, Marco Aroldi <marco.aroldi@xxxxxxxxx> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Hi,
>>>>>>>>> >> due to a hardware failure while expanding ceph, I'm in big trouble
>>>>>>>>> >> because the cephfs doesn't mount anymore.
>>>>>>>>> >> I was adding a couple of storage nodes, but a disk failed, and after a
>>>>>>>>> >> reboot the OS (ubuntu 12.04) renamed the remaining devices, so the
>>>>>>>>> >> entire node has been screwed up.
>>>>>>>>> >>
>>>>>>>>> >> Now, from the "sane" new node, I'm bringing some new osds up and in,
>>>>>>>>> >> because the cluster is near full and I can't completely revert the
>>>>>>>>> >> situation to how it was before.
>>>>>>>>> >>
>>>>>>>>> >> *I can* afford data loss, but I need to regain access to the filesystem.
>>>>>>>>> >>
>>>>>>>>> >> My setup:
>>>>>>>>> >> 3 mon + 3 mds
>>>>>>>>> >> 4 storage nodes (I was adding nodes no. 5 and 6)
>>>>>>>>> >>
>>>>>>>>> >> Ceph 0.56.4
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> ceph health:
>>>>>>>>> >> HEALTH_ERR 2008 pgs backfill; 246 pgs backfill_toofull; 74 pgs
>>>>>>>>> >> backfilling; 134 pgs degraded; 790 pgs peering; 10 pgs recovering;
>>>>>>>>> >> 1116 pgs recovery_wait; 790 pgs stuck inactive; 4782 pgs stuck
>>>>>>>>> >> unclean; recovery 3049459/21926624 degraded (13.908%); recovering 6
>>>>>>>>> >> o/s, 16316KB/s; 4 full osd(s); 30 near full osd(s); full,noup,nodown
>>>>>>>>> >> flag(s) set
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> ceph mds dump:
>>>>>>>>> >> dumped mdsmap epoch 44
>>>>>>>>> >> epoch   44
>>>>>>>>> >> flags   0
>>>>>>>>> >> created 2013-03-18 14:42:29.330548
>>>>>>>>> >> modified        2013-04-20 17:14:32.969332
>>>>>>>>> >> tableserver     0
>>>>>>>>> >> root    0
>>>>>>>>> >> session_timeout 60
>>>>>>>>> >> session_autoclose       300
>>>>>>>>> >> last_failure    43
>>>>>>>>> >> last_failure_osd_epoch  18160
>>>>>>>>> >> compat  compat={},rocompat={},incompat={1=base v0.20,2=client
>>>>>>>>> >> writeable ranges,3=default file layouts on dirs,4=dir inode in
>>>>>>>>> >> separate object}
>>>>>>>>> >> max_mds 1
>>>>>>>>> >> in      0
>>>>>>>>> >> up      {0=6376}
>>>>>>>>> >> failed
>>>>>>>>> >> stopped
>>>>>>>>> >> data_pools      [0]
>>>>>>>>> >> metadata_pool   1
>>>>>>>>> >> 6376:   192.168.21.11:6800/13457 'm1' mds.0.9 up:replay seq 1
>>>>>>>>> >> 5945:   192.168.21.13:6800/12999 'm3' mds.-1.0 up:standby seq 1
>>>>>>>>> >> 5963:   192.168.21.12:6800/22454 'm2' mds.-1.0 up:standby seq 1
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> ceph mon dump:
>>>>>>>>> >> epoch 1
>>>>>>>>> >> fsid d634f7b3-8a8a-4893-bdfb-a95ccca7fddd
>>>>>>>>> >> last_changed 2013-03-18 14:39:42.253923
>>>>>>>>> >> created 2013-03-18 14:39:42.253923
>>>>>>>>> >> 0: 192.168.21.11:6789/0 mon.m1
>>>>>>>>> >> 1: 192.168.21.12:6789/0 mon.m2
>>>>>>>>> >> 2: 192.168.21.13:6789/0 mon.m3
>>>>>>>>> >> _______________________________________________
>>>>>>>>> >> ceph-users mailing list
>>>>>>>>> >> ceph-users@xxxxxxxxxxxxxx
>>>>>>>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>> > John Wilkins
>>>>>>>>> > Senior Technical Writer
>>>>>>>>> > Inktank
>>>>>>>>> > john.wilkins@xxxxxxxxxxx
>>>>>>>>> > (415) 425-9599
>>>>>>>>> > http://inktank.com
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> John Wilkins
>>>>>>>> Senior Technical Writer
>>>>>>>> Inktank
>>>>>>>> john.wilkins@xxxxxxxxxxx
>>>>>>>> (415) 425-9599
>>>>>>>> http://inktank.com
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> ceph-users mailing list
>>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com