What can I try to do or delete to regain access? Those OSDs are going crazy, flapping up and down. I think the situation is out of control.

HEALTH_WARN 2735 pgs backfill; 13 pgs backfill_toofull; 157 pgs backfilling; 188 pgs degraded; 251 pgs peering; 13 pgs recovering; 1159 pgs recovery_wait; 159 pgs stuck inactive; 4641 pgs stuck unclean; recovery 4007916/23007073 degraded (17.420%); recovering 4 o/s, 31927KB/s; 19 near full osd(s)

2013-04-21 18:56:46.839851 mon.0 [INF] pgmap v1399007: 17280 pgs: 276 active, 12791 active+clean, 2575 active+remapped+wait_backfill, 71 active+degraded+wait_backfill, 6 active+remapped+wait_backfill+backfill_toofull, 1121 active+recovery_wait, 90 peering, 3 remapped, 1 active+remapped, 127 active+remapped+backfilling, 1 active+degraded, 5 active+remapped+backfill_toofull, 19 active+degraded+backfilling, 1 active+clean+scrubbing, 79 active+degraded+remapped+wait_backfill, 36 active+recovery_wait+remapped, 1 active+degraded+remapped+wait_backfill+backfill_toofull, 46 remapped+peering, 16 active+degraded+remapped+backfilling, 1 active+recovery_wait+degraded+remapped, 14 active+recovering; 50435 GB data, 74790 GB used, 38642 GB / 110 TB avail; 4018849/23025448 degraded (17.454%); recovering 14 o/s, 54732KB/s

# id   weight  type name          up/down  reweight
-1     130     root default
-9     65        room p1
-3     44          rack r14
-4     22            host s101
11     2               osd.11     up       1
12     2               osd.12     up       1
13     2               osd.13     up       1
14     2               osd.14     up       1
15     2               osd.15     up       1
16     2               osd.16     up       1
17     2               osd.17     up       1
18     2               osd.18     up       1
19     2               osd.19     up       1
20     2               osd.20     up       1
21     2               osd.21     up       1
-6     22            host s102
33     2               osd.33     up       1
34     2               osd.34     up       1
35     2               osd.35     up       1
36     2               osd.36     up       1
37     2               osd.37     up       1
38     2               osd.38     up       1
39     2               osd.39     up       1
40     2               osd.40     up       1
41     2               osd.41     up       1
42     2               osd.42     up       1
43     2               osd.43     up       1
-13    21          rack r10
-12    21            host s103
55     2               osd.55     up       1
56     2               osd.56     up       1
57     2               osd.57     up       1
58     2               osd.58     up       1
59     2               osd.59     down     0
60     2               osd.60     down     0
61     2               osd.61     down     0
62     2               osd.62     up       1
63     2               osd.63     up       1
64     1.5             osd.64     up       1
65     1.5             osd.65     down     0
-10    65        room p2
-7     22          rack r20
-5     22            host s202
22     2               osd.22     up       1
23     2               osd.23     up       1
24     2               osd.24     up       1
25     2               osd.25     up       1
26     2               osd.26     up       1
27     2               osd.27     up       1
28     2               osd.28     up       1
29     2               osd.29     up       1
30     2               osd.30     up       1
31     2               osd.31     up       1
32     2               osd.32     up       1
-8     22          rack r22
-2     22            host s201
0      2               osd.0      up       1
1      2               osd.1      up       1
2      2               osd.2      up       1
3      2               osd.3      up       1
4      2               osd.4      up       1
5      2               osd.5      up       1
6      2               osd.6      up       1
7      2               osd.7      up       1
8      2               osd.8      up       1
9      2               osd.9      up       1
10     2               osd.10     up       1
-14    21          rack r21
-11    21            host s203
44     2               osd.44     up       1
45     2               osd.45     up       1
46     2               osd.46     up       1
47     2               osd.47     up       1
48     2               osd.48     up       1
49     2               osd.49     up       1
50     2               osd.50     up       1
51     2               osd.51     up       1
52     1.5             osd.52     up       1
53     1.5             osd.53     up       1
54     2               osd.54     up       1
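For reference, flapping during heavy recovery can often be damped temporarily with the cluster-wide OSD flags. A minimal sketch, assuming a bobtail-era (0.56.x) CLI; the flag names are the same ones already visible in the health output above:

    # keep flapping OSDs from being marked "out", which would re-trigger backfill
    ceph osd set noout
    # optionally also keep them from being marked "down" at all
    ceph osd set nodown

    # once peering and backfill have settled, remove the flags again
    ceph osd unset nodown
    ceph osd unset noout

Note that nodown also hides real failures, so it is only a stop-gap while the cluster catches up.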
2013/4/21 Marco Aroldi <marco.aroldi@xxxxxxxxx>:
> So, I've restarted as many of the new OSDs as possible and the cluster
> started to move data to the 2 new nodes overnight.
> This morning there was no network traffic and the health was
>
> HEALTH_ERR 1323 pgs backfill; 150 pgs backfill_toofull; 100 pgs
> backfilling; 114 pgs degraded; 3374 pgs peering; 36 pgs recovering;
> 949 pgs recovery_wait; 3374 pgs stuck inactive; 6289 pgs stuck
> unclean; recovery 2130652/20890113 degraded (10.199%); 58/8914654
> unfound (0.001%); 1 full osd(s); 22 near full osd(s); full,noup,nodown
> flag(s) set
>
> So I unset the noup and nodown flags and the data started moving again.
> I've increased the full ratio to 97%, so now there is no "official" full
> OSD and the HEALTH_ERR became HEALTH_WARN.
>
> However, there is still no access to the filesystem:
>
> HEALTH_WARN 1906 pgs backfill; 21 pgs backfill_toofull; 52 pgs
> backfilling; 707 pgs degraded; 371 pgs down; 97 pgs incomplete; 3385
> pgs peering; 35 pgs recovering; 1002 pgs recovery_wait; 4 pgs stale;
> 683 pgs stuck inactive; 5898 pgs stuck unclean; recovery
> 3081499/22208859 degraded (13.875%); 487/9433642 unfound (0.005%);
> recovering 11722 o/s, 57040MB/s; 17 near full osd(s)
>
> The OSDs are flapping in/out again...
>
> I'm prepared to start deleting some portion of the data.
> What can I try to do now?
>
> 2013/4/21 Gregory Farnum <greg@xxxxxxxxxxx>:
>> It's not entirely clear from your description and the output you've
>> given us, but it looks like maybe you've managed to bring up all your
>> OSDs correctly at this point? Or are they just not reporting down
>> because you set the "nodown" flag...
>>
>> In any case, CephFS isn't going to come up while the underlying RADOS
>> cluster is this unhealthy, so you're going to need to get that going
>> again. Since your OSDs have managed to get themselves so full, it's
>> going to be trickier than normal, but if all the rebalancing that's
>> happening is only because you sort-of-didn't-really lose nodes, and
>> you can bring them all back up, you should be able to sort it out by
>> getting all the nodes back up and then changing your full percentages
>> (by a *very small* amount); since you haven't been doing any writes to
>> the cluster, it shouldn't take many writes to get everything back
>> where it was, although if this has been continuing to backfill in the
>> meantime, that will need to unwind.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Sat, Apr 20, 2013 at 12:21 PM, John Wilkins <john.wilkins@xxxxxxxxxxx> wrote:
>>> I don't see anything related to lost objects in your output. I just see
>>> waiting on backfill, backfill_toofull, remapped, and so forth. You can read
>>> a bit about what is going on here:
>>> http://ceph.com/docs/next/rados/operations/monitoring-osd-pg/
>>>
>>> Keep us posted as to the recovery, and let me know what I can do to improve
>>> the docs for scenarios like this.
>>>
>>>
>>> On Sat, Apr 20, 2013 at 10:52 AM, Marco Aroldi <marco.aroldi@xxxxxxxxx>
>>> wrote:
>>>>
>>>> John,
>>>> thanks for the quick reply.
>>>> Below you can see my ceph osd tree.
>>>> The problem was caused not by the failure itself, but by the "renamed"
>>>> bunch of devices. It was like a deadly 15-puzzle.
>>>> I think the solution would have been to mount the devices in fstab by
>>>> UUID (/dev/disk/by-uuid) instead of /dev/sdX.
>>>>
>>>> However, yes, I have an entry in my ceph.conf (devs = /dev/sdX1 --
>>>> osd_journal = /dev/sdX2) *and* an entry in my fstab for each OSD.
>>>>
>>>> The node with the failed disk is s103 (osd.59).
>>>>
>>>> Now I have 5 OSDs from s203 up and in to try to let Ceph rebalance the
>>>> data... but it is still a bloody mess.
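A minimal sketch of the by-UUID approach Marco describes above; every UUID, mount point, and device path here is a placeholder (the real filesystem UUIDs come from blkid, and the data paths depend on how the OSDs were created):

    # /etc/fstab -- mount each OSD data partition by filesystem UUID, so that
    # /dev/sdX renumbering after a reboot no longer matters
    UUID=0f1e2d3c-PLACEHOLDER  /var/lib/ceph/osd/ceph-59  xfs  noatime  0  2

    # ceph.conf -- point the OSD at that mount point, and reference its journal
    # partition by a stable /dev/disk/by-id/ path instead of /dev/sdX2
    [osd.59]
        host = s103
        osd data = /var/lib/ceph/osd/ceph-59
        osd journal = /dev/disk/by-id/scsi-PLACEHOLDER-part2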
>>>> Look at the ceph -w output: it reports a total of 110 TB, which is
>>>> wrong... all drives are 2 TB and I have 49 drives up and in -- 98 TB
>>>> in total. I think 110 TB (55 OSDs) was the size before the cluster
>>>> became inaccessible.
>>>>
>>>> # id   weight  type name          up/down  reweight
>>>> -1     130     root default
>>>> -9     65        room p1
>>>> -3     44          rack r14
>>>> -4     22            host s101
>>>> 11     2               osd.11     up       1
>>>> 12     2               osd.12     up       1
>>>> 13     2               osd.13     up       1
>>>> 14     2               osd.14     up       1
>>>> 15     2               osd.15     up       1
>>>> 16     2               osd.16     up       1
>>>> 17     2               osd.17     up       1
>>>> 18     2               osd.18     up       1
>>>> 19     2               osd.19     up       1
>>>> 20     2               osd.20     up       1
>>>> 21     2               osd.21     up       1
>>>> -6     22            host s102
>>>> 33     2               osd.33     up       1
>>>> 34     2               osd.34     up       1
>>>> 35     2               osd.35     up       1
>>>> 36     2               osd.36     up       1
>>>> 37     2               osd.37     up       1
>>>> 38     2               osd.38     up       1
>>>> 39     2               osd.39     up       1
>>>> 40     2               osd.40     up       1
>>>> 41     2               osd.41     up       1
>>>> 42     2               osd.42     up       1
>>>> 43     2               osd.43     up       1
>>>> -13    21          rack r10
>>>> -12    21            host s103
>>>> 55     2               osd.55     up       0
>>>> 56     2               osd.56     up       0
>>>> 57     2               osd.57     up       0
>>>> 58     2               osd.58     up       0
>>>> 59     2               osd.59     down     0
>>>> 60     2               osd.60     down     0
>>>> 61     2               osd.61     down     0
>>>> 62     2               osd.62     up       0
>>>> 63     2               osd.63     up       0
>>>> 64     1.5             osd.64     up       0
>>>> 65     1.5             osd.65     down     0
>>>> -10    65        room p2
>>>> -7     22          rack r20
>>>> -5     22            host s202
>>>> 22     2               osd.22     up       1
>>>> 23     2               osd.23     up       1
>>>> 24     2               osd.24     up       1
>>>> 25     2               osd.25     up       1
>>>> 26     2               osd.26     up       1
>>>> 27     2               osd.27     up       1
>>>> 28     2               osd.28     up       1
>>>> 29     2               osd.29     up       1
>>>> 30     2               osd.30     up       1
>>>> 31     2               osd.31     up       1
>>>> 32     2               osd.32     up       1
>>>> -8     22          rack r22
>>>> -2     22            host s201
>>>> 0      2               osd.0      up       1
>>>> 1      2               osd.1      up       1
>>>> 2      2               osd.2      up       1
>>>> 3      2               osd.3      up       1
>>>> 4      2               osd.4      up       1
>>>> 5      2               osd.5      up       1
>>>> 6      2               osd.6      up       1
>>>> 7      2               osd.7      up       1
>>>> 8      2               osd.8      up       1
>>>> 9      2               osd.9      up       1
>>>> 10     2               osd.10     up       1
>>>> -14    21          rack r21
>>>> -11    21            host s203
>>>> 44     2               osd.44     up       1
>>>> 45     2               osd.45     up       1
>>>> 46     2               osd.46     up       1
>>>> 47     2               osd.47     up       1
>>>> 48     2               osd.48     up       1
>>>> 49     2               osd.49     up       0
>>>> 50     2               osd.50     up       0
>>>> 51     2               osd.51     up       0
>>>> 52     1.5             osd.52     up       0
>>>> 53     1.5             osd.53     up       0
>>>> 54     2               osd.54     up       0
>>>>
>>>>
>>>> ceph -w
>>>>
>>>> 2013-04-20 19:46:48.608988 mon.0 [INF] pgmap v1352767: 17280 pgs: 58
>>>> active, 12581 active+clean, 1686 active+remapped+wait_backfill, 24
>>>> active+degraded+wait_backfill, 224
>>>> active+remapped+wait_backfill+backfill_toofull, 1061
>>>> active+recovery_wait, 4
>>>> active+degraded+wait_backfill+backfill_toofull, 629 peering, 626
>>>> active+remapped, 72 active+remapped+backfilling, 89 active+degraded,
>>>> 14 active+remapped+backfill_toofull, 1 active+clean+scrubbing, 8
>>>> active+degraded+remapped+wait_backfill, 20
>>>> active+recovery_wait+remapped, 5
>>>> active+degraded+remapped+wait_backfill+backfill_toofull, 162
>>>> remapped+peering, 1 active+degraded+remapped+backfilling, 2
>>>> active+degraded+remapped+backfill_toofull, 13 active+recovering; 49777
>>>> GB data, 72863 GB used, 40568 GB / 110 TB avail; 2965687/21848501
>>>> degraded (13.574%); recovering 5 o/s, 16363B/s
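With that many backfill_toofull states, it helps to see exactly which OSDs are close to the ratios. A short sketch, assuming the default osd data locations; exact output formats vary by release:

    # lists the specific "osd.N is near full at XX%" warnings
    ceph health detail

    # the osd stats section of a pg dump includes per-OSD used/available space
    ceph pg dump | less

    # plain df on each storage node shows the same thing at the filesystem level
    df -h /var/lib/ceph/osd/*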
>>>>
>>>> 2013/4/20 John Wilkins <john.wilkins@xxxxxxxxxxx>:
>>>> > Marco,
>>>> >
>>>> > If you do a "ceph osd tree", can you see if your OSDs are all up? You
>>>> > seem to have at least one problem related to the backfill OSDs being
>>>> > too full, and some which are near full or full for the purposes of
>>>> > storage. See the following in the documentation to see if this helps:
>>>> >
>>>> > http://ceph.com/docs/master/rados/configuration/mon-config-ref/#storage-capacity
>>>> >
>>>> > http://ceph.com/docs/master/rados/configuration/osd-config-ref/#backfilling
>>>> >
>>>> > http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#no-free-drive-space
>>>> >
>>>> > Before you start deleting data as a remedy, you'd want to at least try
>>>> > to get the OSDs back up and running first.
>>>> >
>>>> > If rebooting changed the drive names, you might look here:
>>>> >
>>>> > http://ceph.com/docs/master/rados/configuration/osd-config-ref/#general-settings
>>>> >
>>>> > We have default settings for the OSD data and journal paths, which you
>>>> > could override if you can locate the data and journal sources on the
>>>> > renamed drives. If you mounted them but didn't add them to the fstab,
>>>> > that might be the source of the problem. I'd rather see you use the
>>>> > default paths, as it would be easier to troubleshoot later. So did you
>>>> > mount the drives, but not add the mount points to fstab?
>>>> >
>>>> > John
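For context, these are the capacity knobs behind the links above (option names as in the linked bobtail-era docs; treat the values and the set_full_ratio command as assumptions to verify on your release):

    # ceph.conf -- cluster-wide thresholds enforced by the monitors
    [mon]
        mon osd full ratio = .95        # writes are blocked above this
        mon osd nearfull ratio = .85    # "near full" warnings start here

    # per-OSD guard that yields backfill_toofull when a backfill target is past it
    [osd]
        osd backfill full ratio = .85

    # the live full ratio can be raised slightly at runtime, which is
    # presumably how Marco got to 97%:
    ceph pg set_full_ratio 0.97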
>>>> >
>>>> >
>>>> > On Sat, Apr 20, 2013 at 8:46 AM, Marco Aroldi <marco.aroldi@xxxxxxxxx>
>>>> > wrote:
>>>> >>
>>>> >> Hi,
>>>> >> due to a hardware failure while expanding Ceph, I'm in big trouble
>>>> >> because CephFS doesn't mount anymore.
>>>> >> I was adding a couple of storage nodes, but a disk failed, and after a
>>>> >> reboot the OS (Ubuntu 12.04) renamed the remaining devices, so the
>>>> >> entire node got screwed up.
>>>> >>
>>>> >> Now, from the "sane" new node, I'm bringing some new OSDs up and in,
>>>> >> because the cluster is near full and I can't completely revert the
>>>> >> situation to how it was before.
>>>> >>
>>>> >> *I can* afford data loss, but I need to regain access to the filesystem.
>>>> >>
>>>> >> My setup:
>>>> >> 3 mon + 3 mds
>>>> >> 4 storage nodes (I was adding nodes 5 and 6)
>>>> >>
>>>> >> Ceph 0.56.4
>>>> >>
>>>> >>
>>>> >> ceph health:
>>>> >> HEALTH_ERR 2008 pgs backfill; 246 pgs backfill_toofull; 74 pgs
>>>> >> backfilling; 134 pgs degraded; 790 pgs peering; 10 pgs recovering;
>>>> >> 1116 pgs recovery_wait; 790 pgs stuck inactive; 4782 pgs stuck
>>>> >> unclean; recovery 3049459/21926624 degraded (13.908%); recovering 6
>>>> >> o/s, 16316KB/s; 4 full osd(s); 30 near full osd(s); full,noup,nodown
>>>> >> flag(s) set
>>>> >>
>>>> >>
>>>> >> ceph mds dump:
>>>> >> dumped mdsmap epoch 44
>>>> >> epoch 44
>>>> >> flags 0
>>>> >> created 2013-03-18 14:42:29.330548
>>>> >> modified 2013-04-20 17:14:32.969332
>>>> >> tableserver 0
>>>> >> root 0
>>>> >> session_timeout 60
>>>> >> session_autoclose 300
>>>> >> last_failure 43
>>>> >> last_failure_osd_epoch 18160
>>>> >> compat compat={},rocompat={},incompat={1=base v0.20,2=client
>>>> >> writeable ranges,3=default file layouts on dirs,4=dir inode in
>>>> >> separate object}
>>>> >> max_mds 1
>>>> >> in 0
>>>> >> up {0=6376}
>>>> >> failed
>>>> >> stopped
>>>> >> data_pools [0]
>>>> >> metadata_pool 1
>>>> >> 6376: 192.168.21.11:6800/13457 'm1' mds.0.9 up:replay seq 1
>>>> >> 5945: 192.168.21.13:6800/12999 'm3' mds.-1.0 up:standby seq 1
>>>> >> 5963: 192.168.21.12:6800/22454 'm2' mds.-1.0 up:standby seq 1
>>>> >>
>>>> >>
>>>> >> ceph mon dump:
>>>> >> epoch 1
>>>> >> fsid d634f7b3-8a8a-4893-bdfb-a95ccca7fddd
>>>> >> last_changed 2013-03-18 14:39:42.253923
>>>> >> created 2013-03-18 14:39:42.253923
>>>> >> 0: 192.168.21.11:6789/0 mon.m1
>>>> >> 1: 192.168.21.12:6789/0 mon.m2
>>>> >> 2: 192.168.21.13:6789/0 mon.m3
>>>> >
>>>> >
>>>> > --
>>>> > John Wilkins
>>>> > Senior Technical Writer
>>>> > Inktank
>>>> > john.wilkins@xxxxxxxxxxx
>>>> > (415) 425-9599
>>>> > http://inktank.com
>>>
>>>
>>> --
>>> John Wilkins
>>> Senior Technical Writer
>>> Inktank
>>> john.wilkins@xxxxxxxxxxx
>>> (415) 425-9599
>>> http://inktank.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com