So I've restarted as many of the new OSDs as possible, and the cluster started moving data to the 2 new nodes overnight.
This morning there was no network traffic and the health was:

HEALTH_ERR 1323 pgs backfill; 150 pgs backfill_toofull; 100 pgs backfilling; 114 pgs degraded; 3374 pgs peering; 36 pgs recovering; 949 pgs recovery_wait; 3374 pgs stuck inactive; 6289 pgs stuck unclean; recovery 2130652/20890113 degraded (10.199%); 58/8914654 unfound (0.001%); 1 full osd(s); 22 near full osd(s); full,noup,nodown flag(s) set

So I have unset the noup and nodown flags and the data started moving again.
I've also increased the full ratio to 97%, so now there is no "official" full OSD and HEALTH_ERR became HEALTH_WARN.
However, there is still no access to the filesystem:

HEALTH_WARN 1906 pgs backfill; 21 pgs backfill_toofull; 52 pgs backfilling; 707 pgs degraded; 371 pgs down; 97 pgs incomplete; 3385 pgs peering; 35 pgs recovering; 1002 pgs recovery_wait; 4 pgs stale; 683 pgs stuck inactive; 5898 pgs stuck unclean; recovery 3081499/22208859 degraded (13.875%); 487/9433642 unfound (0.005%); recovering 11722 o/s, 57040MB/s; 17 near full osd(s)

The OSDs are flapping in/out again...
I'm prepared to start deleting some portion of the data. What can I try to do now?

2013/4/21 Gregory Farnum <greg@xxxxxxxxxxx>:
> It's not entirely clear from your description and the output you've
> given us, but it looks like maybe you've managed to bring up all your
> OSDs correctly at this point? Or are they just not reporting down
> because you set the "nodown" flag...
>
> In any case, CephFS isn't going to come up while the underlying RADOS
> cluster is this unhealthy, so you're going to need to get that going
> again. Since your OSDs have managed to get themselves so full, it's
> going to be trickier than normal, but if all the rebalancing that's
> happening is only because you sort-of-didn't-really lose nodes, and
> you can bring them all back up, you should be able to sort it out by
> getting all the nodes back up and then changing your full percentages
> (by a *very small* amount); since you haven't been doing any writes to
> the cluster, it shouldn't take many writes to get everything back to
> where it was, although if this has been continuing to backfill in the
> meanwhile, that will need to unwind.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Sat, Apr 20, 2013 at 12:21 PM, John Wilkins <john.wilkins@xxxxxxxxxxx> wrote:
>> I don't see anything related to lost objects in your output. I just see
>> waiting on backfill, backfill_toofull, remapped, and so forth. You can read
>> a bit about what is going on here:
>> http://ceph.com/docs/next/rados/operations/monitoring-osd-pg/
>>
>> Keep us posted as to the recovery, and let me know what I can do to improve
>> the docs for scenarios like this.
>>
>>
>> On Sat, Apr 20, 2013 at 10:52 AM, Marco Aroldi <marco.aroldi@xxxxxxxxx>
>> wrote:
>>>
>>> John,
>>> thanks for the quick reply.
>>> Below you can see my ceph osd tree.
>>> The problem was caused not by the disk failure itself, but by the whole
>>> bunch of devices being "renamed" afterwards. It was like a deadly
>>> 15-puzzle.
>>> I think the solution would have been to mount the devices in fstab by
>>> UUID (/dev/disk/by-uuid) instead of /dev/sdX.
>>>
>>> However, yes, I have an entry in my ceph.conf (devs = /dev/sdX1 --
>>> osd_journal = /dev/sdX2) *and* an entry in my fstab for each OSD.
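
A by-UUID mount of the kind described above would look roughly like this in
/etc/fstab; the UUID and mount point are placeholders (assuming the default
/var/lib/ceph/osd/$cluster-$id data path and an XFS data partition), not
values taken from this cluster:

    # pin each OSD data partition by UUID so a failed disk cannot shuffle the /dev/sdX names
    /dev/disk/by-uuid/<uuid-of-osd-11-data-partition>  /var/lib/ceph/osd/ceph-11  xfs  noatime  0  0

The UUID of each partition can be read with blkid.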
>>>
>>> The node with the failed disk is s103 (osd.59).
>>>
>>> Now I have 5 OSDs from s203 up and in to try to let Ceph rebalance the
>>> data... but it is still a bloody mess.
>>> Look at the ceph -w output: it reports a total of 110TB, which is
>>> wrong... all drives are 2TB and I have 49 drives up and in -- 98TB in
>>> total. I think that 110TB (55 OSDs) was the size before the cluster
>>> became inaccessible.
>>>
>>> # id    weight  type name       up/down reweight
>>> -1      130     root default
>>> -9      65          room p1
>>> -3      44              rack r14
>>> -4      22                  host s101
>>> 11      2                       osd.11  up      1
>>> 12      2                       osd.12  up      1
>>> 13      2                       osd.13  up      1
>>> 14      2                       osd.14  up      1
>>> 15      2                       osd.15  up      1
>>> 16      2                       osd.16  up      1
>>> 17      2                       osd.17  up      1
>>> 18      2                       osd.18  up      1
>>> 19      2                       osd.19  up      1
>>> 20      2                       osd.20  up      1
>>> 21      2                       osd.21  up      1
>>> -6      22                  host s102
>>> 33      2                       osd.33  up      1
>>> 34      2                       osd.34  up      1
>>> 35      2                       osd.35  up      1
>>> 36      2                       osd.36  up      1
>>> 37      2                       osd.37  up      1
>>> 38      2                       osd.38  up      1
>>> 39      2                       osd.39  up      1
>>> 40      2                       osd.40  up      1
>>> 41      2                       osd.41  up      1
>>> 42      2                       osd.42  up      1
>>> 43      2                       osd.43  up      1
>>> -13     21              rack r10
>>> -12     21                  host s103
>>> 55      2                       osd.55  up      0
>>> 56      2                       osd.56  up      0
>>> 57      2                       osd.57  up      0
>>> 58      2                       osd.58  up      0
>>> 59      2                       osd.59  down    0
>>> 60      2                       osd.60  down    0
>>> 61      2                       osd.61  down    0
>>> 62      2                       osd.62  up      0
>>> 63      2                       osd.63  up      0
>>> 64      1.5                     osd.64  up      0
>>> 65      1.5                     osd.65  down    0
>>> -10     65          room p2
>>> -7      22              rack r20
>>> -5      22                  host s202
>>> 22      2                       osd.22  up      1
>>> 23      2                       osd.23  up      1
>>> 24      2                       osd.24  up      1
>>> 25      2                       osd.25  up      1
>>> 26      2                       osd.26  up      1
>>> 27      2                       osd.27  up      1
>>> 28      2                       osd.28  up      1
>>> 29      2                       osd.29  up      1
>>> 30      2                       osd.30  up      1
>>> 31      2                       osd.31  up      1
>>> 32      2                       osd.32  up      1
>>> -8      22              rack r22
>>> -2      22                  host s201
>>> 0       2                       osd.0   up      1
>>> 1       2                       osd.1   up      1
>>> 2       2                       osd.2   up      1
>>> 3       2                       osd.3   up      1
>>> 4       2                       osd.4   up      1
>>> 5       2                       osd.5   up      1
>>> 6       2                       osd.6   up      1
>>> 7       2                       osd.7   up      1
>>> 8       2                       osd.8   up      1
>>> 9       2                       osd.9   up      1
>>> 10      2                       osd.10  up      1
>>> -14     21              rack r21
>>> -11     21                  host s203
>>> 44      2                       osd.44  up      1
>>> 45      2                       osd.45  up      1
>>> 46      2                       osd.46  up      1
>>> 47      2                       osd.47  up      1
>>> 48      2                       osd.48  up      1
>>> 49      2                       osd.49  up      0
>>> 50      2                       osd.50  up      0
>>> 51      2                       osd.51  up      0
>>> 52      1.5                     osd.52  up      0
>>> 53      1.5                     osd.53  up      0
>>> 54      2                       osd.54  up      0
>>>
>>>
>>> ceph -w
>>>
>>> 2013-04-20 19:46:48.608988 mon.0 [INF] pgmap v1352767: 17280 pgs: 58
>>> active, 12581 active+clean, 1686 active+remapped+wait_backfill, 24
>>> active+degraded+wait_backfill, 224
>>> active+remapped+wait_backfill+backfill_toofull, 1061
>>> active+recovery_wait, 4
>>> active+degraded+wait_backfill+backfill_toofull, 629 peering, 626
>>> active+remapped, 72 active+remapped+backfilling, 89 active+degraded,
>>> 14 active+remapped+backfill_toofull, 1 active+clean+scrubbing, 8
>>> active+degraded+remapped+wait_backfill, 20
>>> active+recovery_wait+remapped, 5
>>> active+degraded+remapped+wait_backfill+backfill_toofull, 162
>>> remapped+peering, 1 active+degraded+remapped+backfilling, 2
>>> active+degraded+remapped+backfill_toofull, 13 active+recovering; 49777
>>> GB data, 72863 GB used, 40568 GB / 110 TB avail; 2965687/21848501
>>> degraded (13.574%); recovering 5 o/s, 16363B/s
>>>
>>> 2013/4/20 John Wilkins <john.wilkins@xxxxxxxxxxx>:
>>> > Marco,
>>> >
>>> > If you do a "ceph osd tree", can you see if your OSDs are all up? You
>>> > seem to have at least one problem related to the backfill OSDs being
>>> > too full, and some which are near full or full for the purposes of
>>> > storage. See the following in the documentation to see if this helps:
>>> >
>>> > http://ceph.com/docs/master/rados/configuration/mon-config-ref/#storage-capacity
>>> >
>>> > http://ceph.com/docs/master/rados/configuration/osd-config-ref/#backfilling
>>> >
>>> > http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#no-free-drive-space
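
Roughly, the thresholds those pages cover map onto these settings; the
values below are the stock defaults from the documentation, not this
cluster's configuration:

    [global]
        mon osd full ratio = .95       # an OSD at/above this is marked full and the cluster stops accepting writes
        mon osd nearfull ratio = .85   # produces the "near full osd(s)" health warning
    [osd]
        osd backfill full ratio = .85  # an OSD above this refuses backfill, which shows up as backfill_toofull PGs

Nudging the full ratio up by only a small amount, as was done with the 97%
change described at the top of this message, trades a little safety margin
for enough headroom to let recovery proceed.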
>>> >
>>> > Before you start deleting data as a remedy, you'd want to at least
>>> > try to get the OSDs back up and running first.
>>> >
>>> > If rebooting changed the drive names, you might look here:
>>> >
>>> > http://ceph.com/docs/master/rados/configuration/osd-config-ref/#general-settings
>>> >
>>> > We have default settings for the OSD data and journal paths, which you
>>> > could override if you can locate the data and journal sources on the
>>> > renamed drives. If you mounted them but didn't add them to the fstab,
>>> > that might be the source of the problem. I'd rather see you use the
>>> > default paths, as it would be easier to troubleshoot later. So did you
>>> > mount the drives, but not add the mount points to fstab?
>>> >
>>> > John
>>> >
>>> >
>>> > On Sat, Apr 20, 2013 at 8:46 AM, Marco Aroldi <marco.aroldi@xxxxxxxxx>
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >> due to a hardware failure while expanding Ceph, I'm in big trouble
>>> >> because CephFS doesn't mount anymore.
>>> >> I was adding a couple of storage nodes, but a disk failed and after a
>>> >> reboot the OS (Ubuntu 12.04) renamed the remaining devices, so the
>>> >> entire node got screwed up.
>>> >>
>>> >> Now, from the "sane" new node, I'm bringing some new OSDs up and in,
>>> >> because the cluster is near full and I can't completely revert the
>>> >> situation to what it was before.
>>> >>
>>> >> *I can* afford data loss, but I need to regain access to the filesystem.
>>> >>
>>> >> My setup:
>>> >> 3 mon + 3 mds
>>> >> 4 storage nodes (I was adding nos. 5 and 6)
>>> >>
>>> >> Ceph 0.56.4
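
For reference, a per-OSD override of the kind John describes above might
look roughly like this in ceph.conf; the section name, UUID placeholders and
values are illustrative, not this cluster's actual configuration:

    [osd]
        # defaults (see the osd-config-ref link above):
        #   osd data    = /var/lib/ceph/osd/$cluster-$id
        #   osd journal = /var/lib/ceph/osd/$cluster-$id/journal
    [osd.59]
        devs        = /dev/disk/by-uuid/<uuid-of-osd-59-data-partition>
        osd journal = /dev/disk/by-uuid/<uuid-of-osd-59-journal-partition>

Referencing the partitions by UUID means a /dev/sdX device that gets
renumbered after a disk failure no longer points an OSD at the wrong drive.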
>>> >>
>>> >> ceph health:
>>> >> HEALTH_ERR 2008 pgs backfill; 246 pgs backfill_toofull; 74 pgs
>>> >> backfilling; 134 pgs degraded; 790 pgs peering; 10 pgs recovering;
>>> >> 1116 pgs recovery_wait; 790 pgs stuck inactive; 4782 pgs stuck
>>> >> unclean; recovery 3049459/21926624 degraded (13.908%); recovering 6
>>> >> o/s, 16316KB/s; 4 full osd(s); 30 near full osd(s); full,noup,nodown
>>> >> flag(s) set
>>> >>
>>> >> ceph mds dump:
>>> >> dumped mdsmap epoch 44
>>> >> epoch 44
>>> >> flags 0
>>> >> created 2013-03-18 14:42:29.330548
>>> >> modified 2013-04-20 17:14:32.969332
>>> >> tableserver 0
>>> >> root 0
>>> >> session_timeout 60
>>> >> session_autoclose 300
>>> >> last_failure 43
>>> >> last_failure_osd_epoch 18160
>>> >> compat compat={},rocompat={},incompat={1=base v0.20,2=client
>>> >> writeable ranges,3=default file layouts on dirs,4=dir inode in
>>> >> separate object}
>>> >> max_mds 1
>>> >> in 0
>>> >> up {0=6376}
>>> >> failed
>>> >> stopped
>>> >> data_pools [0]
>>> >> metadata_pool 1
>>> >> 6376: 192.168.21.11:6800/13457 'm1' mds.0.9 up:replay seq 1
>>> >> 5945: 192.168.21.13:6800/12999 'm3' mds.-1.0 up:standby seq 1
>>> >> 5963: 192.168.21.12:6800/22454 'm2' mds.-1.0 up:standby seq 1
>>> >>
>>> >> ceph mon dump:
>>> >> epoch 1
>>> >> fsid d634f7b3-8a8a-4893-bdfb-a95ccca7fddd
>>> >> last_changed 2013-03-18 14:39:42.253923
>>> >> created 2013-03-18 14:39:42.253923
>>> >> 0: 192.168.21.11:6789/0 mon.m1
>>> >> 1: 192.168.21.12:6789/0 mon.m2
>>> >> 2: 192.168.21.13:6789/0 mon.m3
>>> >
>>> >
>>> > --
>>> > John Wilkins
>>> > Senior Technical Writer
>>> > Inktank
>>> > john.wilkins@xxxxxxxxxxx
>>> > (415) 425-9599
>>> > http://inktank.com
>>
>>
>> --
>> John Wilkins
>> Senior Technical Writer
>> Inktank
>> john.wilkins@xxxxxxxxxxx
>> (415) 425-9599
>> http://inktank.com
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com