Re: Cephfs inaccessible

So, I've restarted as many of the new osds as possible, and the cluster
started moving data to the 2 new nodes overnight.
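(By "restarted" I mean the standard sysvinit script on each node, something
like:

  service ceph start osd.NN

with the appropriate ids.)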
This morning there was no network traffic and the health was:

HEALTH_ERR 1323 pgs backfill; 150 pgs backfill_toofull; 100 pgs
backfilling; 114 pgs degraded; 3374 pgs peering; 36 pgs recovering;
949 pgs recovery_wait; 3374 pgs stuck inactive; 6289 pgs stuck
unclean; recovery 2130652/20890113 degraded (10.199%); 58/8914654
unfound (0.001%); 1 full osd(s); 22 near full osd(s); full,noup,nodown
flag(s) set

So I have unset the noup and nodown flags and the data started moving again.
I've increased the full ratio to 97%, so now there is no "official" full
osd and the HEALTH_ERR became HEALTH_WARN.
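For the record, the commands were along these lines (the last one assumes
the pg set_full_ratio syntax; the same could be done with injectargs on the
mons):

  ceph osd unset noup
  ceph osd unset nodown
  ceph pg set_full_ratio 0.97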

However, there is still no access to the filesystem:

HEALTH_WARN 1906 pgs backfill; 21 pgs backfill_toofull; 52 pgs
backfilling; 707 pgs degraded; 371 pgs down; 97 pgs incomplete; 3385
pgs peering; 35 pgs recovering; 1002 pgs recovery_wait; 4 pgs stale;
683 pgs stuck inactive; 5898 pgs stuck unclean; recovery
3081499/22208859 degraded (13.875%); 487/9433642 unfound (0.005%);
recovering 11722 o/s, 57040MB/s; 17 near full osd(s)

The osds are flapping in/out again...

I'm willing to start deleting some portion of the data.
What can I try to do now?
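If deleting is the way to go, I guess the first step is to see which pool
actually holds the space, with something like:

  rados df

and go from there.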

2013/4/21 Gregory Farnum <greg@xxxxxxxxxxx>:
> It's not entirely clear from your description and the output you've
> given us, but it looks like maybe you've managed to bring up all your
> OSDs correctly at this point? Or are they just not reporting down
> because you set the "no down" flag...
>
> In any case, CephFS isn't going to come up while the underlying RADOS
> cluster is this unhealthy, so you're going to need to get that going
> again. Since your OSDs have managed to get themselves so full it's
> going to be trickier than normal, but if all the rebalancing that's
> happening is only because you sort-of-didn't-really lose nodes, and
> you can bring them all back up, you should be able to sort it out by
> getting all the nodes back up, and then changing your full percentages
> (by a *very small* amount); since you haven't been doing any writes to
> the cluster it shouldn't take much data writes to get everything back
> where it was, although if this has been continuing to backfill in the
> meanwhile that will need to unwind.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Sat, Apr 20, 2013 at 12:21 PM, John Wilkins <john.wilkins@xxxxxxxxxxx> wrote:
>> I don't see anything related to lost objects in your output. I just see
>> waiting on backfill, backfill_toofull, remapped, and so forth. You can read
>> a bit about what is going on here:
>> http://ceph.com/docs/next/rados/operations/monitoring-osd-pg/
>>
>> Keep us posted as to the recovery, and let me know what I can do to improve
>> the docs for scenarios like this.
>>
>>
>> On Sat, Apr 20, 2013 at 10:52 AM, Marco Aroldi <marco.aroldi@xxxxxxxxx>
>> wrote:
>>>
>>> John,
>>> thanks for the quick reply.
>>> Below you can see my ceph osd tree
>>> The problem was caused not by the disk failure itself, but by the
>>> devices getting renamed.
>>> It was like a deadly 15-puzzle.
>>> I think the solution would have been to mount the devices in fstab by
>>> UUID (/dev/disk/by-uuid) instead of /dev/sdX.
>>>
>>> However, yes I have an entry in my ceph.conf (devs = /dev/sdX1 --
>>> osd_journal = /dev/sdX2) *and* an entry in my fstab for each OSD
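>>> By that I mean something along these lines (UUIDs are placeholders, and
>>> the mount point assumes the default osd data path):
>>>
>>> # /etc/fstab -- mount the osd data partition by filesystem UUID
>>> UUID=<data-fs-uuid>  /var/lib/ceph/osd/ceph-55  xfs  noatime  0 0
>>>
>>> # ceph.conf
>>> [osd.55]
>>>     devs = /dev/disk/by-uuid/<data-fs-uuid>
>>>     osd journal = /dev/disk/by-id/<journal-partition>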
>>>
>>> The node with the failed disk is s103 (osd.59)
>>>
>>> Now I have 5 osds from s203 up and in, to try to let ceph rebalance
>>> data... but it's still a bloody mess.
>>> Look at the ceph -w output: it reports a total of 110TB, which is
>>> wrong... all drives are 2TB and I have 49 drives up and in -- total 98TB.
>>> I think 110TB (55 osds) was the size before the cluster became
>>> inaccessible.
>>>
>>> # id    weight    type name    up/down    reweight
>>> -1    130    root default
>>> -9    65        room p1
>>> -3    44            rack r14
>>> -4    22                host s101
>>> 11    2                    osd.11    up    1
>>> 12    2                    osd.12    up    1
>>> 13    2                    osd.13    up    1
>>> 14    2                    osd.14    up    1
>>> 15    2                    osd.15    up    1
>>> 16    2                    osd.16    up    1
>>> 17    2                    osd.17    up    1
>>> 18    2                    osd.18    up    1
>>> 19    2                    osd.19    up    1
>>> 20    2                    osd.20    up    1
>>> 21    2                    osd.21    up    1
>>> -6    22                host s102
>>> 33    2                    osd.33    up    1
>>> 34    2                    osd.34    up    1
>>> 35    2                    osd.35    up    1
>>> 36    2                    osd.36    up    1
>>> 37    2                    osd.37    up    1
>>> 38    2                    osd.38    up    1
>>> 39    2                    osd.39    up    1
>>> 40    2                    osd.40    up    1
>>> 41    2                    osd.41    up    1
>>> 42    2                    osd.42    up    1
>>> 43    2                    osd.43    up    1
>>> -13    21            rack r10
>>> -12    21                host s103
>>> 55    2                    osd.55    up    0
>>> 56    2                    osd.56    up    0
>>> 57    2                    osd.57    up    0
>>> 58    2                    osd.58    up    0
>>> 59    2                    osd.59    down    0
>>> 60    2                    osd.60    down    0
>>> 61    2                    osd.61    down    0
>>> 62    2                    osd.62    up    0
>>> 63    2                    osd.63    up    0
>>> 64    1.5                    osd.64    up    0
>>> 65    1.5                    osd.65    down    0
>>> -10    65        room p2
>>> -7    22            rack r20
>>> -5    22                host s202
>>> 22    2                    osd.22    up    1
>>> 23    2                    osd.23    up    1
>>> 24    2                    osd.24    up    1
>>> 25    2                    osd.25    up    1
>>> 26    2                    osd.26    up    1
>>> 27    2                    osd.27    up    1
>>> 28    2                    osd.28    up    1
>>> 29    2                    osd.29    up    1
>>> 30    2                    osd.30    up    1
>>> 31    2                    osd.31    up    1
>>> 32    2                    osd.32    up    1
>>> -8    22            rack r22
>>> -2    22                host s201
>>> 0    2                    osd.0    up    1
>>> 1    2                    osd.1    up    1
>>> 2    2                    osd.2    up    1
>>> 3    2                    osd.3    up    1
>>> 4    2                    osd.4    up    1
>>> 5    2                    osd.5    up    1
>>> 6    2                    osd.6    up    1
>>> 7    2                    osd.7    up    1
>>> 8    2                    osd.8    up    1
>>> 9    2                    osd.9    up    1
>>> 10    2                    osd.10    up    1
>>> -14    21            rack r21
>>> -11    21                host s203
>>> 44    2                    osd.44    up    1
>>> 45    2                    osd.45    up    1
>>> 46    2                    osd.46    up    1
>>> 47    2                    osd.47    up    1
>>> 48    2                    osd.48    up    1
>>> 49    2                    osd.49    up    0
>>> 50    2                    osd.50    up    0
>>> 51    2                    osd.51    up    0
>>> 52    1.5                    osd.52    up    0
>>> 53    1.5                    osd.53    up    0
>>> 54    2                    osd.54    up    0
>>>
>>>
>>> ceph -w
>>>
>>> 2013-04-20 19:46:48.608988 mon.0 [INF] pgmap v1352767: 17280 pgs: 58
>>> active, 12581 active+clean, 1686 active+remapped+wait_backfill, 24
>>> active+degraded+wait_backfill, 224
>>> active+remapped+wait_backfill+backfill_toofull, 1061
>>> active+recovery_wait, 4
>>> active+degraded+wait_backfill+backfill_toofull, 629 peering, 626
>>> active+remapped, 72 active+remapped+backfilling, 89 active+degraded,
>>> 14 active+remapped+backfill_toofull, 1 active+clean+scrubbing, 8
>>> active+degraded+remapped+wait_backfill, 20
>>> active+recovery_wait+remapped, 5
>>> active+degraded+remapped+wait_backfill+backfill_toofull, 162
>>> remapped+peering, 1 active+degraded+remapped+backfilling, 2
>>> active+degraded+remapped+backfill_toofull, 13 active+recovering; 49777
>>> GB data, 72863 GB used, 40568 GB / 110 TB avail; 2965687/21848501
>>> degraded (13.574%);  recovering 5 o/s, 16363B/s
>>>
>>> 2013/4/20 John Wilkins <john.wilkins@xxxxxxxxxxx>:
>>> > Marco,
>>> >
>>> > If you do a "ceph tree" can you see if your OSDs are all up? You seem to
>>> > have at least one problem related to the backfill OSDs being too full,
>>> > and
>>> > some which are near full or full for the purposes of storage. See the
>>> > following in the documentation to see if this helps:
>>> >
>>> >
>>> > http://ceph.com/docs/master/rados/configuration/mon-config-ref/#storage-capacity
>>> >
>>> > http://ceph.com/docs/master/rados/configuration/osd-config-ref/#backfilling
>>> >
>>> > http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#no-free-drive-space
>>> >
>>> > Before you start deleting data as a remedy, you'd want to at least try
>>> > to
>>> > get the OSDs back up and running first.
>>> >
>>> > If rebooting changed the drive names, you might look here:
>>> >
>>> > http://ceph.com/docs/master/rados/configuration/osd-config-ref/#general-settings
>>> >
>>> > We have default settings for OSD and journal paths, which you could
>>> > override
>>> > if you can locate the data and journal sources on the renamed drives. If
>>> > you
>>> > mounted them, but didn't add them to the fstab, that might be the source
>>> > of
>>> > the problem. I'd rather see you use the default paths, as it would be
>>> > easier
>>> > to troubleshoot later. So did you mount the drives, but not add the
>>> > mount
>>> > points to fstab?
>>> >
>>> > John
>>> >
>>> >
>>> >
>>> >
>>> > On Sat, Apr 20, 2013 at 8:46 AM, Marco Aroldi <marco.aroldi@xxxxxxxxx>
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >> due to a hardware failure while expanding ceph, I'm in big trouble
>>> >> because cephfs doesn't mount anymore.
>>> >> I was adding a couple of storage nodes, but a disk failed and after a
>>> >> reboot the OS (ubuntu 12.04) renamed the remaining devices, so the
>>> >> entire node got screwed up.
>>> >>
>>> >> Now, from the "sane" new node, I'm bringing some new osds up and in,
>>> >> because the cluster is near full and I can't completely revert the
>>> >> situation to what it was before.
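>>> >> (Concretely, that's roughly:
>>> >>
>>> >>   service ceph start osd.NN
>>> >>   ceph osd in NN
>>> >>
>>> >> for each of the new osds on s203.)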
>>> >>
>>> >> *I can* afford data loss, but I need to regain access to the filesystem
>>> >>
>>> >> My setup:
>>> >> 3 mon + 3 mds
>>> >> 4 storage nodes (I was adding nos. 5 and 6)
>>> >>
>>> >> Ceph 0.56.4
>>> >>
>>> >>
>>> >> ceph health:
>>> >> HEALTH_ERR 2008 pgs backfill; 246 pgs backfill_toofull; 74 pgs
>>> >> backfilling; 134 pgs degraded; 790 pgs peering; 10 pgs recovering;
>>> >> 1116 pgs recovery_wait; 790 pgs stuck inactive; 4782 pgs stuck
>>> >> unclean; recovery 3049459/21926624 degraded (13.908%);  recovering 6
>>> >> o/s, 16316KB/s; 4 full osd(s); 30 near full osd(s); full,noup,nodown
>>> >> flag(s) set
>>> >>
>>> >>
>>> >>
>>> >> ceph mds dump:
>>> >> dumped mdsmap epoch 44
>>> >> epoch    44
>>> >> flags    0
>>> >> created    2013-03-18 14:42:29.330548
>>> >> modified    2013-04-20 17:14:32.969332
>>> >> tableserver    0
>>> >> root    0
>>> >> session_timeout    60
>>> >> session_autoclose    300
>>> >> last_failure    43
>>> >> last_failure_osd_epoch    18160
>>> >> compat    compat={},rocompat={},incompat={1=base v0.20,2=client
>>> >> writeable ranges,3=default file layouts on dirs,4=dir inode in
>>> >> separate object}
>>> >> max_mds    1
>>> >> in    0
>>> >> up    {0=6376}
>>> >> failed
>>> >> stopped
>>> >> data_pools    [0]
>>> >> metadata_pool    1
>>> >> 6376:    192.168.21.11:6800/13457 'm1' mds.0.9 up:replay seq 1
>>> >> 5945:    192.168.21.13:6800/12999 'm3' mds.-1.0 up:standby seq 1
>>> >> 5963:    192.168.21.12:6800/22454 'm2' mds.-1.0 up:standby seq 1
>>> >>
>>> >>
>>> >>
>>> >> ceph mon dump:
>>> >> epoch 1
>>> >> fsid d634f7b3-8a8a-4893-bdfb-a95ccca7fddd
>>> >> last_changed 2013-03-18 14:39:42.253923
>>> >> created 2013-03-18 14:39:42.253923
>>> >> 0: 192.168.21.11:6789/0 mon.m1
>>> >> 1: 192.168.21.12:6789/0 mon.m2
>>> >> 2: 192.168.21.13:6789/0 mon.m3
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > John Wilkins
>>> > Senior Technical Writer
>>> > Intank
>>> > john.wilkins@xxxxxxxxxxx
>>> > (415) 425-9599
>>> > http://inktank.com
>>
>>
>>
>>
>> --
>> John Wilkins
>> Senior Technical Writer
>> Intank
>> john.wilkins@xxxxxxxxxxx
>> (415) 425-9599
>> http://inktank.com
>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



