Re: Cephfs unaccessible


 



Marco, 

If you run "ceph osd tree", can you see whether your OSDs are all up? You seem to have at least one problem with OSDs being too full to accept backfill, and some that are near full or full for storage purposes. See whether the following sections of the documentation help:

http://ceph.com/docs/master/rados/configuration/mon-config-ref/#storage-capacity
http://ceph.com/docs/master/rados/configuration/osd-config-ref/#backfilling
http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#no-free-drive-space
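
To make the up/down check quicker on a cluster with many OSDs, here is a minimal sketch that filters the output of "ceph osd tree" down to the OSDs reported as down. The helper name list_down_osds is hypothetical, not a Ceph command, and the sketch assumes only that each OSD line contains a name of the form osd.N and a separate status column reading "up" or "down".

```shell
#!/bin/sh
# Print the names of OSDs that "ceph osd tree" reports as down.
# Reads the tree listing on stdin, so it also works on saved output.
# list_down_osds is a hypothetical helper name, not part of Ceph.
list_down_osds() {
    awk '{
        name = ""; down = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^osd\.[0-9]+$/) name = $i   # the OSD name column
            if ($i == "down") down = 1            # the up/down status column
        }
        if (name != "" && down) print name
    }'
}

# Usage on a node with an admin keyring:
#   ceph osd tree | list_down_osds
```

An empty result would suggest that every OSD the cluster knows about is up. OSDs on the renamed drives that were never started will only show up here if they were already added to the cluster.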

Before you start deleting data as a remedy, you'd want to at least try to get the OSDs back up and running first.

If rebooting changed the drive names, you might look here: 
http://ceph.com/docs/master/rados/configuration/osd-config-ref/#general-settings

We have default settings for OSD and journal paths, which you could override if you can locate the data and journal sources on the renamed drives. If you mounted them, but didn't add them to the fstab, that might be the source of the problem. I'd rather see you use the default paths, as it would be easier to troubleshoot later. So did you mount the drives, but not add the mount points to fstab?
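
If the mounts turn out to be the issue, one way to keep them stable across device renames is to mount by filesystem UUID on the default OSD data path. The following is only a sketch: osd_fstab_line is a hypothetical helper, the xfs filesystem type and noatime option are assumptions, and /dev/sdX1 and the OSD id are placeholders you would replace with your own values.

```shell
#!/bin/sh
# Build an fstab entry that mounts an OSD data partition by filesystem UUID
# on the default OSD data path (/var/lib/ceph/osd/ceph-<id>), so that a
# device rename after a reboot does not change where the data is mounted.
# osd_fstab_line is a hypothetical helper; xfs and noatime are assumptions.
osd_fstab_line() {
    uuid=$1            # filesystem UUID of the data partition
    osd_id=$2          # numeric OSD id
    fstype=${3:-xfs}   # filesystem type, defaulting to xfs
    printf 'UUID=%s /var/lib/ceph/osd/ceph-%s %s noatime 0 2\n' \
        "$uuid" "$osd_id" "$fstype"
}

# Usage (review the line before appending it to /etc/fstab):
#   osd_fstab_line "$(blkid -s UUID -o value /dev/sdX1)" 12
```

The function only prints the line, so nothing is written to /etc/fstab until you have checked it.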

John




On Sat, Apr 20, 2013 at 8:46 AM, Marco Aroldi <marco.aroldi@xxxxxxxxx> wrote:
Hi,
Due to a hardware failure while expanding Ceph, I'm in big trouble
because the CephFS doesn't mount anymore.
I was adding a couple of storage nodes, but a disk failed and after a
reboot the OS (Ubuntu 12.04) renamed the remaining devices, so the
entire node has been messed up.

Now, from the "sane new node", I'm bringing some new OSDs up and in
because the cluster is near full and I can't completely revert the
situation to how it was before.

*I can* afford data loss, but I need to regain access to the filesystem.

My setup:
3 mon + 3 mds
4 storage nodes (I was adding no. 5 and 6)

Ceph 0.56.4


ceph health:
HEALTH_ERR 2008 pgs backfill; 246 pgs backfill_toofull; 74 pgs
backfilling; 134 pgs degraded; 790 pgs peering; 10 pgs recovering;
1116 pgs recovery_wait; 790 pgs stuck inactive; 4782 pgs stuck
unclean; recovery 3049459/21926624 degraded (13.908%);  recovering 6
o/s, 16316KB/s; 4 full osd(s); 30 near full osd(s); full,noup,nodown
flag(s) set



ceph mds dump:
dumped mdsmap epoch 44
epoch    44
flags    0
created    2013-03-18 14:42:29.330548
modified    2013-04-20 17:14:32.969332
tableserver    0
root    0
session_timeout    60
session_autoclose    300
last_failure    43
last_failure_osd_epoch    18160
compat    compat={},rocompat={},incompat={1=base v0.20,2=client
writeable ranges,3=default file layouts on dirs,4=dir inode in
separate object}
max_mds    1
in    0
up    {0=6376}
failed
stopped
data_pools    [0]
metadata_pool    1
6376:    192.168.21.11:6800/13457 'm1' mds.0.9 up:replay seq 1
5945:    192.168.21.13:6800/12999 'm3' mds.-1.0 up:standby seq 1
5963:    192.168.21.12:6800/22454 'm2' mds.-1.0 up:standby seq 1



ceph mon dump:
epoch 1
fsid d634f7b3-8a8a-4893-bdfb-a95ccca7fddd
last_changed 2013-03-18 14:39:42.253923
created 2013-03-18 14:39:42.253923
0: 192.168.21.11:6789/0 mon.m1
1: 192.168.21.12:6789/0 mon.m2
2: 192.168.21.13:6789/0 mon.m3
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
John Wilkins
Senior Technical Writer
Inktank
john.wilkins@xxxxxxxxxxx
(415) 425-9599
http://inktank.com


