Hi,
I started cephfs-journal-tool to inspect the journal 12 hours ago and it is
still running. Or rather, it hasn't crashed yet, but I don't see any output
from it. Is this normal behaviour?
Thanks for your help.
Andrzej
On 2015-08-26 at 15:49, Gregory Farnum wrote:
There is a cephfs-journal-tool that I believe is present in hammer and
ought to let you get your MDS through replay. Depending on which PGs
were lost you will have holes and/or missing files, in addition to not
being able to find parts of the directory hierarchy (and maybe getting
crashes if you access them). You can explore the options there and if
the documentation is sparse, feel free to ask questions...
-Greg
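For readers who want a starting point, here is a minimal sketch of how the
tool could be driven from a script. It only builds non-destructive
invocations (export a backup copy, inspect, summarise events) and prints
them; nothing runs unless dry_run is set to False. The helper names and the
backup path are hypothetical and not from this thread, and the subcommands
should be checked against `cephfs-journal-tool --help` on your own version
before use.

```python
import shlex
import subprocess


def journal_commands(backup_path):
    """Build cephfs-journal-tool invocations that do not modify the journal."""
    return [
        # Keep a copy of the journal before trying anything else.
        ["cephfs-journal-tool", "journal", "export", backup_path],
        # Report whether the journal looks intact.
        ["cephfs-journal-tool", "journal", "inspect"],
        # Summarise the events stored in the journal.
        ["cephfs-journal-tool", "event", "get", "summary"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Hypothetical backup location, chosen only for illustration.
commands = journal_commands("/root/mds-journal-backup.bin")
run(commands)
```

The dry-run default is deliberate: on a cluster that is already damaged it
is safer to read the exact commands first and run them by hand.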
On Wed, Aug 26, 2015 at 1:44 PM, Andrzej Łukawski <alukawski@xxxxxxxxxx> wrote:
Thank you for the answer. I lost 2 disks on the 1st node and 1 disk on the
2nd. Do I understand correctly that it is not possible to recover the data,
even partially? Unfortunately, those disks are lost forever.
Andrzej
On 2015-08-26 at 12:26, Jan Schermer wrote:
If you lost 3 disks with size 2 and at least 2 of those disks were in
different hosts, that means you lost data with the default CRUSH rules.
There's nothing you can do but either get those disks back in or recover
from backup.
Jan
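The reasoning above can be illustrated with a small sketch. With size 2 and
the default CRUSH rule, the two replicas of a PG sit on OSDs in different
hosts, so any PG whose two OSDs are both among the failed disks has no
surviving copy. The PG ids, OSD ids and layout below are hypothetical
(OSDs 0-3 on the first host, 4-7 on the second, and so on); they are not
taken from the real cluster.

```python
def pgs_without_survivor(acting_sets, failed_osds):
    """Return the PG ids whose every replica was on a failed OSD."""
    failed = set(failed_osds)
    return sorted(pg for pg, osds in acting_sets.items() if set(osds) <= failed)


# Hypothetical acting sets: each PG has two replicas on different hosts.
acting_sets = {
    "1.0": [0, 4],
    "1.1": [1, 5],
    "1.2": [0, 8],
    "1.3": [1, 4],
    "1.4": [2, 12],
}

# Two disks lost on the first host, one on the second (hypothetical ids).
failed_osds = [0, 1, 4]

print(pgs_without_survivor(acting_sets, failed_osds))  # ['1.0', '1.3']
```

PGs that kept one replica (1.1 and 1.2 here) can be recovered by Ceph; the
ones returned by the function cannot, because no copy is left to recover
from.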
On 26 Aug 2015, at 12:18, Andrzej Łukawski <alukawski@xxxxxxxxxx> wrote:
Hi,
We have a Ceph cluster (Ceph version 0.94.2) which consists of four nodes
with four disks on each node. Ceph is configured to hold two replicas
(size 2). We use this cluster for the Ceph filesystem. A few days ago we had
a power outage, after which I had to replace three of our cluster's OSD
disks. All OSD disks are now online, but I'm unable to mount the filesystem
and constantly receive 'mount error 5 = Input/output error'. Ceph status
shows many 'incomplete' PGs and that the 'mds cluster is degraded'.
According to 'ceph health detail', the MDS is replaying its journal.
[root@cnode0 ceph]# ceph -s
    cluster 39c717a3-5e15-4e5e-bc54-7e7f1fd0ee24
     health HEALTH_WARN
            25 pgs backfill_toofull
            10 pgs degraded
            126 pgs down
            263 pgs incomplete
            54 pgs stale
            10 pgs stuck degraded
            263 pgs stuck inactive
            54 pgs stuck stale
            289 pgs stuck unclean
            10 pgs stuck undersized
            10 pgs undersized
            4 requests are blocked > 32 sec
            recovery 27139/10407227 objects degraded (0.261%)
            recovery 168597/10407227 objects misplaced (1.620%)
            4 near full osd(s)
            too many PGs per OSD (312 > max 300)
            mds cluster is degraded
     monmap e6: 6 mons at {0=x.x.70.1:6789/0,0m=x.x.71.1:6789/0,1=x.x.70.2:6789/0,1m=x.x.71.2:6789/0,2=x.x.70.3:6789/0,2m=x.x.71.3:6789/0}
            election epoch 2958, quorum 0,1,2,3,4,5 0,1,2,0m,1m,2m
     mdsmap e1236: 1/1/1 up {0=2=up:replay}, 2 up:standby
     osdmap e83705: 16 osds: 16 up, 16 in; 26 remapped pgs
      pgmap v40869228: 2496 pgs, 3 pools, 16952 GB data, 5046 kobjects
            32825 GB used, 11698 GB / 44524 GB avail
            27139/10407227 objects degraded (0.261%)
            168597/10407227 objects misplaced (1.620%)
                2153 active+clean
                 137 incomplete
                 126 down+incomplete
                  54 stale+active+clean
                  15 active+remapped+backfill_toofull
                  10 active+undersized+degraded+remapped+backfill_toofull
                   1 active+remapped
[root@cnode0 ceph]#
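The numbers in the status output above can be cross-checked against each
other. The sketch below tallies the per-state PG counts from the pgmap
section and compares them with the health summary. The PGs-per-OSD figure
assumes that all three pools use size 2, which the thread states only for
the cluster in general.

```python
from collections import Counter

# (count, combined state) pairs copied from the pgmap section of `ceph -s`.
PG_STATES = [
    (2153, "active+clean"),
    (137, "incomplete"),
    (126, "down+incomplete"),
    (54, "stale+active+clean"),
    (15, "active+remapped+backfill_toofull"),
    (10, "active+undersized+degraded+remapped+backfill_toofull"),
    (1, "active+remapped"),
]


def tally_flags(states):
    """Sum PG counts per individual flag; one PG can carry several flags."""
    totals = Counter()
    for count, combined in states:
        for flag in combined.split("+"):
            totals[flag] += count
    return totals


totals = tally_flags(PG_STATES)
total_pgs = sum(count for count, _ in PG_STATES)

print(total_pgs)                   # 2496, matches "2496 pgs"
print(totals["incomplete"])        # 263, matches "263 pgs incomplete"
print(totals["down"])              # 126, matches "126 pgs down"
print(totals["backfill_toofull"])  # 25, matches "25 pgs backfill_toofull"

# Assumption: every pool has size 2, so each PG occupies two OSDs.
REPLICAS = 2
OSDS = 16
print(total_pgs * REPLICAS / OSDS)  # 312.0, matches "312 > max 300"
```

The 263 incomplete PGs (137 incomplete plus 126 down+incomplete) are the
ones that block I/O; the remaining non-clean PGs are only waiting on
backfill.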
I wasn't able to find any solution on the Internet and I worry I will make
things even worse if I continue to troubleshoot this on my own. I'm stuck.
Could you please help?
Thanks.
Andrzej
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com