I created a single-node Ceph cluster (v0.58) on a VM. Here is my conf file:
[global]
auth client required = none
auth cluster required = none
auth service required = none
[osd]
osd journal data =
filestore xattr use omap = true
# osd data =
[mon.a]
host = varunc3-virtual-machine
mon addr = 10.72.148.201:6789
# mon data =
[mds.a]
host = varunc3-virtual-machine
# mds data =
[osd.0]
host = varunc3-virtual-machine
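For reference, here is roughly how the daemons defined above can be checked from the same host (a sketch assuming the stock sysvinit script from the 0.5x packages; the exact service invocation may differ on your distro):

$ sudo service ceph status    # status of all local daemons listed in ceph.conf
$ ceph mon stat               # should show mon.a in quorum
$ ceph osd stat               # expect "1 osds: 1 up, 1 in"
$ ceph mds stat               # shows the MDS state (up:replay here)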
Here is the output of ceph -s:
varunc@varunc3-virtual-machine:~$ ceph -s
health HEALTH_WARN 392 pgs degraded; 392 pgs stuck unclean; mds a is laggy
monmap e1: 1 mons at {a=10.72.148.201:6789/0}, election epoch 1, quorum 0 a
osdmap e45: 1 osds: 1 up, 1 in
pgmap v177: 392 pgs: 392 active+degraded; 0 bytes data, 13007 MB used, 62744 MB / 79745 MB avail
mdsmap e946: 1/1/1 up {0=a=up:replay(laggy or crashed)}
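As an aside, with a single OSD and the default replica count of 2, all 392 PGs staying active+degraded is expected, so that part of the warning is probably harmless on a one-node test cluster. If needed, the default pools could be dropped to a single replica (a sketch, assuming the default data/metadata/rbd pools are still in place):

$ ceph osd pool set data size 1
$ ceph osd pool set metadata size 1
$ ceph osd pool set rbd size 1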
I believe the laggy MDS is why I am not able to mount the Ceph file system. I tried going through the MDS log but could not understand much. I am pasting the part of it that shows errors (should I paste the whole thing?):
-29> 2013-03-26 19:25:58.301027 b4781b40 1 -- 10.72.148.201:6800/16609 <== mon.0 10.72.148.201:6789/0 10 ==== mdsbeacon(4897/a up:replay seq 2 v909) v2 ==== 103+0+0 (1650300491 0 0) 0x9e2e380 con 0x9e36200
-28> 2013-03-26 19:26:00.824303 b1d7ab40 0 -- 10.72.148.201:6800/16609 >> 10.72.148.201:6801/16036 pipe(0x9e2e540 sd=17 :49340 s=1 pgs=0 cs=0 l=1).connect claims to be 10.72.148.201:6801/16695 not 10.72.148.201:6801/16036 - wrong node!
-27> 2013-03-26 19:26:00.824384 b1d7ab40 2 -- 10.72.148.201:6800/16609 >> 10.72.148.201:6801/16036 pipe(0x9e2e540 sd=17 :49340 s=1 pgs=0 cs=0 l=1).fault 107: Transport endpoint is not connected
-26> 2013-03-26 19:26:02.300921 b257bb40 10 monclient: _send_mon_message to mon.a at 10.72.148.201:6789/0
-25> 2013-03-26 19:26:02.300954 b257bb40 1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6789/0 -- mdsbeacon(4897/a up:replay seq 3 v909) v2 -- ?+0 0x9e2e8c0 con 0x9e36200
-24> 2013-03-26 19:26:02.301264 b4781b40 1 -- 10.72.148.201:6800/16609 <== mon.0 10.72.148.201:6789/0 11 ==== mdsbeacon(4897/a up:replay seq 3 v909) v2 ==== 103+0+0 (460647212 0 0) 0x9e2ec40 con 0x9e36200
-23> 2013-03-26 19:26:06.301163 b257bb40 10 monclient: _send_mon_message to mon.a at 10.72.148.201:6789/0
-22> 2013-03-26 19:26:06.301200 b257bb40 1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6789/0 -- mdsbeacon(4897/a up:replay seq 4 v909) v2 -- ?+0 0x9e2e700 con 0x9e36200
-21> 2013-03-26 19:26:06.301512 b4781b40 1 -- 10.72.148.201:6800/16609 <== mon.0 10.72.148.201:6789/0 12 ==== mdsbeacon(4897/a up:replay seq 4 v909) v2 ==== 103+0+0 (1900474344 0 0) 0x9e2ea80 con 0x9e36200
-20> 2013-03-26 19:26:07.224712 b1d7ab40 0 -- 10.72.148.201:6800/16609 >> 10.72.148.201:6801/16036 pipe(0x9e2e540 sd=17 :49341 s=1 pgs=0 cs=0 l=1).connect claims to be 10.72.148.201:6801/16695 not 10.72.148.201:6801/16036 - wrong node!
-19> 2013-03-26 19:26:07.224782 b1d7ab40 2 -- 10.72.148.201:6800/16609 >> 10.72.148.201:6801/16036 pipe(0x9e2e540 sd=17 :49341 s=1 pgs=0 cs=0 l=1).fault 107: Transport endpoint is not connected
-18> 2013-03-26 19:26:07.299025 b377fb40 10 monclient: tick
-17> 2013-03-26 19:26:07.299047 b377fb40 10 monclient: _check_auth_rotating renewing rotating keys (they expired before 2013-03-26 19:25:37.299046)
-16> 2013-03-26 19:26:07.299072 b377fb40 10 monclient: renew subs? (now: 2013-03-26 19:26:07.299071; renew after: 2013-03-26 19:28:24.298915) -- no
-15> 2013-03-26 19:26:09.300863 b257bb40 10 monclient: renew_subs
-14> 2013-03-26 19:26:09.300892 b257bb40 10 monclient: _send_mon_message to mon.a at 10.72.148.201:6789/0
-13> 2013-03-26 19:26:09.300911 b257bb40 1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6789/0 -- mon_subscribe({mdsmap=910+,monmap=2+,osdmap=42}) v2 -- ?+0 0x9e35360 con 0x9e36200
-12> 2013-03-26 19:26:09.301011 b257bb40 1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6801/16036 -- ping v1 -- ?+0 0x9e35d80 con 0x9e36400
-11> 2013-03-26 19:26:09.301341 b1d7ab40 0 -- 10.72.148.201:6800/16609 >> 10.72.148.201:6801/16036 pipe(0x9e2e540 sd=17 :49342 s=1 pgs=0 cs=0 l=1).connect claims to be 10.72.148.201:6801/16695 not 10.72.148.201:6801/16036 - wrong node!
-10> 2013-03-26 19:26:09.301409 b1d7ab40 2 -- 10.72.148.201:6800/16609 >> 10.72.148.201:6801/16036 pipe(0x9e2e540 sd=17 :49342 s=1 pgs=0 cs=0 l=1).fault 107: Transport endpoint is not connected
 -9> 2013-03-26 19:26:09.301812 b4781b40 1 -- 10.72.148.201:6800/16609 <== mon.0 10.72.148.201:6789/0 13 ==== osd_map(42..45 src has 1..45) v3 ==== 1167+0+0 (3338985292 0 0) 0x9e2dc60 con 0x9e36200
 -8> 2013-03-26 19:26:09.301887 b4781b40 1 -- 10.72.148.201:6800/16609 mark_down 0x9e36400 -- 0x9e2e540
 -7> 2013-03-26 19:26:09.302019 b4781b40 1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6801/16695 -- osd_op(mds.0.14:1 mds0_inotable [read 0~0] 1.b852b893 RETRY) v4 -- ?+0 0x9e26900 con 0x9e36700
 -6> 2013-03-26 19:26:09.302036 b4781b40 1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6801/16695 -- osd_op(mds.0.14:2 mds0_sessionmap [read 0~0] 1.3270c60b RETRY) v4 -- ?+0 0x9e26d80 con 0x9e36700
 -5> 2013-03-26 19:26:09.302051 b4781b40 1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6801/16695 -- osd_op(mds.0.14:3 mds_anchortable [read 0~0] 1.a977f6a7 RETRY) v4 -- ?+0 0x9e4d600 con 0x9e36700
 -4> 2013-03-26 19:26:09.302060 b4781b40 1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6801/16695 -- osd_op(mds.0.14:4 mds_snaptable [read 0~0] 1.d90270ad RETRY) v4 -- ?+0 0x9e4d480 con 0x9e36700
 -3> 2013-03-26 19:26:09.302073 b4781b40 1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6801/16695 -- osd_op(mds.0.14:5 200.00000000 [read 0~0] 1.844f3494 RETRY) v4 -- ?+0 0x9e4d300 con 0x9e36700
 -2> 2013-03-26 19:26:09.302472 b4781b40 0 mds.0.14 ms_handle_connect on 10.72.148.201:6801/16695
 -1> 2013-03-26 19:26:09.303976 b4781b40 1 -- 10.72.148.201:6800/16609 <== osd.0 10.72.148.201:6801/16695 1 ==== osd_op_reply(1 mds0_inotable [read 0~0] ack = -2 (No such file or directory)) v4 ==== 112+0+0 (3010998831 0 0) 0x9e2d2c0 con 0x9e36700
  0> 2013-03-26 19:26:09.305543 b4781b40 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread b4781b40 time 2013-03-26 19:26:09.304022
mds/MDSTable.cc: 150: FAILED assert(0)
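If I read the tail of the log correctly, the MDS asserts because its very first read of the mds0_inotable object returns -2 (No such file or directory), i.e. the table objects it expects are missing from the metadata pool. A way to check what is actually stored there (a sketch; "metadata" is the default CephFS metadata pool name in this release):

$ ceph osd lspools                 # list pool ids and names
$ rados -p metadata ls | grep -E 'inotable|sessionmap|anchortable|snaptable'
$ ceph mds dump                    # current mdsmap, including in/failed MDS ranks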
How do I get the MDS running?
Regards,
Varun