Hi Yonggang,

Are all of the daemons still running? What is at the end of the logfiles?

regards,
Colin

On Wed, Oct 27, 2010 at 9:42 AM, Yonggang Liu <myidpt@xxxxxxxxx> wrote:
> Hello,
>
> I'm totally new to Ceph. Over the last few days I set up 4 VMs to run
> Ceph: "mds0" for the metadata server and monitor, "osd0" and "osd1" for
> two data servers, and "client" for the client machine. The VMs are
> running Debian 5.0 with kernel 2.6.32-5-686 (the Ceph module is enabled).
> I followed "Building kernel client" and "Debian" from the wiki, and I
> was able to start Ceph and mount it at the client. The problem is that
> the mount point always fails with an infinite response time (within
> about 1 min or less of mounting Ceph). To illustrate it better, here is
> the information I got on the client and mds0 machines:
>
> mds0 (192.168.89.133):
> debian:~# mkcephfs -c /etc/ceph/ceph.conf --allhosts -v
> (A lot of info)
> debian:~# /etc/init.d/ceph -a start
> (some info)
>
> client (192.168.89.131):
> debian:~# mount -t ceph 192.168.89.133:/ /ceph
> debian:~# cd /ceph
> debian:/ceph# cp ~/app_ch.xls .
> debian:/ceph# ls
> (waiting forever)
> ^C
>
> After the failure I ran dmesg on the client side and got:
>
> client (192.168.89.131):
> debian:/ceph# dmesg -c
> [ 636.664425] ceph: loaded (mon/mds/osd proto 15/32/24, osdmap 5/5 5/5)
> [ 636.694973] ceph: client4100 fsid 423ad64c-bbf0-3011-bb47-36a89f8787c6
> [ 636.700716] ceph: mon0 192.168.89.133:6789 session established
> [ 664.114551] ceph: mds0 192.168.89.133:6800 socket closed
> [ 664.848722] ceph: mds0 192.168.89.133:6800 socket closed
> [ 665.914923] ceph: mds0 192.168.89.133:6800 socket closed
> [ 667.840396] ceph: mds0 192.168.89.133:6800 socket closed
> [ 672.054106] ceph: mds0 192.168.89.133:6800 socket closed
> [ 680.894531] ceph: mds0 192.168.89.133:6800 socket closed
> [ 696.928496] ceph: mds0 192.168.89.133:6800 socket closed
> [ 720.171754] ceph: mds0 caps stale
> [ 728.999701] ceph: mds0 192.168.89.133:6800 socket closed
> [ 794.640943] ceph: mds0 192.168.89.133:6800 socket closed
>
> Immediately after the failure, I ran netstat on mds0:
>
> mds0 (192.168.89.133):
> debian:~# netstat -anp
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address            Foreign Address          State        PID/Program name
> tcp        0      0 0.0.0.0:6800             0.0.0.0:*                LISTEN       1889/cmds
> tcp        0      0 0.0.0.0:22               0.0.0.0:*                LISTEN       1529/sshd
> tcp        0      0 192.168.89.133:6789      0.0.0.0:*                LISTEN       1840/cmon
> tcp        0      0 192.168.89.133:6789      192.168.89.131:56855     ESTABLISHED  1840/cmon
> tcp        0      0 192.168.89.133:43647     192.168.89.133:6789      ESTABLISHED  1889/cmds
> tcp        0      0 192.168.89.133:22        192.168.89.1:58304       ESTABLISHED  1530/0
> tcp        0      0 192.168.89.133:39826     192.168.89.134:6800      ESTABLISHED  1889/cmds
> tcp        0      0 192.168.89.133:6789      192.168.89.134:41289     ESTABLISHED  1840/cmon
> tcp        0      0 192.168.89.133:6800      192.168.89.131:52814     TIME_WAIT    -
> tcp        0      0 192.168.89.133:6789      192.168.89.135:41021     ESTABLISHED  1840/cmon
> tcp        0      0 192.168.89.133:42069     192.168.89.135:6800      ESTABLISHED  1889/cmds
> tcp        0      0 192.168.89.133:6789      192.168.89.133:43647     ESTABLISHED  1840/cmon
> tcp        0      0 192.168.89.133:6800      192.168.89.131:52815     TIME_WAIT    -
> tcp        0      0 192.168.89.133:6800      192.168.89.131:52816     TIME_WAIT    -
> tcp6       0      0 :::22                    :::*                     LISTEN       1529/sshd
> udp        0      0 0.0.0.0:68               0.0.0.0:*                             1490/dhclient3
> Active UNIX domain sockets (servers and established)
> Proto RefCnt Flags Type   State  I-Node PID/Program name Path
> unix  2      [ ]   DGRAM         2972   546/udevd        @/org/kernel/udev/udevd
> unix  4      [ ]   DGRAM         5343   1358/rsyslogd    /dev/log
> unix  2      [ ]   DGRAM         5662   1530/0
> unix  2      [ ]   DGRAM         5486   1490/dhclient3
> debian:~#
> debian:~# dmesg -c
> debian:~# (nothing shows up)
>
> I saw that port 6800 on the metadata server, the one talking to the
> client, is in the TIME_WAIT state, which means the connection has been
> closed.
>
> This is the ceph.conf I have:
>
> [global]
>         pid file = /var/run/ceph/$type.$id.pid
> [mon]
>         mon data = /data/mon$id
>         mon subscribe interval = 6000
>         mon osd down out interval = 6000
> [mon0]
>         host = mds0
>         mon addr = 192.168.89.133:6789
> [mds]
>         mds session timeout = 6000
>         mds session autoclose = 6000
>         mds client lease = 6000
>         keyring = /data/keyring.$name
> [mds0]
>         host = mds0
> [osd]
>         sudo = true
>         osd data = /data/osd$id
>         osd journal = /journal
>         osd journal size = 1024
>         filestore journal writeahead = true
> [osd0]
>         host = osd0
> [osd1]
>         host = osd1
> [group everyone]
>         addr = 0.0.0.0/0
> [mount]
>         allow = %everyone
> ;-----------------------------------end-----------------------------------
>
> The Ceph version I am using is 0.22.1.
>
> Can anyone help me solve this problem? Thanks in advance!
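
To check both of those things, something along these lines should work on
each of the four VMs. This is just a sketch: it assumes the 0.22-era daemon
names cmon/cmds/cosd and that the logs end up under /var/log/ceph (your
ceph.conf above does not set a log path), so adjust if yours go elsewhere:

    # are the Ceph daemons still running on this node?
    ps aux | egrep '[c]mon|[c]mds|[c]osd'

    # look at the last lines of each daemon log
    tail -n 50 /var/log/ceph/*

    # on mds0 (or any host with the ceph tool and a copy of ceph.conf):
    # overall cluster state, including whether the mds is up and active
    ceph -s

If cmds is gone or keeps restarting, the end of its log on mds0 should say
why, which would explain the repeated "socket closed" messages the client
is logging.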