Re: kernel: [ 8773.432358] libceph: osd1 192.168.0.131:6803 socket error on read

(Sorry for the delayed response, this was in my spam folder!)

Has this issue persisted? Are you using the stock 13.04 kernel?

Can you describe your setup a little more clearly? It sounds like
maybe you're using CephFS now and were using rbd before; is that
right? What data did you move, when, and how did you set up your
CephFS to use the pools?
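For instance (just a sketch, and substitute whatever pool, filesystem, and
client names you actually used), the output of something like this would tell
me most of what I need:

  ceph osd lspools            # pool names and IDs
  ceph osd dump | grep pool   # replication size, pg_num, crush ruleset per pool
  ceph mds dump               # the MDS map, including which pools CephFS is using
  ceph auth list              # the caps on the key you mount with
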
The socket errors are often just a spammy notification that a socket is no
longer in use and has shut down; here, though, they look like an indicator
that something has actually gone wrong. Perhaps you've inadvertently
activated features that are incompatible with your kernel client, but let's
see more of what's going on before we jump to that conclusion.
Have you checked dmesg for anything else at those points?
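For example (a rough sketch; the exact dmesg wording varies by kernel
version):

  uname -r                         # kernel version on the box doing the cephfs mount
  dmesg | grep -iE 'ceph|libceph'  # a "feature set mismatch" line (or similar) near the socket errors is the giveaway
  ceph osd crush dump              # the "tunables" section is a common source of kernel-client incompatibility
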
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Sat, Oct 5, 2013 at 6:42 PM, Frerot, Jean-Sébastien
<jsfrerot@xxxxxxxxxxxxxxxx> wrote:
> Hi,
>   I have a Ceph cluster running on 3 physical servers.
>
> Here is how my setup is configured:
> server1: mon, osd, mds
> server2: mon, osd, mds
> server3: mon
> OS: Ubuntu 13.04
> Ceph version: 0.67.4-1raring (recently upgraded to see if my problem still
> persisted with the new version)
>
> So I was running Cuttlefish until yesterday. I was using Ceph with OpenStack
> (via rbd), but I simplified my setup and removed OpenStack to simply use KVM
> with virt-manager.
>
> So I created a new pool to be able to do live migration of KVM instances:
> #ceph osd lspools
> 0 data,1 metadata,2 rbd,3 volumes,4 images,6 live_migration,
>
> I'd been running VMs for some days without problems, but then I noticed that
> I couldn't use the full disk size of my first VM (web01, which was originally
> 160G but is now only 119G as stored in Ceph). I also have a Windows instance
> running on a 300G raw file located in Ceph too. So, trying to fix the issue,
> I decided to do a local backup of my file in case something went wrong, and
> guess what, I wasn't able to copy the file from Ceph to my local drive. The
> moment I ran "cp live_migration/web01 /mnt/" the OS hung, and syslog showed
> this at >30 lines/s:
>
> Oct  5 15:25:45 server2 kernel: [ 8773.432358] libceph: osd1
> 192.168.0.131:6803 socket error on read
>
> I couldn't kill my cp, nor reboot my server normally, so I had to reset it.
>
> I tried to copy my other file, "win2012", also stored in the Ceph cluster,
> and got the same issue; now I can't read anything from it or start my VM
> again.
>
> [root@server1 ~]# ceph status
>   cluster 50dc0404-c081-4c43-ac3f-872ba5494bd7
>    health HEALTH_OK
>    monmap e4: 3 mons at
> {server1=192.168.0.130:6789/0,server2=192.168.0.131:6789/0,server3=192.168.0.132:6789/0},
> election epoch 120, quorum 0,1,2 server1,server2,server3
>    osdmap e275: 2 osds: 2 up, 2 in
>     pgmap v1508209: 576 pgs: 576 active+clean; 108 GB data, 214 GB used, 785
> GB / 999 GB avail
>    mdsmap e181: 1/1/1 up {0=server2=up:active}, 1 up:standby
>
> I mount the FS with fstab like this:
> 192.168.0.131:6789,192.168.0.130:6789:/live_migration /var/lib/instances
> ceph name=live_migration,secret=mysecret==,noatime 0 2
>
> I get this log in ceph-osd.0.log, as spammy as the "socket error on read"
> errors I get in syslog:
> 2013-10-05 23:07:23.586807 7f24731cc700  0 -- 192.168.0.130:6801/19182 >>
> 192.168.0.130:0/4212596483 pipe(0x128d8500 sd=115 :6801 s=0 pgs=0 cs=0 l=0
> c=0x14ac09a0).accept peer addr is really 192.168.0.130:0/4212596483 (socket
> is 192.168.0.130:35078/0)
>
> Other info:
> df -h
> /dev/mapper/server1--vg-ceph                         500G  108G  393G  22%
> /opt/data/ceph
> 192.168.0.131:6789,192.168.0.130:6789:/live_migration 1000G  215G  786G  22%
> /var/lib/instances
> ...
>
> mount
> /dev/mapper/server1--vg-ceph on /opt/data/ceph type xfs (rw,noatime)
> 192.168.0.131:6789,192.168.0.130:6789:/live_migration on /var/lib/instances
> type ceph (name=live_migration,key=client.live_migration)
> ...
>
>
> How can I recover from this?
>
> Thank you,
> --
> Jean-Sébastien Frerot
> jsfrerot@xxxxxxxxxxxxxxxx
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




