Re: kernel: [ 8773.432358] libceph: osd1 192.168.0.131:6803 socket error on read

Hi,

I was wondering: why did you use CephFS instead of RBD?
RBD is much more reliable and is well integrated with QEMU/KVM.

Or did you just want to try CephFS out?
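
If you do want to try RBD, here is a minimal sketch (the pool, image name and memory size are only illustrative, the cephx user must have suitable caps, and QEMU/qemu-img need to have been built with RBD support):

    # import an existing raw disk image into the "volumes" pool
    rbd import /var/lib/instances/web01 volumes/web01
    rbd -p volumes ls

    # boot a guest straight from the RBD image
    qemu-system-x86_64 -m 2048 -enable-kvm \
        -drive format=raw,file=rbd:volumes/web01:id=live_migration

With libvirt/virt-manager the same image can be attached as a network disk using the rbd protocol, so no file ever has to live on CephFS or the local FS.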

––––
Sébastien Han
Cloud Engineer

"Always give 100%. Unless you're giving blood.”

Phone: +33 (0)1 49 70 99 72
Mail: sebastien.han@xxxxxxxxxxxx
Address: 10, rue de la Victoire - 75009 Paris
Web: www.enovance.com - Twitter: @enovance

On October 11, 2013 at 4:47:58 AM, Frerot, Jean-Sébastien (jsfrerot@xxxxxxxxxxxxxxxx) wrote:
>
>Hi,
>I followed this documentation and didn't specify any CRUSH settings.
>
>http://ceph.com/docs/next/rbd/rbd-openstack/
>
>--
>Jean-Sébastien Frerot
>jsfrerot@xxxxxxxxxxxxxxxx
>
>
>2013/10/10 Gregory Farnum  
>
>> Okay. As a quick guess you probably used a CRUSH placement option with
>> your new pools that wasn't supported by the old kernel, although it
>> might have been something else.
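
A minimal sketch of how one might check what the new pools actually require, assuming an admin node with crushtool installed (the /tmp paths are only examples):

    # decompile the CRUSH map and look at its tunables and rules
    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
    grep tunable /tmp/crushmap.txt

    # per-pool flags and crush_ruleset, which older kernel clients may not support
    ceph osd dump | grep '^pool'

Anything non-default in that output is a candidate for the kernel-client incompatibility mentioned above.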
>>
>> I suspect that you'll find FUSE works better for you anyway as long as
>> you can use it — faster updates from us to you. ;)
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Thu, Oct 10, 2013 at 10:53 AM, Frerot, Jean-Sébastien
>> wrote:
>> > Hi,
>> > Thx for your reply :)
>> >
>> > kernel: Linux compute01 3.8.0-31-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>> >
>> > So yes, I'm using CephFS and was also using RBD at the same time, with
>> > different pools. My CephFS was set up 3 months ago and I upgraded it a
>> > couple of days ago. I moved the VM images from RBD to CephFS by copying
>> > each file from RBD to the local FS and then to CephFS.
>> >
>> > I created the pools like this:
>> > ceph osd pool create volumes 128
>> > ceph osd pool create images 128
>> > ceph osd pool create live_migration 128
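
As a quick sanity check on pools created this way, one could verify the placement-group settings afterwards (a sketch, using the pool names from above):

    ceph osd lspools
    ceph osd pool get live_migration pg_num
    ceph osd pool get live_migration pgp_num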
>> >
>> > Yes I had checked dmesg but didn't find anything relevant.
>> >
>> > However, as a last resort I decided to mount my FS using FUSE, and it
>> > works like a charm. So for now I'm sticking with FUSE :)
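
For reference, a ceph-fuse equivalent of the kernel mount used later in this thread might look like this (a sketch; the client name, monitor addresses and mount point are the ones quoted below):

    # mount the /live_migration sub-directory via FUSE instead of the kernel client
    ceph-fuse -n client.live_migration -r /live_migration \
        -m 192.168.0.130:6789,192.168.0.131:6789 /var/lib/instances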
>> >
>> > Let me know if you want me to do some explicit testing. It may take some
>> > time for me to do it, since I'm actively using Ceph, but I can schedule
>> > some time for maintenance.
>> >
>> > Regards,
>> >
>> >
>> > --
>> > Jean-Sébastien Frerot
>> > jsfrerot@xxxxxxxxxxxxxxxx
>> >
>> >
>> > 2013/10/10 Gregory Farnum  
>> >>
>> >> (Sorry for the delayed response, this was in my spam folder!)
>> >>
>> >> Has this issue persisted? Are you using the stock 13.04 kernel?
>> >>
>> >> Can you describe your setup a little more clearly? It sounds like
>> >> maybe you're using CephFS now and were using RBD before; is that
>> >> right? What data did you move, when, and how did you set up your
>> >> CephFS to use the pools?
>> >> The socket errors are often a slightly spammy notification that the
>> >> socket isn't in use but has shut down; here they look to be an
>> >> indicator that something has actually gone wrong — perhaps you've
>> >> inadvertently activated features incompatible with your kernel client,
>> >> but let's see more of what's going on before we jump to that conclusion.
>> >> Have you checked dmesg for anything else at those points?
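
A quick way to pull out only the Ceph-related kernel messages around the failure (a sketch):

    dmesg | egrep -i 'libceph|ceph' | tail -n 50
    egrep -i 'libceph|ceph' /var/log/syslog | tail -n 50

A libceph "feature set mismatch" line, if one shows up, would point straight at the kind of kernel-client incompatibility described above.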
>> >> -Greg
>> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >>
>> >> On Sat, Oct 5, 2013 at 6:42 PM, Frerot, Jean-Sébastien
>> >> wrote:
>> >> > Hi,
>> >> > I have a Ceph cluster running on 3 physical servers.
>> >> >
>> >> > Here is how my setup is configured:
>> >> > server1: mon, osd, mds
>> >> > server2: mon, osd, mds
>> >> > server3: mon
>> >> > OS: Ubuntu 13.04
>> >> > ceph version: 0.67.4-1raring (recently upgraded to see if my problem
>> >> > still persisted with the new version)
>> >> >
>> >> > So I was running CUTTLEFISH until yesterday, and I was using Ceph
>> >> > with OpenStack (via RBD), but I simplified my setup and removed
>> >> > OpenStack to simply use KVM with virt-manager.
>> >> >
>> >> > So I created a new pool to be able to do live migration of KVM
>> >> > instances:
>> >> > #ceph osd lspools
>> >> > 0 data,1 metadata,2 rbd,3 volumes,4 images,6 live_migration,
>> >> >
>> >> > I've been running VMs for some days without problems, but then I
>> >> > noticed that I couldn't use the full disk size of my first VM (web01,
>> >> > which was originally 160G); it now shows as only 119G stored in Ceph.
>> >> > I also have a Windows instance running on a 300G raw file located in
>> >> > Ceph too. So, trying to fix the issue, I decided to do a local backup
>> >> > of my file in case something went wrong, and guess what: I wasn't able
>> >> > to copy the file from Ceph to my local drive. The moment I tried to do
>> >> > that ("cp live_migration/web01 /mnt/"), the OS hung, and syslog showed
>> >> > this at more than 30 lines/s:
>> >> >
>> >> > Oct 5 15:25:45 server2 kernel: [ 8773.432358] libceph: osd1
>> >> > 192.168.0.131:6803 socket error on read
>> >> >
>> >> > I couldn't kill my cp, nor reboot my server normally, so I had to
>> >> > reset it.
>> >> >
>> >> > I tried to copy my other file, "win2012", also stored in the Ceph
>> >> > cluster, and got the same issue; now I can't read anything from it or
>> >> > start my VM again.
>> >> >
>> >> > [root@server1 ~]# ceph status
>> >> > cluster 50dc0404-c081-4c43-ac3f-872ba5494bd7
>> >> > health HEALTH_OK
>> >> > monmap e4: 3 mons at {server1=192.168.0.130:6789/0,server2=192.168.0.131:6789/0,server3=192.168.0.132:6789/0}, election epoch 120, quorum 0,1,2 server1,server2,server3
>> >> > osdmap e275: 2 osds: 2 up, 2 in
>> >> > pgmap v1508209: 576 pgs: 576 active+clean; 108 GB data, 214 GB used, 785 GB / 999 GB avail
>> >> > mdsmap e181: 1/1/1 up {0=server2=up:active}, 1 up:standby
>> >> >
>> >> > I mount the FS with fstab like this:
>> >> > 192.168.0.131:6789,192.168.0.130:6789:/live_migration /var/lib/instances ceph name=live_migration,secret=mysecret==,noatime 0 2
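
As an aside, mount.ceph also accepts a secretfile= option, so the key does not have to sit in fstab itself. A sketch (the secret file path is only an example):

    # /etc/ceph/live_migration.secret contains just the base64 key
    192.168.0.131:6789,192.168.0.130:6789:/live_migration /var/lib/instances ceph name=live_migration,secretfile=/etc/ceph/live_migration.secret,noatime 0 2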
>> >> >
>> >> > I get this log in ceph-osd.0.log, as spammy as the "socket error on
>> >> > read" error I get in syslog:
>> >> > 2013-10-05 23:07:23.586807 7f24731cc700 0 -- 192.168.0.130:6801/19182 >> 192.168.0.130:0/4212596483 pipe(0x128d8500 sd=115 :6801 s=0 pgs=0 cs=0 l=0 c=0x14ac09a0).accept peer addr is really 192.168.0.130:0/4212596483 (socket is 192.168.0.130:35078/0)
>> >> >
>> >> > Other info:
>> >> > df -h
>> >> > /dev/mapper/server1--vg-ceph                            500G  108G  393G  22%  /opt/data/ceph
>> >> > 192.168.0.131:6789,192.168.0.130:6789:/live_migration  1000G  215G  786G  22%  /var/lib/instances
>> >> > ...
>> >> >
>> >> > mount
>> >> > /dev/mapper/server1--vg-ceph on /opt/data/ceph type xfs (rw,noatime)
>> >> > 192.168.0.131:6789,192.168.0.130:6789:/live_migration on /var/lib/instances type ceph (name=live_migration,key=client.live_migration)
>> >> > ...
>> >> >
>> >> >
>> >> > How can I recover from this?
>> >> >
>> >> > Thank you,
>> >> > --
>> >> > Jean-Sébastien Frerot
>> >> > jsfrerot@xxxxxxxxxxxxxxxx
>> >> >
>> >> >
>> >
>> >
>>
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




