Re: kernel: [ 8773.432358] libceph: osd1 192.168.0.131:6803 socket error on read

Hi,
  I followed this documentation and didn't specify any CRUSH settings.

http://ceph.com/docs/next/rbd/rbd-openstack/

--
Jean-Sébastien Frerot
jsfrerot@xxxxxxxxxxxxxxxx


2013/10/10 Gregory Farnum <greg@xxxxxxxxxxx>
Okay. As a quick guess you probably used a CRUSH placement option with
your new pools that wasn't supported by the old kernel, although it
might have been something else.
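
In case it helps anyone hitting the same thing, here's a rough sketch of how one might check which CRUSH tunables are in play and, if needed, revert to the legacy profile so older kernel clients can connect (the file names are just examples, and reverting tunables can trigger rebalancing):

ceph osd getcrushmap -o crushmap.bin    # dump the compiled CRUSH map
crushtool -d crushmap.bin -o crushmap.txt
grep tunable crushmap.txt               # see which tunables the map sets
ceph osd crush tunables legacy          # revert to old-client-compatible tunables (may move data)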

I suspect that you'll find FUSE works better for you anyway as long as
you can use it — faster updates from us to you. ;)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Thu, Oct 10, 2013 at 10:53 AM, Frerot, Jean-Sébastien
<jsfrerot@xxxxxxxxxxxxxxxx> wrote:
> Hi,
>   Thx for your reply :)
>
> kernel: Linux compute01 3.8.0-31-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>
> So yes, I'm using CephFS now and was also using rbd at the same time, with
> different pools. My CephFS was set up 3 months ago and I upgraded it a
> couple of days ago. I moved the VM images from rbd to CephFS by copying each
> file from rbd to the local FS and then to CephFS.
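>
> (For reference, a sketch of roughly how such a copy could also be driven with
> the rbd CLI; the image name and paths below are just placeholders from my setup:
> rbd export volumes/web01 /tmp/web01.img      # export the RBD image to a flat file
> cp /tmp/web01.img /var/lib/instances/web01   # then copy it onto the CephFS mount
> )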
>
> I created the pools like this:
> ceph osd pool create volumes 128
> ceph osd pool create images 128
> ceph osd pool create live_migration 128
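>
> (A quick way to double-check what each pool ended up with, including which
> CRUSH ruleset it uses; just a sketch, with "volumes" as the example pool:
> ceph osd dump | grep pool
> ceph osd pool get volumes pg_num
> )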
>
> Yes I had checked dmesg but didn't find anything relevant.
>
> However, as a last resort I decided to mount my FS using FUSE, and it works
> like a charm, so for now I'm sticking with FUSE :)
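>
> (For what it's worth, this is roughly how I'm mounting it now; it assumes the
> client.live_migration key is in a keyring that ceph-fuse can find:
> ceph-fuse --id live_migration -m 192.168.0.130:6789,192.168.0.131:6789 /var/lib/instances
> )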
>
> Let me know if you want me to do some explicit testing. It may take some
> time for me to do, as I'm actively using ceph, but I can manage to find
> some time for maintenance.
>
> Regards,
>
>
> --
> Jean-Sébastien Frerot
> jsfrerot@xxxxxxxxxxxxxxxx
>
>
> 2013/10/10 Gregory Farnum <greg@xxxxxxxxxxx>
>>
>> (Sorry for the delayed response, this was in my spam folder!)
>>
>> Has this issue persisted? Are you using the stock 13.04 kernel?
>>
>> Can you describe your setup a little more clearly? It sounds like
>> maybe you're using CephFS now and were using rbd before; is that
>> right? What data did you move, when, and how did you set up your
>> CephFS to use the pools?
>> The socket errors are often a slightly spammy notification that the
>> socket isn't in use but has shut down; here they look to be an
>> indicator of something having actually gone wrong. Perhaps you've
>> inadvertently activated features incompatible with your kernel client,
>> but let's see more of what's going on before we jump to that conclusion.
>> Have you checked dmesg for anything else at those points?
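>> (Something along these lines, purely as an example of where to look:
>> dmesg | grep -i libceph      # kernel-client messages around the hang
>> ceph -w                      # watch the cluster log while reproducing
>> )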
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>> On Sat, Oct 5, 2013 at 6:42 PM, Frerot, Jean-Sébastien
>> <jsfrerot@xxxxxxxxxxxxxxxx> wrote:
>> > Hi,
>> >   I have a ceph cluster running with 3 physical servers.
>> >
>> > Here is how my setup is configured:
>> > server1: mon, osd, mds
>> > server2: mon, osd, mds
>> > server3: mon
>> > OS: Ubuntu 13.04
>> > ceph version: 0.67.4-1raring (recently upgraded to see if my problem
>> > still persisted with the new version)
>> >
>> > So I was running Cuttlefish until yesterday. I was using ceph with
>> > OpenStack (using rbd), but I simplified my setup and removed OpenStack
>> > to simply use KVM with virt-manager.
>> >
>> > So I created a new pool to be able to do live migration of KVM instances:
>> > #ceph osd lspools
>> > 0 data,1 metadata,2 rbd,3 volumes,4 images,6 live_migration,
>> >
>> > I've been running VMs for some days without problems, but then I noticed
>> > that I couldn't use the full disk size of my first VM (web01, which was
>> > originally 160G) and that it is now only 119G as stored in ceph. I also
>> > have a Windows instance running on a 300G raw file located in ceph. So,
>> > trying to fix the issue, I decided to do a local backup of my file in
>> > case something went wrong, and guess what, I wasn't able to copy the file
>> > from ceph to my local drive. The moment I ran "cp live_migration/web01
>> > /mnt/", the OS hung, and syslog showed this at >30 lines/s:
>> >
>> > Oct  5 15:25:45 server2 kernel: [ 8773.432358] libceph: osd1
>> > 192.168.0.131:6803 socket error on read
>> >
>> > I couldn't kill the cp or reboot my server normally, so I had to reset
>> > it.
>> >
>> > I tried to copy my other file, "win2012", also stored in the ceph
>> > cluster, and got the same issue; now I can't read anything from it or
>> > start my VM again.
>> >
>> > [root@server1 ~]# ceph status
>> >   cluster 50dc0404-c081-4c43-ac3f-872ba5494bd7
>> >    health HEALTH_OK
>> >    monmap e4: 3 mons at {server1=192.168.0.130:6789/0,server2=192.168.0.131:6789/0,server3=192.168.0.132:6789/0}, election epoch 120, quorum 0,1,2 server1,server2,server3
>> >    osdmap e275: 2 osds: 2 up, 2 in
>> >     pgmap v1508209: 576 pgs: 576 active+clean; 108 GB data, 214 GB used, 785 GB / 999 GB avail
>> >    mdsmap e181: 1/1/1 up {0=server2=up:active}, 1 up:standby
>> >
>> > I mount the FS with fstab like this:
>> > 192.168.0.131:6789,192.168.0.130:6789:/live_migration /var/lib/instances
>> > ceph name=live_migration,secret=mysecret==,noatime 0 2
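>> >
>> > (An equivalent sketch using a secret file instead of putting the key
>> > inline in fstab; the path is just an example:
>> > 192.168.0.131:6789,192.168.0.130:6789:/live_migration /var/lib/instances ceph name=live_migration,secretfile=/etc/ceph/live_migration.secret,noatime 0 2
>> > )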
>> >
>> > I also get this log in ceph-osd.0.log, as spammy as the "socket error on
>> > read" errors I get in syslog:
>> > 2013-10-05 23:07:23.586807 7f24731cc700  0 -- 192.168.0.130:6801/19182 >> 192.168.0.130:0/4212596483 pipe(0x128d8500 sd=115 :6801 s=0 pgs=0 cs=0 l=0 c=0x14ac09a0).accept peer addr is really 192.168.0.130:0/4212596483 (socket is 192.168.0.130:35078/0)
>> >
>> > Other info:
>> > df -h
>> > /dev/mapper/server1--vg-ceph                            500G  108G  393G  22%  /opt/data/ceph
>> > 192.168.0.131:6789,192.168.0.130:6789:/live_migration  1000G  215G  786G  22%  /var/lib/instances
>> > ...
>> >
>> > mount
>> > /dev/mapper/server1--vg-ceph on /opt/data/ceph type xfs (rw,noatime)
>> > 192.168.0.131:6789,192.168.0.130:6789:/live_migration on /var/lib/instances type ceph (name=live_migration,key=client.live_migration)
>> > ...
>> >
>> >
>> > How can I recover from this?
>> >
>> > Thank you,
>> > --
>> > Jean-Sébastien Frerot
>> > jsfrerot@xxxxxxxxxxxxxxxx
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
