Hello all,

First of all, thanks for this wonderful piece of software. It lets us have truly redundant storage.

First, the situation: we currently run one storage server with 8 OSDs. It is the only storage server we have for now; a second one will be added in the near future. This server also runs one MON.

We also have two VM servers (running Xen), each of which runs one MON. We use RBD for the block devices, which Xen attaches to the VPS. After that the VPS can start up and do its thing.

The logs on the VM server now show the following:

    [188542.746229] libceph: osd1 10.150.150.10:6804 socket closed (con state OPEN)
    [188542.747963] rbd: obj_request ffff88010ab8c2c0 was already done
    [188542.747963]
    [188547.758064] libceph: osd1 10.150.150.10:6804 socket closed (con state OPEN)
    [188547.758940] libceph: osd1 10.150.150.10:6804 socket error on read
    [188548.671066] rbd: obj_request ffff88010ab8c2c0 was already done
    [188548.671066]

Looking at the OSD log (in this case osd1):

    2013-06-14 13:31:24.038029 7f7ba8d89700 0 bad crc in data 1404955113 != exp 3510295870
    2013-06-14 13:31:24.038599 7f7ba8d89700 0 -- 10.150.150.10:6804/5205 >> 10.150.150.101:0/1542084666 pipe(0x6021400 sd=32 :6804 s=0 pgs=0 cs=0 l=0).accept peer addr is really 10.150.150.101:0/1542084666 (socket is 10.150.150.101:41501/0)
    2013-06-14 13:31:24.038714 7f7ba8d89700 0 auth: could not find secret_id=0
    2013-06-14 13:31:24.038725 7f7ba8d89700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=0
    2013-06-14 13:31:24.038731 7f7ba8d89700 0 -- 10.150.150.10:6804/5205 >> 10.150.150.101:0/1542084666 pipe(0x6021400 sd=32 :6804 s=0 pgs=0 cs=0 l=1).accept: got bad authorizer
    2013-06-14 13:31:29.049213 7f7ba8d89700 0 bad crc in data 1760589740 != exp 3270696062
    2013-06-14 13:31:29.049976 7f7ba8d89700 0 -- 10.150.150.10:6804/5205 >> 10.150.150.101:0/1542084666 pipe(0x799b180 sd=32 :6804 s=0 pgs=0 cs=0 l=0).accept peer addr is really 10.150.150.101:0/1542084666 (socket is 10.150.150.101:41502/0)
    2013-06-14 13:31:29.050113 7f7ba8d89700 0 auth: could not find secret_id=0
    2013-06-14 13:31:29.050124 7f7ba8d89700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=0
    2013-06-14 13:31:29.050129 7f7ba8d89700 0 -- 10.150.150.10:6804/5205 >> 10.150.150.101:0/1542084666 pipe(0x799b180 sd=32 :6804 s=0 pgs=0 cs=0 l=1).accept: got bad authorizer
    2013-06-14 13:31:29.961212 7f7ba8d89700 0 -- 10.150.150.10:6804/5205 >> 10.150.150.101:0/1542084666 pipe(0x799af00 sd=32 :6804 s=0 pgs=0 cs=0 l=0).accept peer addr is really 10.150.150.101:0/1542084666 (socket is 10.150.150.101:41503/0)

This happens with all the OSDs running on the storage machine, so it is not only this one causing trouble.

Checking the health of Ceph:

    root@vms02:~# ceph status
       health HEALTH_OK
       monmap e1: 3 mons at {a=10.150.150.10:6789/0,b=10.150.150.102:6789/0,c=10.150.150.101:6789/0}, election epoch 24, quorum 0,1,2 a,b,c
       osdmap e95: 8 osds: 8 up, 8 in
       pgmap v39247: 592 pgs: 592 active+clean; 19870 MB data, 119 GB used, 14773 GB / 14892 GB avail; 11448B/s wr, 0op/s
       mdsmap e1: 0/0/1 up

Obviously something is causing these warnings and errors, but I don't know what. Can someone help me understand what is going on here?

All our servers run Debian Wheezy with all updates applied. We use the Ceph sources from ceph.com to install Ceph. All servers have the same kernel and software versions installed:

    Linux kernel version: Linux vms02 3.9-1-amd64 #1 SMP Debian 3.9.4-1 x86_64 GNU/Linux
    Version of ceph: 0.61.3-1~bpo70+1

Cephx is enabled.

Let me know if you need more information.

Regards,
Matthijs Möhlmann

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com