Degraded PGs blocking open()?

Hi all,

I have a three-node Ceph setup: two nodes playing all three roles (OSD, MDS, 
MON), and one acting only as a monitor (which also happens to be the client 
I mount the filesystem from).

I want to achieve high availability by mirroring all data between the two OSDs 
so that everything remains accessible even if one of them goes down. The 
mirroring itself works fine: I see space being consumed on both nodes as I copy 
data onto the file system, and according to `ceph -s` all PGs are in 
active+clean state. If I start reading a big file and then shut down one of the 
(OSD+MDS+MON) nodes, the file can still be read to the end, and the contents 
read back match the original file. Very nice. But if I start reading the file 
while one of the nodes is already down, the read blocks until that node comes 
back up. I can't even kill the reading process with KILL, TERM, or INT.
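In case pool replication settings are relevant here: I have not set any pool 
sizes explicitly, so the defaults apply. For reference, my assumption is that 
the intended behaviour would correspond to something like the following in 
ceph.conf (these values are my guess at what I want, not something I have 
actually configured):

[global]
        ; assumed settings, not present in my actual ceph.conf
        osd pool default size = 2      ; two replicas, one per OSD
        osd pool default min size = 1  ; keep serving I/O with one replica up

If I/O is refused whenever fewer than min_size replicas are available, that 
might explain the blocking, but I am not sure that is what is happening here.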

Am I doing something wrong, did I miss something in the docs, or might this be 
a bug? My ceph.conf is attached below.

Thanks,
-- 
cc

[global]
auth supported = cephx
keyring = /etc/ceph/keyring.$name



[mds]

[mds.0]
host = iscsigw1

[mds.1]
host = iscsigw2



[osd]
osd data = /srv/ceph/osd.$id

[osd.0]
host = iscsigw1

[osd.1]
host = iscsigw2



[mon]
mon data = /srv/ceph/mon.$id

[mon.0]
host = iscsigw1
mon addr = <node1_ip>:6789

[mon.1]
host = iscsigw2
mon addr = <node2_ip>:6789

[mon.cc]
host = cc
mon addr = <node3_ip>:6789
