I have two questions. My newly created cluster runs XFS on all OSDs, on Ubuntu Precise with kernel 3.2.0-23-generic, Ceph 0.47.2-1precise. Pools:

pool 0 'data' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1228 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 3 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1226 owner 0
pool 2 'rbd' rep size 3 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1232 owner 0
pool 3 '.rgw' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 3878 owner 18446744073709551615

1. After I stop all daemons on one machine of my 3-node cluster (3 replicas), RBD image operations in a VM stall: dd on the device inside the VM freezes, and only after I start Ceph on that machine again does everything come back online. Is there a problem with my config? In this situation Ceph should serve reads from the remaining copies and send writes to the other OSDs in the replica chain, shouldn't it? As another test I ran iozone on the device: it stops as soon as the daemons on one machine are stopped, and continues once the OSDs are back up. How can I tune this so that I/O keeps working without freezing? (Some checks I am running are listed after the log below.)

2012-06-11 21:38:49.583133 pg v88173: 200 pgs: 60 active+clean, 1 stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
2012-06-11 21:38:50.582257 pg v88174: 200 pgs: 60 active+clean, 1 stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
.....
2012-06-11 21:39:49.991893 pg v88197: 200 pgs: 60 active+clean, 1 stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
2012-06-11 21:39:50.992755 pg v88198: 200 pgs: 60 active+clean, 1 stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
2012-06-11 21:39:51.993533 pg v88199: 200 pgs: 60 active+clean, 1 stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
2012-06-11 21:39:52.994397 pg v88200: 200 pgs: 60 active+clean, 1 stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)

After starting all OSDs on the stopped machine:

2012-06-11 21:40:37.826619 osd e4162: 72 osds: 53 up, 72 in
2012-06-11 21:40:37.825706 mon.0 10.177.66.4:6790/0 348 : [INF] osd.24 10.177.66.6:6800/21597 boot
2012-06-11 21:40:38.825297 pg v88202: 200 pgs: 54 active+clean, 7 stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
2012-06-11 21:40:38.826517 osd e4163: 72 osds: 54 up, 72 in
2012-06-11 21:40:38.825250 mon.0 10.177.66.4:6790/0 349 : [INF] osd.25 10.177.66.6:6803/21712 boot
2012-06-11 21:40:38.825655 mon.0 10.177.66.4:6790/0 350 : [INF] osd.28 10.177.66.6:6812/26210 boot
2012-06-11 21:40:38.825907 mon.0 10.177.66.4:6790/0 351 : [INF] osd.29 10.177.66.6:6815/26327 boot
2012-06-11 21:40:39.826738 pg v88203: 200 pgs: 56 active+clean, 4 stale+active+clean, 3 peering, 137 active+degraded; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 76921/254952 degraded (30.171%)
2012-06-11 21:40:39.830098 osd e4164: 72 osds: 59 up, 72 in
2012-06-11 21:40:39.826570 mon.0 10.177.66.4:6790/0 352 : [INF] osd.26 10.177.66.6:6806/21835 boot
2012-06-11 21:40:39.826961 mon.0 10.177.66.4:6790/0 353 : [INF] osd.27 10.177.66.6:6809/21953 boot
2012-06-11 21:40:39.828147 mon.0 10.177.66.4:6790/0 354 : [INF] osd.30 10.177.66.6:6818/26511 boot
2012-06-11 21:40:39.828418 mon.0 10.177.66.4:6790/0 355 : [INF] osd.31 10.177.66.6:6821/26583 boot
2012-06-11 21:40:39.828935 mon.0 10.177.66.4:6790/0 356 : [INF] osd.33 10.177.66.6:6827/26859 boot
2012-06-11 21:40:39.829274 mon.0 10.177.66.4:6790/0 357 : [INF] osd.34 10.177.66.6:6830/26979 boot
2012-06-11 21:40:40.827935 pg v88204: 200 pgs: 56 active+clean, 4 stale+active+clean, 3 peering, 137 active+degraded; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 76921/254952 degraded (30.171%)
2012-06-11 21:40:40.830059 osd e4165: 72 osds: 62 up, 72 in
2012-06-11 21:40:40.827798 mon.0 10.177.66.4:6790/0 358 : [INF] osd.32 10.177.66.6:6824/26701 boot
2012-06-11 21:40:40.829043 mon.0 10.177.66.4:6790/0 359 : [INF] osd.35 10.177.66.6:6833/27165 boot
2012-06-11 21:40:40.829316 mon.0 10.177.66.4:6790/0 360 : [INF] osd.36 10.177.66.6:6836/27280 boot
2012-06-11 21:40:40.829602 mon.0 10.177.66.4:6790/0 361 : [INF] osd.37 10.177.66.6:6839/27397 boot
2012-06-11 21:40:41.828776 pg v88205: 200 pgs: 56 active+clean, 4 stale+active+clean, 3 peering, 137 active+degraded; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 76921/254952 degraded (30.171%)
2012-06-11 21:40:41.831823 osd e4166: 72 osds: 68 up, 72 in
2012-06-11 21:40:41.828713 mon.0 10.177.66.4:6790/0 362 : [INF] osd.38 10.177.66.6:6842/27513 boot
2012-06-11 21:40:41.829440 mon.0 10.177.66.4:6790/0 363 : [INF] osd.39 10.177.66.6:6845/27628 boot
2012-06-11 21:40:41.830226 mon.0 10.177.66.4:6790/0 364 : [INF] osd.40 10.177.66.6:6848/27835 boot
2012-06-11 21:40:41.830531 mon.0 10.177.66.4:6790/0 365 : [INF] osd.41 10.177.66.6:6851/27950 boot
2012-06-11 21:40:41.830778 mon.0 10.177.66.4:6790/0 366 : [INF] osd.42 10.177.66.6:6854/28065 boot
2012-06-11 21:40:41.831249 mon.0 10.177.66.4:6790/0 367 : [INF] osd.43 10.177.66.6:6857/28181 boot
2012-06-11 21:40:42.830440 pg v88206: 200 pgs: 57 active+clean, 4 stale+active+clean, 7 peering, 132 active+degraded; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 75543/254952 degraded (29.630%)
2012-06-11 21:40:42.833294 osd e4167: 72 osds: 72 up, 72 in
2012-06-11 21:40:42.831046 mon.0 10.177.66.4:6790/0 368 : [INF] osd.44 10.177.66.6:6860/28373 boot
2012-06-11 21:40:42.832004 mon.0 10.177.66.4:6790/0 369 : [INF] osd.45 10.177.66.6:6863/28489 boot
2012-06-11 21:40:42.832314 mon.0 10.177.66.4:6790/0 370 : [INF] osd.46 10.177.66.6:6866/28607 boot
2012-06-11 21:40:42.832545 mon.0 10.177.66.4:6790/0 371 : [INF] osd.47 10.177.66.6:6869/28731 boot
2012-06-11 21:40:43.830481 pg v88207: 200 pgs: 64 active+clean, 4 stale+active+clean, 7 peering, 125 active+degraded; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 72874/254952 degraded (28.583%)
2012-06-11 21:40:43.831113 osd e4168: 72 osds: 72 up, 72 in
2012-06-11 21:40:44.832521 pg v88208: 200 pgs: 79 active+clean, 1 stale+active+clean, 4 peering, 113 active+degraded, 3 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 66185/254952 degraded (25.960%)
2012-06-11 21:40:45.834077 pg v88209: 200 pgs: 104 active+clean, 1 stale+active+clean, 4 peering, 85 active+degraded, 6 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 50399/254952 degraded (19.768%)
2012-06-11 21:40:46.835367 pg v88210: 200 pgs: 125 active+clean, 1 stale+active+clean, 4 peering, 59 active+degraded, 11 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 38563/254952 degraded (15.126%)
2012-06-11 21:40:47.836516 pg v88211: 200 pgs: 158 active+clean, 1 stale+active+clean, 26 active+degraded, 15 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 18542/254952 degraded (7.273%)
2012-06-11 21:40:48.853560 pg v88212: 200 pgs: 184 active+clean, 16 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 1/254952 degraded (0.000%)
2012-06-11 21:40:49.868514 pg v88213: 200 pgs: 184 active+clean, 16 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 1/254952 degraded (0.000%)
2012-06-11 21:40:50.858244 pg v88214: 200 pgs: 184 active+clean, 16 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail; 1/254952 degraded (0.000%)
2012-06-11 21:40:51.845622 pg v88215: 200 pgs: 184 active+clean, 16 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail
2012-06-11 21:40:52.857823 pg v88216: 200 pgs: 184 active+clean, 16 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail
2012-06-11 21:40:53.858281 pg v88217: 200 pgs: 184 active+clean, 16 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail
2012-06-11 21:40:54.855602 pg v88218: 200 pgs: 184 active+clean, 16 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail
2012-06-11 21:40:55.857241 pg v88219: 200 pgs: 184 active+clean, 16 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail
2012-06-11 21:40:56.857631 pg v88220: 200 pgs: 184 active+clean, 16 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail
2012-06-11 21:40:57.858987 pg v88221: 200 pgs: 185 active+clean, 15 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail
2012-06-11 21:40:58.880252 pg v88222: 200 pgs: 185 active+clean, 15 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail
2012-06-11 21:40:59.861910 pg v88223: 200 pgs: 188 active+clean, 12 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail
2012-06-11 21:41:00.902582 pg v88224: 200 pgs: 191 active+clean, 9 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail
2012-06-11 21:41:01.907767 pg v88225: 200 pgs: 196 active+clean, 4 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail
2012-06-11 21:41:02.876377 pg v88226: 200 pgs: 199 active+clean, 1 active+recovering; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail
2012-06-11 21:41:03.876929 pg v88227: 200 pgs: 200 active+clean; 783 GB data, 1928 GB used, 18111 GB / 20040 GB avail
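For question 1, this is roughly what I am checking on my side; only a sketch, and I am not sure any of it is the actual cause of the stall:

# verify that CRUSH really spreads the 3 replicas over 3 different hosts,
# not just over 3 OSDs that may sit on the same machine
ceph osd getcrushmap -o /tmp/crushmap
crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
grep -A6 '^rule' /tmp/crushmap.txt     # look for "chooseleaf ... type host"

# replication level and ruleset per pool (same output as the pool list above)
ceph osd dump | grep 'rep size'

# which PGs stay stale/degraded while the node is down, and their acting OSDs
ceph pg dump | egrep 'stale|degraded' | head

The only timeouts I know of here are "osd heartbeat grace" (I already lowered it to 8 s from the default 20 s) and "mon osd down out interval" (default 300 s; controls when a down OSD is marked out), but I do not know whether either of them explains the freeze.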
This disk definition (without rbd_cache) attaches fine:

<disk type="network" device="disk">
  <driver name="qemu" type="raw"/>
  <source protocol="rbd" name="rbd/foo4">
  </source>
  <target dev="vdf" bus="virtio"/>
</disk>

2. When I use rbd_cache=1 (or true) in my XML for libvirt:

<disk type="network" device="disk">
  <driver name="qemu" type="raw"/>
  <source protocol="rbd" name="rbd/foo5:rbd_cache=1">
  </source>
  <target dev="vdf" bus="virtio"/>
</disk>

I get this in libvirtd.log:

2012-06-11 18:50:36.992+0000: 1751: error : qemuMonitorTextAddDrive:2820 : operation failed: open disk image file failed

Libvirt is version 0.9.8-2ubuntu17 with some additional patches applied, from before Ceph 0.46 appeared; qemu-kvm is 1.0+noroms-0ubuntu13. Do I need some other patch for libvirt? Without rbd_cache the attach works fine.

--
-----
Regards,
Sławek "sZiBis" Skowron
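PS. A fallback I am considering for the cache question (untested on my side): leave the <source> name plain, exactly as in the working foo4 example above, and rely on the "rbd cache = true" that is already in the [global] section of my ceph.conf below, assuming qemu's rbd driver reads the default /etc/ceph/ceph.conf when it opens the image. Scoped to clients only it would look like this:

[client]
    rbd cache = true
    rbd cache max dirty = 0

That way nothing extra has to be passed through libvirt at all.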
; global
[global]
    ; enable secure authentication
    auth supported = cephx
    keyring = /etc/ceph/$cluster.keyring
    rgw_cache_enabled = false       ; rgw cache enabled
    rgw_cache_lru_size = 131072     ; num of entries in rgw cache
    rgw_thread_pool_size = 4096
    rgw print continue = false      ; enable if 100-Continue works
    rgw_enable_ops_log = true       ; enable logging every rgw operation
    rgw socket path = /var/run/radosgw.sock
    debug rgw = 1
    rbd cache = true
    rbd cache max dirty = 0
    admin_socket = /var/run/ceph/$cluster-$name.asok

; radosgw client list
[client.radosgw.obs-10-177-66-4]
    host = obs-10-177-66-4
    keyring = /etc/ceph/client.radosgw.obs-10-177-66-4.bin
    log file = /var/log/radosgw/$name.log
    debug ms = 1

[client.radosgw.obs-10-177-66-6]
    host = obs-10-177-66-6
    keyring = /etc/ceph/client.radosgw.obs-10-177-66-6.bin
    log file = /var/log/radosgw/$name.log
    debug ms = 1

[client.radosgw.obs-10-177-66-8]
    host = obs-10-177-66-8
    keyring = /etc/ceph/client.radosgw.obs-10-177-66-8.bin
    log file = /var/log/radosgw/$name.log
    debug ms = 1

; monitors
; You need at least one. You need at least three if you want to
; tolerate any node failures. Always create an odd number.
[mon]
    mon data = /vol0/data/mon.$id
    ; some minimal logging (just message traffic) to aid debugging
    debug ms = 0      ; see message traffic
    debug mon = 0     ; monitor
    debug paxos = 0   ; monitor replication
    debug auth = 0 ;
    mon allowed clock drift = 2

[mon.0]
    host = obs-10-177-66-4
    mon addr = 10.177.66.4:6790

; osd
; You need at least one. Two if you want data to be replicated.
; Define as many as you like.
[osd]
    ; This is where the btrfs volume will be mounted.
    osd data = /vol0/data/osd.$id
    ; Ideally, make this a separate disk or partition. A few GB
    ; is usually enough; more if you have fast disks. You can use
    ; a file under the osd data dir if need be
    ; (e.g. /data/osd$id/journal), but it will be slower than a
    ; separate disk or partition.
    osd journal = /vol0/data/osd.$id/journal
    ; If the OSD journal is a file, you need to specify the size. This is specified in MB.
    keyring = /vol0/data/osd.$id/keyring
    osd journal size = 1024
    ; filestore_xattr_use_omap = 0
    filestore journal writeahead = 1
    osd heartbeat grace = 8
    debug ms = 0         ; message traffic
    debug osd = 1
    debug filestore = 1  ; local object storage
    debug journal = 0    ; local journaling
    debug monc = 0
    debug rados = 0

[osd.0]
    host = obs-10-177-66-4
[osd.1]
    host = obs-10-177-66-4
[osd.10]
    host = obs-10-177-66-4
[osd.11]
    host = obs-10-177-66-4
[osd.12]
    host = obs-10-177-66-4
[osd.13]
    host = obs-10-177-66-4
[osd.14]
    host = obs-10-177-66-4
[osd.15]
    host = obs-10-177-66-4
[osd.16]
    host = obs-10-177-66-4
[osd.17]
    host = obs-10-177-66-4
[osd.18]
    host = obs-10-177-66-4
[osd.19]
    host = obs-10-177-66-4
[osd.2]
    host = obs-10-177-66-4
[osd.20]
    host = obs-10-177-66-4
[osd.21]
    host = obs-10-177-66-4
[osd.22]
    host = obs-10-177-66-4
[osd.23]
    host = obs-10-177-66-4
[osd.24]
    host = obs-10-177-66-6
[osd.25]
    host = obs-10-177-66-6
[osd.26]
    host = obs-10-177-66-6
[osd.27]
    host = obs-10-177-66-6
[osd.28]
    host = obs-10-177-66-6
[osd.29]
    host = obs-10-177-66-6
[osd.3]
    host = obs-10-177-66-4
[osd.30]
    host = obs-10-177-66-6
[osd.31]
    host = obs-10-177-66-6
[osd.32]
    host = obs-10-177-66-6
[osd.33]
    host = obs-10-177-66-6
[osd.34]
    host = obs-10-177-66-6
[osd.35]
    host = obs-10-177-66-6
[osd.36]
    host = obs-10-177-66-6
[osd.37]
    host = obs-10-177-66-6
[osd.38]
    host = obs-10-177-66-6
[osd.39]
    host = obs-10-177-66-6
[osd.4]
    host = obs-10-177-66-4
[osd.40]
    host = obs-10-177-66-6
[osd.41]
    host = obs-10-177-66-6
[osd.42]
    host = obs-10-177-66-6
[osd.43]
    host = obs-10-177-66-6
[osd.44]
    host = obs-10-177-66-6
[osd.45]
    host = obs-10-177-66-6
[osd.46]
    host = obs-10-177-66-6
[osd.47]
    host = obs-10-177-66-6
[osd.48]
    host = obs-10-177-66-8
[osd.49]
    host = obs-10-177-66-8
[osd.5]
    host = obs-10-177-66-4
[osd.50]
    host = obs-10-177-66-8
[osd.51]
    host = obs-10-177-66-8
[osd.52]
    host = obs-10-177-66-8
[osd.53]
    host = obs-10-177-66-8
[osd.54]
    host = obs-10-177-66-8
[osd.55]
    host = obs-10-177-66-8
[osd.56]
    host = obs-10-177-66-8
[osd.57]
    host = obs-10-177-66-8
[osd.58]
    host = obs-10-177-66-8
[osd.59]
    host = obs-10-177-66-8
[osd.6]
    host = obs-10-177-66-4
[osd.60]
    host = obs-10-177-66-8
[osd.61]
    host = obs-10-177-66-8
[osd.62]
    host = obs-10-177-66-8
[osd.63]
    host = obs-10-177-66-8
[osd.64]
    host = obs-10-177-66-8
[osd.65]
    host = obs-10-177-66-8
[osd.66]
    host = obs-10-177-66-8
[osd.67]
    host = obs-10-177-66-8
[osd.68]
    host = obs-10-177-66-8
[osd.69]
    host = obs-10-177-66-8
[osd.7]
    host = obs-10-177-66-4
[osd.70]
    host = obs-10-177-66-8
[osd.71]
    host = obs-10-177-66-8
[osd.8]
    host = obs-10-177-66-4
[osd.9]
    host = obs-10-177-66-4
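One more note on the cache options in the [global] section above, as I understand them (not verified): with "rbd cache max dirty = 0" the cache never holds un-acked writes, so it behaves as a read cache / write-through cache; to get writeback behaviour the dirty limits have to be raised, for example (values below are only examples, not my running config):

[client]
    rbd cache = true
    rbd cache size = 33554432          ; 32 MB cache per opened image
    rbd cache max dirty = 25165824     ; > 0 allows writeback
    rbd cache target dirty = 16777216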