On Wed, 14 Nov 2012, Aleksey Samarin wrote:
> Hello!
>
> I have the same problem. After switching off the second node, the
> cluster hangs. Is there a solution?
>
> All the best, Alex!

I suspect this is min_size; the latest master has a few changes and will
also print it out so you can tell what is going on.

min_size is the minimum number of replicas that must be available before
the OSDs will go active (handle reads/writes). Setting it to 1 gets you
the old behavior, while increasing it protects you from the case where
writes land on a single replica that then fails, forcing the admin to
make a difficult decision about losing data.

You can adjust it with

 ceph osd pool set <pool name> min_size <value>

sage

> 2012/11/12 Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx>:
> > On 12.11.2012 16:11, Sage Weil wrote:
> >
> >> On Mon, 12 Nov 2012, Stefan Priebe - Profihost AG wrote:
> >>>
> >>> Hello list,
> >>>
> >>> I was checking what happens if I reboot a ceph node.
> >>>
> >>> Sadly, if I reboot one node, the whole ceph cluster hangs and no I/O
> >>> is possible.
> >>
> >> If you are using the current master, the new 'min_size' may be biting
> >> you; run ceph osd dump | grep ^pool and see if you see min_size for
> >> your pools. You can change that back to the normal behavior with
> >
> > No, I don't see any min_size:
> >
> > # ceph osd dump | grep ^pool
> > pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0 crash_replay_interval 45
> > pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0
> > pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0
> > pool 3 'kvmpool1' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 3000 pgp_num 3000 last_change 958 owner 0
> >
> >> ceph osd pool set <poolname> min_size 1
> >
> > Yes, this helps! But min_size is still not shown in ceph osd dump.
> > Also, when I reboot a node it takes 10-20 seconds until all OSDs from
> > that node are marked failed and I/O starts again. Should I issue a
> > ceph osd out command before?
> >
> > But I already had
> >
> > min_size 1
> > max_size 2
> >
> > set for each rule in my crushmap.
> >
> > Stefan
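
For example, on the pools shown above, a minimal check-and-adjust sequence
might look like the following sketch (the pool name 'rbd' and osd id 3 are
only placeholders; substitute your own):

 # show per-pool settings; newer builds also print min_size here
 ceph osd dump | grep ^pool

 # allow I/O to continue with a single available replica (the old behavior)
 ceph osd pool set rbd min_size 1

 # mark an OSD out ahead of a planned shutdown
 ceph osd out 3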