FYI and history.

Rule:

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type room
        step choose firstn 0 type rack
        step choose firstn 0 type host
        step chooseleaf firstn 0 type osd
        step emit
}

And after the node was reset I can't find any useful info. The cluster works
fine and the data just rebalanced across the OSD disks.

syslog:

May 9 19:30:02 srv-lab-ceph-node-01 systemd[1]: Reloading.
May 9 19:30:02 srv-lab-ceph-node-01 systemd[1]: Starting Network Time Synchronization...
May 9 19:30:02 srv-lab-ceph-node-01 systemd[1]: Started Network Time Synchronization.
May 9 19:30:02 srv-lab-ceph-node-01 systemd[1]: Reloading.
May 9 19:30:02 srv-lab-ceph-node-01 CRON[1731]: (CRON) info (No MTA installed, discarding output)
May 11 11:54:57 srv-lab-ceph-node-01 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="689" x-info="http://www.rsyslog.com"] start
May 11 11:54:56 srv-lab-ceph-node-01 rsyslogd: rsyslogd's groupid changed to 103
May 11 11:54:57 srv-lab-ceph-node-01 rsyslogd: rsyslogd's userid changed to 100

Sorry for the noise, guys.
Georgios, thanks for helping, in any case.

2015-05-10 12:44 GMT+03:00 Georgios Dimitrakakis <giorgis@xxxxxxxxxxxx>:
> Timofey,
>
> maybe your best chance is to connect directly to the server and see what
> is going on. Then you can try to debug why the problem occurred. If you
> don't want to wait until tomorrow, you can try to see what is going on
> using the server's direct remote console access. Most servers provide
> that, just under a different name (DELL calls it iDRAC, Fujitsu iRMC,
> etc.), so if you have it up and running you can use it.
>
> I think this should be your starting point and you can take it from there.
>
> I am sorry I cannot help you further with the CRUSH rules and the reason
> why it crashed, since I am far from being an expert in the field :-(
>
> Regards,
>
> George
>
>
>> Georgios, oh, sorry for my poor English _-_, maybe I expressed poorly
>> what I want =]
>>
>> I know how to write a simple CRUSH rule and how to use it. I want
>> several things:
>> 1. To understand why, after injecting the bad map, my test node went
>> offline. This was unexpected.
>> 2. Maybe somebody can explain what happens with this map, and why.
>> 3. Writing several crushmaps and/or switching them while the cluster is
>> running is not a problem. But in production we have several NFS servers
>> that I am thinking about moving to Ceph, and I cannot take more than one
>> server down for maintenance at a time. I want to avoid a data disaster
>> while setting up and moving the data to Ceph, so a rule like "use local
>> data replication if only one node exists" looks usable as a temporary
>> solution until I add a second node _-_.
>> 4. Maybe someone else also has a test cluster and can test what happens
>> to clients if a crushmap like the one injected is used.
>>
>> 2015-05-10 8:23 GMT+03:00 Georgios Dimitrakakis <giorgis@xxxxxxxxxxxx>:
>>>
>>> Hi Timofey,
>>>
>>> assuming that you have more than one OSD host and that the replication
>>> factor is equal to (or less than) the number of hosts, why don't you
>>> just change the crushmap to host replication?
>>>
>>> You just need to change the default CRUSH map rule from
>>>
>>> step chooseleaf firstn 0 type osd
>>>
>>> to
>>>
>>> step chooseleaf firstn 0 type host
>>>
>>> I believe that this is the easiest way to have replication across OSD
>>> nodes, unless you have a much more "sophisticated" setup.
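For reference, a complete rule with that change would look roughly like the
sketch below (the rule name, ruleset id and size limits are assumptions; only
the "step take default" root and the chooseleaf line come from the rules
quoted in this thread):

rule replicated_host {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

The usual way to apply such an edit is the get/decompile/edit/recompile/set
cycle (file names below are placeholders):

ceph osd getcrushmap -o crushmap.compiled
crushtool -d crushmap.compiled -o crushmap.txt
# edit crushmap.txt (change the chooseleaf type), then recompile and inject:
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new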
>>>
>>> Regards,
>>>
>>> George
>>>
>>>
>>>> Hi list,
>>>>
>>>> I've been experimenting with CRUSH maps, trying to get RAID1-like
>>>> behaviour: if the cluster has only one working OSD node, duplicate the
>>>> data across its local disks, to avoid losing data when a local disk
>>>> fails and to keep clients working, because this should not be a
>>>> degraded state.
>>>> (In the best case I want a dynamic rule, like:
>>>> if there is only one host -> spread data over its local disks;
>>>> else if the host count is > 1 -> spread over hosts (or racks, or
>>>> something else).)
>>>>
>>>> I wrote a rule like the one below:
>>>>
>>>> rule test {
>>>>         ruleset 0
>>>>         type replicated
>>>>         min_size 0
>>>>         max_size 10
>>>>         step take default
>>>>         step choose firstn 0 type host
>>>>         step chooseleaf firstn 0 type osd
>>>>         step emit
>>>> }
>>>>
>>>> I injected it into the cluster, and the client node now looks like it
>>>> got a kernel panic: I've lost my connection to it. No ssh, no ping. It
>>>> is a remote node and I can't see what happened until Monday.
>>>> Yes, it looks like I've shot myself in the foot.
>>>> This is just a test setup, so destroying the cluster is not a problem,
>>>> but I think a broken rule must not crash anything else; in the worst
>>>> case it should just be ignored by the cluster / rejected by the
>>>> crushtool compiler.
>>>>
>>>> Maybe someone can explain how this rule could crash the system? Or
>>>> maybe there is a crazy mistake somewhere?

--
Have a nice day,
Timofey.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
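On the question above of how a broken rule can take a client down: a crushmap
can at least be checked offline before it is injected into a running cluster.
A minimal sketch, assuming the edited map was compiled to crushmap.new and the
rule of interest is ruleset 0 with 2 replicas (the file name and replica count
are placeholders):

crushtool -i crushmap.new --test --rule 0 --num-rep 2 --show-statistics
crushtool -i crushmap.new --test --rule 0 --num-rep 2 --show-mappings

This only verifies which OSDs the rule maps objects to; it does not exercise
client-side map handling, so it would not necessarily have predicted the
kernel panic described above.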