Re: Crush rule freeze cluster

Georgios, oh, sorry for my poor English _-_, maybe I expressed what I
want poorly =]

I know how to write a simple CRUSH rule and how to use it; what I want is
several things:
1. To understand why my test node went offline after I injected the bad
map. This was unexpected.
2. Maybe somebody can explain what happens with this map, and why.
3. It is not a problem to write several crushmaps and/or switch between
them while the cluster is running. But in production we have several NFS
servers that I am thinking about moving to Ceph, and I cannot take more
than one server down for maintenance at a time. I want to avoid a data
disaster while setting up and moving data to Ceph, so a rule like "use
local data replication if only one node exists" looks usable as a
temporary solution until I add the second node _-_ (a rough sketch of the
two rules follows this list).
4. Maybe someone else with a test cluster can check what happens to
clients if a crushmap like the one quoted below is injected (there is a
crushtool dry-run sketch at the bottom of this mail as a safer way to
check such a map first).
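
For point 3, a rough sketch of what I have in mind (the rule names and
ruleset numbers are just examples, I have not tested these on a real
cluster):

rule replicate_over_osds {
        # temporary rule while only one host exists:
        # put replicas on different OSDs of the same host
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type osd
        step emit
}

rule replicate_over_hosts {
        # switch to this one after the second host is added:
        # put replicas on different hosts
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

and then, if I understand the docs correctly, point the pool at the new
ruleset once the second node is in:

ceph osd pool set <pool> crush_ruleset 2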

2015-05-10 8:23 GMT+03:00 Georgios Dimitrakakis <giorgis@xxxxxxxxxxxx>:
> Hi Timofey,
>
> assuming that you have more than one OSD host and that the replication
> factor is equal to (or less than) the number of hosts, why don't you just
> change the crushmap to host replication?
>
> You just need to change the default CRUSHmap rule from
>
> step chooseleaf firstn 0 type osd
>
> to
>
> step chooseleaf firstn 0 type host
>
> I believe that this is the easiest way to have replication across OSD
> nodes unless you have a much more "sophisticated" setup.
>
> Regards,
>
> George
>
>
>
>> Hi list,
>> I have been experimenting with CRUSH maps, trying to get RAID1-like
>> behaviour: if the cluster has only one working OSD node, duplicate data
>> across its local disks, to avoid data loss when a local disk fails and
>> to let clients keep working, because this should not count as a
>> degraded state.
>> (
>>   Ideally, I want a dynamic rule like:
>>   if there is only one host -> spread data over the local disks;
>>   else if host count > 1 -> spread over hosts (racks or something else);
>> )
>>
>> I wrote a rule like the one below:
>>
>> rule test {
>>               ruleset 0
>>               type replicated
>>               min_size 0
>>               max_size 10
>>               step take default
>>               step choose firstn 0 type host
>>               step chooseleaf firstn 0 type osd
>>               step emit
>> }
>>
>> I injected it into the cluster and the client node; now it looks like
>> the client got a kernel panic and I have lost my connection to it. No
>> ssh, no ping. It is a remote node, and I can't see what happened until
>> Monday.
>> Yes, it looks like I've shot myself in the foot.
>> This is just a test setup, so destroying the cluster is not a problem,
>> but I think a broken rule must not crash anything else; in the worst
>> case it should simply be rejected by the cluster or the crushtool
>> compiler.
>> Can someone explain how this rule could crash the system? Or is there a
>> silly mistake somewhere on my side?
>
>
> --
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
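
For point 4, a safer way to check a map before injecting it: as far as I
understand, crushtool can dry-run a compiled map offline (the file names
and x range below are just examples):

# grab and decompile the current map
ceph osd getcrushmap -o map.bin
crushtool -d map.bin -o map.txt

# edit map.txt (e.g. add the rule quoted above), then recompile
crushtool -c map.txt -o map-new.bin

# dry-run rule 0 with 2 replicas; list inputs that do not get
# enough replicas, without touching the live cluster
crushtool -i map-new.bin --test --rule 0 --num-rep 2 \
        --min-x 0 --max-x 1023 --show-bad-mappings

# only if the output looks sane
ceph osd setcrushmap -i map-new.bin

This would at least catch rules that map too few replicas; I am not sure
it would have caught whatever made my client kernel panic, though.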



-- 
Have a nice day,
Timofey.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



