Re: Interesting problem: 2 pgs stuck in EC pool with missing OSDs

Thank you, Loic and Greg. We followed the troubleshooting directions and ran crushtool in test mode, which verified that CRUSH was giving up too soon and confirmed that raising set_choose_tries to 100 in the rule resolves the issue.
We then applied the change to the cluster and also raised the tunable ‘choose_total_tries’ from 50 to 150 (without that bump it seemed that we could still get a bad mapping).
It took only a few minutes for the remaining 2 PGs to redistribute their data, and we have finally reached HEALTH_OK. Thanks!
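
For anyone repeating this, here is a rough sketch of the rule edit, assuming the CRUSH map has already been decompiled to text with `crushtool -d`. The helper name, the rule name `ecpool`, and the sample rule body are made up for illustration and are not taken from our cluster. The edited text still has to be recompiled with `crushtool -c`, checked with `crushtool --test --show-bad-mappings`, and injected with `ceph osd setcrushmap`, as the troubleshooting page describes.

```python
import re


def set_choose_tries(crush_text: str, rule_name: str, tries: int) -> str:
    """Return the decompiled CRUSH map text with 'step set_choose_tries <tries>'
    present in the named rule. Hypothetical helper, not part of any Ceph tool."""
    out = []
    in_rule = False
    done = False
    for line in crush_text.splitlines():
        stripped = line.strip()
        indent = line[: len(line) - len(line.lstrip())]
        if re.match(rf"rule\s+{re.escape(rule_name)}\s*\{{", stripped):
            in_rule = True
        elif in_rule and stripped.startswith("step set_choose_tries"):
            # The rule already sets a value: overwrite it in place.
            line = f"{indent}step set_choose_tries {tries}"
            done = True
        elif in_rule and stripped.startswith("step take") and not done:
            # No existing value: add the step just before 'step take'.
            out.append(f"{indent}step set_choose_tries {tries}")
            done = True
        elif in_rule and stripped == "}":
            in_rule = False
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative rule only; real rule names and steps will differ.
SAMPLE = (
    "rule ecpool {\n"
    "\truleset 1\n"
    "\ttype erasure\n"
    "\tstep take default\n"
    "\tstep chooseleaf indep 0 type host\n"
    "\tstep emit\n"
    "}\n"
)

if __name__ == "__main__":
    print(set_choose_tries(SAMPLE, "ecpool", 100))
```

Rules other than the named one are left untouched, and running the helper twice does not duplicate the step.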
-- 
Paul Evans


On Apr 8, 2015, at 10:36 AM, Loic Dachary <loic@xxxxxxxxxxx> wrote:

Hi Paul,

Contrary to what the documentation states at

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon

the crush ruleset can be modified (an update at https://github.com/ceph/ceph/pull/4306 will fix that). Placement groups will move around, but that's to be expected.

Cheers

On 06/04/2015 20:40, Paul Evans wrote:
Thanks for the insights, Greg. It would be great if the CRUSH rule for an EC pool could be changed dynamically... but if that’s not the case, the troubleshooting doc also suggests adding more OSDs, and we have another 8 OSDs (one from each node) that we can move into the default root.
However, to clarify the point of adding OSDs: the current EC profile has a failure domain of ‘host’. Will adding more OSDs still improve the odds of CRUSH finding a good mapping within the given timeout period?

BTW, I’m a little concerned about moving all 8 OSDs at once, as we’re skinny on RAM and the EC pools seem to like more RAM than replicated pools do. Considering the RAM issue, is adding 2-4 OSDs at a time the recommendation (other than adding more RAM)?

--
Paul Evans

This looks like it's just the standard risk of using a pseudo-random
algorithm: you need to "randomly" map 8 pieces into 8 slots. Sometimes
the CRUSH calculation will return the same 7 slots so many times in a
row that it simply fails to get all 8 of them inside of the time
bounds that are currently set.
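
To put rough numbers on that, here is a toy model, not the real CRUSH algorithm: each of the 8 slots draws a host uniformly at random and retries up to `tries` times whenever it lands on a host that is already taken. It ignores weights, missing OSDs, and the way indep rules actually retry, so treat the values as illustrative only. The function name and parameters are invented for this sketch.

```python
def failure_probability(slots: int, hosts: int, tries: int) -> float:
    """Chance that at least one slot uses up all its tries in the toy model."""
    success = 1.0
    for used in range(slots):
        # With `used` hosts already taken, a single draw collides with
        # probability used / hosts; the slot fails only if every try collides.
        success *= 1.0 - (used / hosts) ** tries
    return 1.0 - success


if __name__ == "__main__":
    # Compare a small retry budget with larger ones for 8 slots on 8 hosts.
    for tries in (5, 50, 100):
        print(tries, failure_probability(8, 8, tries))
```

The last slot dominates: with 7 of 8 hosts taken, every extra try multiplies its failure chance by 7/8, which is why raising the number of tries helps so quickly.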

If you look through the list archives we've discussed this a few
times, especially Loïc in the context of erasure coding. See
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon
for the fix.
But I think that doc is wrong and you can change the CRUSH rule in use
without creating a new pool — right, Loïc?
-Greg


--
Loïc Dachary, Artisan Logiciel Libre


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
