Re: explicitly mapping pgs in OSDMap

"Kamble, Nitin A" <Nitin.Kamble@xxxxxxxxxxxx> · Thu, 2 Mar 2017 17:33:45 +0000

Hi Sage,
  The CRUSH algorithm handles the mapping of PGs, and it will continue to
do so even with the addition of explicit mappings. I presume that finding
which PGs belong to which OSDs will involve additional computation for
each explicit mapping.

What would be the penalty of this additional computation?

For a small number of explicit mappings the penalty would be small, but
IMO it can get quite expensive with a large number of explicit mappings.
The implementation will need to manage the count of explicit mappings
by reverting some of them as the distribution changes.
Understanding the additional overhead of explicit mappings would
have a great influence on the implementation.

Nitin

On 3/1/17, 11:44 AM, "ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of Sage Weil" <ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of sweil@xxxxxxxxxx> wrote:

    There's been a longstanding desire to improve the balance of PGs and data 
    across OSDs to better utilize storage and balance workload.  We had a few 
    ideas about this in a meeting last week and I wrote up a summary/proposal 
    here:

    	http://pad.ceph.com/p/osdmap-explicit-mapping

    The basic idea is to have the ability to explicitly map individual PGs 
    to certain OSDs so that we can move PGs from overfull to underfull 
    devices.  The idea is that the mon or mgr would do this based on some 
    heuristics or policy and should result in a better distribution than the
    current osd weight adjustments we make now with reweight-by-utilization.

    The other key property is that one reason why we need as many PGs as we do 
    now is to get a good balance; if we can remap some of them explicitly, we 
    can get a better balance with fewer.  In essence, CRUSH gives an
    approximate distribution, and then we correct to make it perfect (or close 
    to it).

    The main challenge is less about figuring out when/how to remap PGs to
    correct balance than about figuring out when to remove those remappings
    after CRUSH map changes.  Some simple greedy strategies are obvious
    starting points (e.g., to move PGs off OSD X, first adjust or remove
    existing remap entries targeting OSD X before adding new ones), but
    there are a few ways we could structure the remap entries themselves so
    that they more gracefully disappear after a change.
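
The greedy ordering mentioned above (undo existing remaps that target OSD X before adding new ones) could be sketched roughly as follows. This is only an illustration under stated assumptions: the remap table layout (PG id mapped to a list of (from_osd, to_osd) pairs), the function name `move_pg_off`, and the caller-supplied `crush_osds` and `target` arguments are all hypothetical and are not taken from the pad or from Ceph's OSDMap code.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: PG id -> list of (from_osd, to_osd) remap pairs.
RemapTable = Dict[str, List[Tuple[int, int]]]


def move_pg_off(osd_x: int,
                remaps: RemapTable,
                crush_osds: Dict[str, List[int]],
                target: int) -> str:
    """Shift one PG away from osd_x, preferring to undo existing remaps.

    Returns a short description of the action taken.
    """
    # Step 1: if some existing entry already pushes a PG onto osd_x,
    # removing that entry is cheaper than adding a new one.
    for pgid, pairs in list(remaps.items()):
        for pair in pairs:
            if pair[1] == osd_x:
                pairs.remove(pair)
                if not pairs:
                    del remaps[pgid]
                return f"removed {pgid}: {pair[0]} -> {pair[1]}"

    # Step 2: otherwise add a new entry for a PG that CRUSH places on osd_x
    # and that has no explicit remap yet.
    for pgid, osds in crush_osds.items():
        if osd_x in osds and target not in osds and pgid not in remaps:
            remaps[pgid] = [(osd_x, target)]
            return f"added {pgid}: {osd_x} -> {target}"

    return "no candidate"
```

Choosing `target` (an underfull OSD that does not violate the CRUSH rule's failure-domain constraints) is left to the caller here; that policy is the part the mon or mgr heuristics would have to supply.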

    For example, a remap entry might move a PG from OSD A to B if it maps to 
    A; if the CRUSH topology changes and the PG no longer maps to A, the entry 
    would be removed or ignored.  There are a few ways to do this in the pad; 
    I'm sure there are other options.
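
As a rough illustration of the conditional entry described above ("move the PG from A to B only if it maps to A"), here is a minimal sketch. The table layout and the names `apply_remaps` and `prune_stale` are assumptions made for this example; they are not the encoding proposed in the pad or anything in Ceph itself.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: PG id -> list of (from_osd, to_osd) remap pairs.
RemapTable = Dict[str, List[Tuple[int, int]]]


def apply_remaps(pgid: str,
                 crush_osds: List[int],
                 remaps: RemapTable) -> List[int]:
    """Overlay conditional remap entries on the raw CRUSH result."""
    result = list(crush_osds)
    for src, dst in remaps.get(pgid, []):
        # The entry only takes effect while CRUSH still maps the PG to src;
        # otherwise it is silently ignored.
        if src in result and dst not in result:
            result[result.index(src)] = dst
    return result


def prune_stale(remaps: RemapTable,
                crush_osds: Dict[str, List[int]]) -> int:
    """Drop entries whose source OSD no longer appears in the CRUSH result.

    Returns the number of entries removed.
    """
    removed = 0
    for pgid in list(remaps):
        raw = crush_osds.get(pgid, [])
        kept = [(s, d) for s, d in remaps[pgid] if s in raw]
        removed += len(remaps[pgid]) - len(kept)
        if kept:
            remaps[pgid] = kept
        else:
            del remaps[pgid]
    return removed
```

With an entry `(7, 12)` for a PG, a CRUSH result of `[3, 7, 9]` becomes `[3, 12, 9]`; if a topology change makes CRUSH return `[3, 8, 9]`, the entry no longer applies and `prune_stale` can discard it. In this sketch the per-PG cost is one dictionary lookup plus a scan of that PG's own entries.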

    I put this on the agenda for CDM tonight.  If anyone has any other ideas 
    about this we'd love to hear them!

    sage
