Re: un-even data filled on OSDs

Hi Swami,

That's a known issue, which I believe is much improved in Jewel thanks
to a priority queue added somewhere in the OSD op path (I think). If I
were you I'd be planning to get off Firefly and upgrade.

Cheers,

On 10 June 2016 at 12:08, M Ranga Swami Reddy <swamireddy@xxxxxxxxx> wrote:
> Blair - Thanks for the details. I do set a low priority for recovery
> during rebalance/recovery activity.
> Even though I set the recovery op priority to 5 (instead of 1) and
> the client op priority to 63, some of my customers complained that
> their VMs were unreachable for a few minutes/seconds during the
> rebalancing task. I'm not sure these low-priority settings are doing
> their job as expected.
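>
> (For reference, those settings were roughly the following, assuming
> the osd_recovery_op_priority and osd_client_op_priority options:
> ceph tell osd.* injectargs '--osd-recovery-op-priority 5'
> ceph tell osd.* injectargs '--osd-client-op-priority 63')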
>
> Thanks
> Swami
>
> On Thu, Jun 9, 2016 at 5:50 PM, Blair Bethwaite
> <blair.bethwaite@xxxxxxxxx> wrote:
>> Swami,
>>
>> Run it with the help option for more context:
>> "./crush-reweight-by-utilization.py --help". In your example below
>> it's reporting to you what changes it would make to your OSD reweight
>> values based on the default option settings (because you didn't
>> specify any options). To make the script actually apply those weight
>> changes you need the "-d -r" or "--doit --really" flags.
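>>
>> For example (flag names as above; check the script's --help for the
>> exact spelling):
>> ./crush-reweight-by-utilization.py                  # report only, nothing changed
>> ./crush-reweight-by-utilization.py --doit --really  # actually apply the reweights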
>>
>> If you want to get an idea of the impact the weight changes will have
>> before any data actually starts moving, I suggest setting norecover
>> and nobackfill (ceph osd set ...) on your cluster before making the
>> weight changes. You can then examine the "ceph -s" output (looking at
>> "objects misplaced") to determine the scale of recovery required.
>> Unset the flags once you're ready to start, or back out the reweight
>> settings if you change your mind (a rough sketch of this workflow
>> follows after the tunables below). You'll also want to lower these
>> recovery and backfill tunables to reduce the impact on client I/O
>> (and if possible avoid doing the reweight change during peak I/O
>> hours):
>> ceph tell osd.* injectargs '--osd-max-backfills 1'
>> ceph tell osd.* injectargs '--osd-recovery-threads 1'
>> ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
>> ceph tell osd.* injectargs '--osd-client-op-priority 63'
>> ceph tell osd.* injectargs '--osd-recovery-max-active 1'
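>>
>> And a rough sketch of the preview workflow described above (untested,
>> adapt to your cluster):
>> ceph osd set norecover
>> ceph osd set nobackfill
>> # ...apply the reweight changes, then check "objects misplaced" in:
>> ceph -s
>> # unset to let data movement start, or revert the reweights first if
>> # you change your mind:
>> ceph osd unset nobackfill
>> ceph osd unset norecover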
>>
>> Cheers,
>>
>> On 9 June 2016 at 20:20, M Ranga Swami Reddy <swamireddy@xxxxxxxxx> wrote:
>>> Hi Blair,
>>> I ran the script and results are below:
>>> ==
>>> ./crush-reweight-by-utilization.py
>>> average_util: 0.587024, overload_util: 0.704429, underload_util: 0.587024.
>>> reweighted:
>>> 43 (0.852690 >= 0.704429) [1.000000 -> 0.950000]
>>> 238 (0.845154 >= 0.704429) [1.000000 -> 0.950000]
>>> 104 (0.827908 >= 0.704429) [1.000000 -> 0.950000]
>>> 173 (0.817063 >= 0.704429) [1.000000 -> 0.950000]
>>> ==
>>>
>>> Is the above script saying to reweight osd.43 to 0.95?
>>>
>>> Thanks
>>> Swami
>>>
>>> On Wed, Jun 8, 2016 at 10:34 AM, M Ranga Swami Reddy
>>> <swamireddy@xxxxxxxxx> wrote:
>>>> Blair - Thanks for the script... Btw, does this script have an option for a dry run?
>>>>
>>>> Thanks
>>>> Swami
>>>>
>>>> On Wed, Jun 8, 2016 at 6:35 AM, Blair Bethwaite
>>>> <blair.bethwaite@xxxxxxxxx> wrote:
>>>>> Swami,
>>>>>
>>>>> Try https://github.com/cernceph/ceph-scripts/blob/master/tools/crush-reweight-by-utilization.py,
>>>>> that'll work with Firefly and lets you tune down the weight of only a
>>>>> specific number of overfull OSDs.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> On 7 June 2016 at 23:11, M Ranga Swami Reddy <swamireddy@xxxxxxxxx> wrote:
>>>>>> OK, understood...
>>>>>> To clear the nearfull warning, I am reducing the weight of the
>>>>>> specific OSDs that are filled >85%.
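>>>>>>
>>>>>> Something like this, per overfull OSD (hypothetical id and weight):
>>>>>> ceph osd reweight 43 0.90
>>>>>>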
>>>>>> Is this work-around advisable?
>>>>>>
>>>>>> Thanks
>>>>>> Swami
>>>>>>
>>>>>> On Tue, Jun 7, 2016 at 6:37 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>>>>>> On Tue, 7 Jun 2016, M Ranga Swami Reddy wrote:
>>>>>>>> Hi Sage,
>>>>>>>> >Jewel and the latest hammer point release have an improved
>>>>>>>> >reweight-by-utilization (ceph osd test-reweight-by-utilization ... to dry
>>>>>>>> > run) to correct this.
>>>>>>>>
>>>>>>>> Thank you... but we are not planning to upgrade the cluster soon.
>>>>>>>> So, in this case, are there any tunable options that will help, like
>>>>>>>> "crush tunables optimal" or similar?
>>>>>>>> Or would any other configuration change help?
>>>>>>>
>>>>>>> Firefly also has reweight-by-utilization... it's just a bit less friendly
>>>>>>> than the newer versions.  CRUSH tunables don't generally help here unless
>>>>>>> you have lots of OSDs that are down+out.
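>>>>>>>
>>>>>>> Roughly (Firefly-era syntax from memory; the argument is a percentage
>>>>>>> of average utilization above which OSDs get reweighted down):
>>>>>>> ceph osd reweight-by-utilization 110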
>>>>>>>
>>>>>>> Note that firefly is no longer supported.
>>>>>>>
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Swami
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 7, 2016 at 6:00 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>>>>>>> > On Tue, 7 Jun 2016, M Ranga Swami Reddy wrote:
>>>>>>>> >> Hello,
>>>>>>>> >> I have around 100 OSDs in my ceph cluster. A few of these OSDs are
>>>>>>>> >> filled with >85% of data while others are filled with ~60%-70%.
>>>>>>>> >>
>>>>>>>> >> Any reason why this uneven OSD filling happened? Do I need any
>>>>>>>> >> configuration tweaks to fix the above? Please advise.
>>>>>>>> >>
>>>>>>>> >> PS: Ceph version is - 0.80.7
>>>>>>>> >
>>>>>>>> > Jewel and the latest hammer point release have an improved
>>>>>>>> > reweight-by-utilization (ceph osd test-reweight-by-utilization ... to dry
>>>>>>>> > run) to correct this.
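>>>>>>>> >
>>>>>>>> > e.g. (on a release that has it):
>>>>>>>> > ceph osd test-reweight-by-utilization   # dry run, shows proposed changes
>>>>>>>> > ceph osd reweight-by-utilization        # apply them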
>>>>>>>> >
>>>>>>>> > sage
>>>>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Cheers,
>>>>> ~Blairo
>>
>>
>>
>> --
>> Cheers,
>> ~Blairo



-- 
Cheers,
~Blairo