Thanks Blair. Yes, I will plan to upgrade my cluster.

Thanks
Swami

On Fri, Jun 10, 2016 at 7:40 AM, Blair Bethwaite <blair.bethwaite@xxxxxxxxx> wrote:
> Hi Swami,
>
> That's a known issue, which I believe is much improved in Jewel thanks
> to a priority queue added somewhere in the OSD op path (I think). If I
> were you I'd be planning to get off Firefly and upgrade.
>
> Cheers,
>
> On 10 June 2016 at 12:08, M Ranga Swami Reddy <swamireddy@xxxxxxxxx> wrote:
>> Blair - Thanks for the details. I used to set the low priority for
>> recovery during the rebalance/recovery activity.
>> Even though I set the recovery_priority as 5 (instead of 1) and
>> client-op_priority set as 63, some of my customers complained that
>> their VMs are not reachable for a few mins/secs during the rebalancing
>> task. I'm not sure these low-priority configurations are doing their
>> job as is.
>>
>> Thanks
>> Swami
>>
>> On Thu, Jun 9, 2016 at 5:50 PM, Blair Bethwaite
>> <blair.bethwaite@xxxxxxxxx> wrote:
>>> Swami,
>>>
>>> Run it with the help option for more context:
>>> "./crush-reweight-by-utilization.py --help". In your example below
>>> it's reporting to you what changes it would make to your OSD reweight
>>> values based on the default option settings (because you didn't
>>> specify any options). To make the script actually apply those weight
>>> changes you need the "-d -r" or "--doit --really" flags.
>>>
>>> If you want to get an idea of the impact that the weight changes will
>>> have before actually starting to move data then I suggest setting
>>> norecover and nobackfill (ceph osd set ...) on your cluster before
>>> making the weight changes. You can then examine "ceph -s" output
>>> (looking at "objects misplaced") to determine the scale of recovery
>>> required. Unset the flags once ready to start or back-out the reweight
>>> settings if you change your mind.
>>> You'll also want to lower these
>>> recovery and backfill tunables to reduce impact to client I/O (and if
>>> possible do not do this reweight change during peak I/O hours):
>>> ceph tell osd.* injectargs '--osd-max-backfills 1'
>>> ceph tell osd.* injectargs '--osd-max-recovery-threads 1'
>>> ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
>>> ceph tell osd.* injectargs '--osd-client-op-priority 63'
>>> ceph tell osd.* injectargs '--osd-recovery-max-active 1'
>>>
>>> Cheers,
>>>
>>> On 9 June 2016 at 20:20, M Ranga Swami Reddy <swamireddy@xxxxxxxxx> wrote:
>>>> Hi Blair,
>>>> I ran the script and the results are below:
>>>> ==
>>>> ./crush-reweight-by-utilization.py
>>>> average_util: 0.587024, overload_util: 0.704429, underload_util: 0.587024.
>>>> reweighted:
>>>> 43 (0.852690 >= 0.704429) [1.000000 -> 0.950000]
>>>> 238 (0.845154 >= 0.704429) [1.000000 -> 0.950000]
>>>> 104 (0.827908 >= 0.704429) [1.000000 -> 0.950000]
>>>> 173 (0.817063 >= 0.704429) [1.000000 -> 0.950000]
>>>> ==
>>>>
>>>> Is the above script saying to reweight 43 -> 0.95?
>>>>
>>>> Thanks
>>>> Swami
>>>>
>>>> On Wed, Jun 8, 2016 at 10:34 AM, M Ranga Swami Reddy
>>>> <swamireddy@xxxxxxxxx> wrote:
>>>>> Blair - Thanks for the script... Btw, does this script have an
>>>>> option for a dry run?
>>>>>
>>>>> Thanks
>>>>> Swami
>>>>>
>>>>> On Wed, Jun 8, 2016 at 6:35 AM, Blair Bethwaite
>>>>> <blair.bethwaite@xxxxxxxxx> wrote:
>>>>>> Swami,
>>>>>>
>>>>>> Try https://github.com/cernceph/ceph-scripts/blob/master/tools/crush-reweight-by-utilization.py,
>>>>>> that'll work with Firefly and allow you to only tune down weight of a
>>>>>> specific number of overfull OSDs.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> On 7 June 2016 at 23:11, M Ranga Swami Reddy <swamireddy@xxxxxxxxx> wrote:
>>>>>>> OK, understood...
>>>>>>> To fix the nearfull warning, I am reducing the weight of a specific OSD,
>>>>>>> which is filled >85%.
>>>>>>> Is this work-around advisable?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Swami
>>>>>>>
>>>>>>> On Tue, Jun 7, 2016 at 6:37 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>>>>>>> On Tue, 7 Jun 2016, M Ranga Swami Reddy wrote:
>>>>>>>>> Hi Sage,
>>>>>>>>> >Jewel and the latest hammer point release have an improved
>>>>>>>>> >reweight-by-utilization (ceph osd test-reweight-by-utilization ... to dry
>>>>>>>>> > run) to correct this.
>>>>>>>>>
>>>>>>>>> Thank you... But I am not planning to upgrade the cluster soon.
>>>>>>>>> So, in this case, are there any tunable options that will help? Like
>>>>>>>>> "crush tunable optimal" or so?
>>>>>>>>> Or will any other configuration option changes help?
>>>>>>>>
>>>>>>>> Firefly also has reweight-by-utilization... it's just a bit less friendly
>>>>>>>> than the newer versions. CRUSH tunables don't generally help here unless
>>>>>>>> you have lots of OSDs that are down+out.
>>>>>>>>
>>>>>>>> Note that firefly is no longer supported.
>>>>>>>>
>>>>>>>> sage
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Swami
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 7, 2016 at 6:00 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>>>>>>>> > On Tue, 7 Jun 2016, M Ranga Swami Reddy wrote:
>>>>>>>>> >> Hello,
>>>>>>>>> >> I have around 100 OSDs in my ceph cluster. Of these, a few OSDs are
>>>>>>>>> >> filled with >85% of data and a few OSDs are filled with ~60%-70% of data.
>>>>>>>>> >>
>>>>>>>>> >> Any reason why the OSDs filled unevenly? Do I need to make any
>>>>>>>>> >> tweaks to the configuration to fix the above? Please advise.
>>>>>>>>> >>
>>>>>>>>> >> PS: Ceph version is - 0.80.7
>>>>>>>>> >
>>>>>>>>> > Jewel and the latest hammer point release have an improved
>>>>>>>>> > reweight-by-utilization (ceph osd test-reweight-by-utilization ... to dry
>>>>>>>>> > run) to correct this.
>>>>>>>>> >
>>>>>>>>> > sage
>>>>>>>>> >
>>>>>>>>> --
>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Cheers,
>>>>>> ~Blairo
>>>
>>>
>>>
>>> --
>>> Cheers,
>>> ~Blairo
>
>
>
> --
> Cheers,
> ~Blairo
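
For readers following the dry-run output quoted in this thread: the numbers it prints are consistent with an overload threshold of 1.2 x the average utilization (0.587024 x 1.2 = 0.704429) and a weight reduction of 0.05 per overfull OSD. The Python sketch below reproduces only that arithmetic as a dry run. It is not the code of crush-reweight-by-utilization.py; the function name, the parameter names, and the defaults (overload_ratio=1.2, step=0.05, max_osds=4) are assumptions inferred from the quoted output, and OSD 7 in the example input is a hypothetical entry added to show an OSD that stays below the threshold.

```python
def plan_reweights(utilization, average_util, weights=None,
                   overload_ratio=1.2, step=0.05, max_osds=4):
    """Dry run: report which OSDs would be reweighted, without changing anything.

    utilization  -- {osd_id: fraction of capacity used}
    average_util -- cluster-wide average utilization
    weights      -- {osd_id: current reweight value}; missing OSDs default to 1.0
    The ratio, step and max_osds defaults are assumptions, not documented defaults.
    """
    weights = weights or {}
    overload_util = average_util * overload_ratio

    # Pick OSDs at or above the threshold, most-full first.
    overfull = sorted(
        (osd for osd, util in utilization.items() if util >= overload_util),
        key=lambda osd: utilization[osd],
        reverse=True,
    )

    plan = {}
    for osd in overfull[:max_osds]:
        old = weights.get(osd, 1.0)
        new = round(max(old - step, 0.0), 6)  # never go below zero
        plan[osd] = (old, new)
    return overload_util, plan


if __name__ == "__main__":
    # Utilizations for OSDs 43, 238, 104 and 173 come from the quoted output;
    # OSD 7 is a hypothetical, below-threshold entry for illustration.
    utilization = {43: 0.852690, 238: 0.845154, 104: 0.827908,
                   173: 0.817063, 7: 0.60}
    overload_util, plan = plan_reweights(utilization, average_util=0.587024)
    print(f"overload_util: {overload_util:.6f}")
    for osd, (old, new) in plan.items():
        print(f"osd.{osd}: util {utilization[osd]:.6f} weight {old:.2f} -> {new:.2f}")
```

Nothing here talks to a cluster, so it is safe to run anywhere. Applying a planned weight would still be a separate, deliberate step, ideally with norecover and nobackfill set first as suggested earlier in the thread.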