Blair - Thanks for the details. I usually set a low priority for recovery
during rebalance/recovery activity. Even though I set osd_recovery_op_priority
to 5 (instead of 1) and osd_client_op_priority to 63, some of my customers
complained that their VMs were unreachable for a few seconds to minutes during
the rebalancing task. I am not sure these low-priority settings are doing the
job as expected.
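For reference, this is roughly how I am checking whether the injected values
have actually taken effect on the OSDs. It is only a sketch: osd.12 is just an
example id and the admin socket path is the default one, so adjust both for
your deployment.

ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok config show | \
    grep -E 'osd_recovery_op_priority|osd_client_op_priority|osd_max_backfills|osd_recovery_max_active'

Values pushed in with injectargs are runtime-only and are lost when an OSD
restarts, so it is probably worth keeping them in ceph.conf as well, e.g. (the
numbers here are just the ones Blair suggests below, not a recommendation of
mine):

[osd]
    osd recovery op priority = 1
    osd client op priority = 63
    osd max backfills = 1
    osd recovery max active = 1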
Thanks
Swami

On Thu, Jun 9, 2016 at 5:50 PM, Blair Bethwaite <blair.bethwaite@xxxxxxxxx> wrote:
> Swami,
>
> Run it with the help option for more context:
> "./crush-reweight-by-utilization.py --help". In your example below
> it's reporting to you what changes it would make to your OSD reweight
> values based on the default option settings (because you didn't
> specify any options). To make the script actually apply those weight
> changes you need the "-d -r" or "--doit --really" flags.
>
> If you want to get an idea of the impact that the weight changes will
> have before actually starting to move data, then I suggest setting
> norecover and nobackfill (ceph osd set ...) on your cluster before
> making the weight changes; you can then examine "ceph -s" output
> (looking at "objects misplaced") to determine the scale of recovery
> required. Unset the flags once you are ready to start, or back out the
> reweight settings if you change your mind. You'll also want to lower
> these recovery and backfill tunables to reduce the impact on client
> I/O (and if possible do not make this reweight change during peak I/O
> hours):
> ceph tell osd.* injectargs '--osd-max-backfills 1'
> ceph tell osd.* injectargs '--osd-recovery-threads 1'
> ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
> ceph tell osd.* injectargs '--osd-client-op-priority 63'
> ceph tell osd.* injectargs '--osd-recovery-max-active 1'
>
> Cheers,
>
> On 9 June 2016 at 20:20, M Ranga Swami Reddy <swamireddy@xxxxxxxxx> wrote:
>> Hi Blair,
>> I ran the script and the results are below:
>> ==
>> ./crush-reweight-by-utilization.py
>> average_util: 0.587024, overload_util: 0.704429, underload_util: 0.587024.
>> reweighted:
>> 43 (0.852690 >= 0.704429) [1.000000 -> 0.950000]
>> 238 (0.845154 >= 0.704429) [1.000000 -> 0.950000]
>> 104 (0.827908 >= 0.704429) [1.000000 -> 0.950000]
>> 173 (0.817063 >= 0.704429) [1.000000 -> 0.950000]
>> ==
>>
>> Does the above output mean that osd.43 should be reweighted to 0.95?
>>
>> Thanks
>> Swami
>>
>> On Wed, Jun 8, 2016 at 10:34 AM, M Ranga Swami Reddy
>> <swamireddy@xxxxxxxxx> wrote:
>>> Blair - Thanks for the script... Btw, does this script have an option
>>> for a dry run?
>>>
>>> Thanks
>>> Swami
>>>
>>> On Wed, Jun 8, 2016 at 6:35 AM, Blair Bethwaite
>>> <blair.bethwaite@xxxxxxxxx> wrote:
>>>> Swami,
>>>>
>>>> Try https://github.com/cernceph/ceph-scripts/blob/master/tools/crush-reweight-by-utilization.py,
>>>> that'll work with Firefly and allow you to tune down the weight of
>>>> only a specific number of overfull OSDs.
>>>>
>>>> Cheers,
>>>>
>>>> On 7 June 2016 at 23:11, M Ranga Swami Reddy <swamireddy@xxxxxxxxx> wrote:
>>>>> OK, understood...
>>>>> To fix the nearfull warning, I am reducing the weight of a specific
>>>>> OSD which is filled >85%.
>>>>> Is this workaround advisable?
>>>>>
>>>>> Thanks
>>>>> Swami
>>>>>
>>>>> On Tue, Jun 7, 2016 at 6:37 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>>>>> On Tue, 7 Jun 2016, M Ranga Swami Reddy wrote:
>>>>>>> Hi Sage,
>>>>>>> > Jewel and the latest hammer point release have an improved
>>>>>>> > reweight-by-utilization (ceph osd test-reweight-by-utilization ... to dry
>>>>>>> > run) to correct this.
>>>>>>>
>>>>>>> Thank you... but we are not planning to upgrade the cluster soon.
>>>>>>> So, in this case, are there any tunable options that will help,
>>>>>>> like "crush tunables optimal" or so?
>>>>>>> Or will any other configuration change help?
>>>>>>
>>>>>> Firefly also has reweight-by-utilization... it's just a bit less friendly
>>>>>> than the newer versions. CRUSH tunables don't generally help here unless
>>>>>> you have lots of OSDs that are down+out.
>>>>>>
>>>>>> Note that Firefly is no longer supported.
>>>>>>
>>>>>> sage
>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Swami
>>>>>>>
>>>>>>> On Tue, Jun 7, 2016 at 6:00 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>>>>>> > On Tue, 7 Jun 2016, M Ranga Swami Reddy wrote:
>>>>>>> >> Hello,
>>>>>>> >> I have around 100 OSDs in my Ceph cluster. A few OSDs are filled
>>>>>>> >> with >85% of data and a few OSDs are filled with ~60%-70% of data.
>>>>>>> >>
>>>>>>> >> Any reason why this uneven OSD filling happened? Do I need to make
>>>>>>> >> any configuration tweaks to fix the above? Please advise.
>>>>>>> >>
>>>>>>> >> PS: The Ceph version is 0.80.7
>>>>>>> >
>>>>>>> > Jewel and the latest hammer point release have an improved
>>>>>>> > reweight-by-utilization (ceph osd test-reweight-by-utilization ... to dry
>>>>>>> > run) to correct this.
>>>>>>> >
>>>>>>> > sage
>>>>
>>>> --
>>>> Cheers,
>>>> ~Blairo
>
> --
> Cheers,
> ~Blairo
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html