Re: OSD is near full and slow in accessing storage from client

On Mon, Nov 13, 2017 at 4:57 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
> You cannot reduce the PG count for a pool.  So there isn't anything you can
> really do for this unless you create a new FS with better PG counts and
> migrate your data into it.
>
> The problem with having more PGs than you need is the memory footprint of
> the OSD daemon. There are warning thresholds for having too many PGs per
> OSD.  Also, in future expansions, if you need to add pools, you might not be
> able to create them with the proper number of PGs because of older pools
> that have far too many PGs.
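>
> If you do go that route, the rough shape of it (pool and FS names below are
> only placeholders, and depending on your release you may need to enable
> multiple filesystems first) would be something like:
>
>     ceph osd pool create downloads_data_new 256
>     ceph osd pool create downloads_metadata_new 64
>     ceph fs new downloads_new downloads_metadata_new downloads_data_new
>
> followed by copying the files across through a client mount (rsync/cp) and
> retiring the old pools once the data is verified.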
>
> It would still be nice to see the output from those commands I asked about.
>
> The built-in reweighting scripts might help your data distribution.
> reweight-by-utilization
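>
> For instance, you can do a dry run first and then apply it, where 110 is the
> overload threshold in percent:
>
>     ceph osd test-reweight-by-utilization 110
>     ceph osd reweight-by-utilization 110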

Please also carefully consider your use of "min_size 1" and understand the risks
associated with it (there are several threads on this list, as well as
ceph-devel, that talk about this setting).
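
As an illustration only: once the pools are running with size 3, raising
min_size would look like

    ceph osd pool set downloads_data min_size 2
    ceph osd pool set downloads_metadata min_size 2

(with size 2, setting min_size 2 would instead block I/O whenever a single
copy is unavailable, which is why the size 3 change usually comes first).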

>
>
> On Sun, Nov 12, 2017, 11:41 AM gjprabu <gjprabu@xxxxxxxxxxxx> wrote:
>>
>> Hi David,
>>
>> Thanks for your valuable reply. Once the backfilling for the new OSD is
>> complete, we will consider increasing the replica value as soon as possible.
>> Is it possible to decrease the metadata PG count? If the PG count for
>> metadata is the same value as the data pool's, what kind of issue may occur?
>>
>> Regards
>> PrabuGJ
>>
>>
>>
>> ---- On Sun, 12 Nov 2017 21:25:05 +0530 David
>> Turner<drakonstein@xxxxxxxxx> wrote ----
>>
>> What's the output of `ceph df`? That would show whether your PG counts are
>> good or not. Like everyone else has said, the space on the original OSDs can't
>> be expected to free up until the backfill from adding the new OSD has finished.
>>
>> You don't have anything in your cluster health to indicate that your
>> cluster will not be able to finish this backfilling operation on its own.
>>
>> You might find this URL helpful in calculating your PG counts:
>> http://ceph.com/pgcalc/  As a side note, it is generally better to keep your
>> PG counts at powers of 2 (16, 64, 256, etc.). When you do not have a power of
>> 2, some of your PGs will take up twice as much space as others.
>> In your case with 250, you have 244 PGs that are the same size and 6 PGs
>> that are twice the size of those 244 PGs.  Bumping that up to 256 will even
>> things out.
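>>
>> For example (ideally after the current backfill has settled, since splitting
>> PGs causes some extra data movement):
>>
>>     ceph osd pool set downloads_data pg_num 256
>>     ceph osd pool set downloads_data pgp_num 256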
>>
>> Assuming that the metadata pool is for a CephFS volume, you do not need
>> nearly so many PGs for that pool. Also, I would recommend changing at least
>> the metadata pool to a replica size of 3. If we can talk you into 3 replicas
>> for everything else, great! But if not, at least do the metadata pool. If you
>> lose an object in the data pool, you just lose that file. If you lose an
>> object in the metadata pool, you might lose access to the entire CephFS
>> volume.
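>>
>> That would just be something like:
>>
>>     ceph osd pool set downloads_metadata size 3
>>
>> (and the same for the data pool if you go with 3 replicas everywhere).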
>>
>>
>> On Sun, Nov 12, 2017, 9:39 AM gjprabu <gjprabu@xxxxxxxxxxxx> wrote:
>>
>> Hi Cassiano,
>>
>>        Thanks for your valuable feedback; we will wait for some time until
>> the new OSD sync completes. Also, will increasing the PG count solve the
>> issue? In our setup the PG number for both the data and metadata pools is
>> 250. Is this correct for 7 OSDs with 2 replicas? The currently stored data
>> size is 17TB.
>>
>> ceph osd df
>> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS
>> 0 3.29749  1.00000  3376G  2814G  562G 83.35 1.23 165
>> 1 3.26869  1.00000  3347G  1923G 1423G 57.48 0.85 152
>> 2 3.27339  1.00000  3351G  1980G 1371G 59.10 0.88 161
>> 3 3.24089  1.00000  3318G  2131G 1187G 64.23 0.95 168
>> 4 3.24089  1.00000  3318G  2998G  319G 90.36 1.34 176
>> 5 3.32669  1.00000  3406G  2476G  930G 72.68 1.08 165
>> 6 3.27800  1.00000  3356G  1518G 1838G 45.24 0.67 166
>>               TOTAL 23476G 15843G 7632G 67.49        
>> MIN/MAX VAR: 0.67/1.34  STDDEV: 14.53
>>
>> ceph osd tree
>> ID WEIGHT   TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 22.92604 root default                                          
>> -2  3.29749     host intcfs-osd1                                  
>> 0  3.29749         osd.0             up  1.00000          1.00000
>> -3  3.26869     host intcfs-osd2                                  
>> 1  3.26869         osd.1             up  1.00000          1.00000
>> -4  3.27339     host intcfs-osd3                                  
>> 2  3.27339         osd.2             up  1.00000          1.00000
>> -5  3.24089     host intcfs-osd4                                  
>> 3  3.24089         osd.3             up  1.00000          1.00000
>> -6  3.24089     host intcfs-osd5                                  
>> 4  3.24089         osd.4             up  1.00000          1.00000
>> -7  3.32669     host intcfs-osd6                                  
>> 5  3.32669         osd.5             up  1.00000          1.00000
>> -8  3.27800     host intcfs-osd7                                  
>> 6  3.27800         osd.6             up  1.00000          1.00000
>>
>> ceph osd pool ls detail
>>
>> pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
>> pool 3 'downloads_data' replicated size 2 min_size 1 crush_ruleset 0
>> object_hash rjenkins pg_num 250 pgp_num 250 last_change 39 flags hashpspool
>> crash_replay_interval 45 stripe_width 0
>> pool 4 'downloads_metadata' replicated size 2 min_size 1 crush_ruleset 0
>> object_hash rjenkins pg_num 250 pgp_num 250 last_change 36 flags hashpspool
>> stripe_width 0
>>
>> Regards
>> Prabu GJ
>>
>> ---- On Sun, 12 Nov 2017 19:20:34 +0530 Cassiano Pilipavicius
>> <cassiano@xxxxxxxxxxx> wrote ----
>>
>> I am also not an expert, but it looks like you have big data volumes on
>> few PGs. From what I've seen, the PG data is only deleted from the old OSD
>> when it has been completely copied to the new OSD.
>>
>> So, if one PG has 100G, for example, the space will only be released on the
>> old OSD once that PG has been fully copied to the new OSD.
>>
>> If you have a busy cluster/network, it may take a good while. Maybe just
>> wait a little and check from time to time, and the space will eventually be
>> released.
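>>
>> You can keep an eye on the backfill with, for example:
>>
>>     ceph -s
>>     ceph pg dump pgs_brief | grep backfill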
>>
>>
>> On 11/12/2017 11:44 AM, Sébastien VIGNERON wrote:
>>
>> I’m not an expert either, so if someone on the list has some ideas about
>> this problem, don’t be shy, share them with us.
>>
>> For now, my only hypothesis is that the OSD space will be recovered as
>> soon as the recovery process is complete.
>> Hope everything will get back in order soon (before reaching 95% or
>> above).
>>
>> I saw some messages on the list about the fstrim tool, which can help
>> reclaim unused free space, but I don’t know if it applies to your case.
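>>
>> (For what it's worth, fstrim is typically run against a filesystem sitting
>> on a thin-provisioned block device, e.g. an RBD image mounted with discard
>> support, along the lines of:
>>
>>     fstrim -v /mnt/rbd-volume
>>
>> so it is unlikely to help with space used by a CephFS data pool.)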
>>
>> Cordialement / Best regards,
>>
>> Sébastien VIGNERON
>> CRIANN,
>> Ingénieur / Engineer
>> Technopôle du Madrillet
>> 745, avenue de l'Université
>> 76800 Saint-Etienne du Rouvray - France
>> tél. +33 2 32 91 42 91
>> fax. +33 2 32 91 42 92
>> http://www.criann.fr
>> mailto:sebastien.vigneron@xxxxxxxxx
>> support: support@xxxxxxxxx
>>
>> On 12 Nov 2017 at 13:29, gjprabu <gjprabu@xxxxxxxxxxxx> wrote:
>>
>> Hi Sebastien,
>>
>>     Below are the query details. I am not that much of an expert and am
>> still learning. The PGs were not in a stuck state before adding the OSD, and
>> the PGs are slowly clearing to active+clean. This morning there were around
>> 53 PGs in active+undersized+degraded+remapped+wait_backfill and now there are
>> only 21, so I hope it is progressing, and I can see the used space keep
>> increasing on the newly added OSD (osd.6).
>>
>>
>> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS
>> 0 3.29749  1.00000  3376G  2814G  562G 83.35 1.23 165  (used space not yet
>> reduced after adding the new OSD)
>> 1 3.26869  1.00000  3347G  1923G 1423G 57.48 0.85 152
>> 2 3.27339  1.00000  3351G  1980G 1371G 59.10 0.88 161
>> 3 3.24089  1.00000  3318G  2131G 1187G 64.23 0.95 168
>> 4 3.24089  1.00000  3318G  2998G  319G 90.36 1.34 176  (used space not yet
>> reduced after adding the new OSD)
>> 5 3.32669  1.00000  3406G  2476G  930G 72.68 1.08 165  (used space not yet
>> reduced after adding the new OSD)
>> 6 3.27800  1.00000  3356G  1518G 1838G 45.24 0.67 166
>>               TOTAL 23476G 15843G 7632G 67.49        
>> MIN/MAX VAR: 0.67/1.34  STDDEV: 14.53
>>
>> ...
>>
>>
>>



-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



