Re: High apply latency

On a cursory look at the information, it seems the cluster is
overloaded with requests.

Just a guess, but if you look at I/O utilization on those spindles,
they'll be at or around 100% most of the time.
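
If you want to confirm, something like this on one of the OSD hosts
(assuming the sysstat package is installed there; the ceph command
needs access to the admin keyring) should show it:

# iostat -x 5        <- watch %util on the OSD data disks
# ceph osd perf      <- per-OSD commit/apply latency in ms

%util pinned near 100 together with high apply latency across many
OSDs at once points at the disks rather than the PG count.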

If that is the case, then increasing pg_num and pgp_num won't help
and, in the short term, will make things worse: the PG splitting and
the rebalancing that follows add even more I/O to disks that are
already saturated.
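
Before touching pg_num it's worth seeing where those slow ops actually
spend their time. A rough way to do that, on the host of the OSD that
logged the warning (osd.59 in your log; requires access to the OSD's
admin socket):

# ceph daemon osd.59 dump_historic_ops | less

Each op dump carries a list of timestamped events (queued_for_pg,
reached_pg, commit_sent, ...); a long gap between queued_for_pg and
reached_pg on a busy disk means the OSD simply isn't keeping up, not
that the PG count is wrong.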

Metadata pools (like default.rgw.buckets.index) really benefit from
an SSD-backed pool, even a small one. I carved small OSDs out of the
journal SSDs for those kinds of workloads.
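
If you have (or can add) SSD OSDs with device class "ssd", moving the
index pool over is roughly the following on a Luminous-era cluster,
which yours looks like. The rule name is just an example; please
verify the syntax against your version before running anything:

# ceph osd crush rule create-replicated rgw-index-ssd default host ssd
# ceph osd pool set default.rgw.buckets.index crush_rule rgw-index-ssd

The index pool is tiny in your listing (8 PGs, ~0 bytes), so the data
movement is negligible, but it takes the omap-heavy index workload off
the spindles.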

On Wed, Jan 31, 2018 at 2:26 PM, Jakub Jaszewski
<jaszewski.jakub@xxxxxxxxx> wrote:
> Is it safe to increase pg_num and pgp_num from 1024 up to 2048 for volumes
> and default.rgw.buckets.data pools?
> How will it impact cluster behavior? I guess cluster rebalancing will occur
> and will take a long time considering the amount of data we have on it?
>
> Regards
> Jakub
>
>
>
> On Wed, Jan 31, 2018 at 1:37 PM, Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx>
> wrote:
>>
>> Hi,
>>
>> I'm wondering why slow requests are being reported mainly when the request
>> has been put into the queue for processing by its PG (queued_for_pg,
>> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#debugging-slow-request).
>> Could it be due to too low pg_num/pgp_num?
>>
>> It looks like the slow requests mainly hit
>> default.rgw.buckets.data (pool id 20), volumes (pool id 3) and
>> default.rgw.buckets.index (pool id 14).
>>
>> 2018-01-31 12:06:55.899557 osd.59 osd.59 10.212.32.22:6806/4413 38 :
>> cluster [WRN] slow request 30.125793 seconds old, received at 2018-01-31
>> 12:06:25.773675: osd_op(client.857003.0:126171692 3.a4fec1ad 3.a4fec1ad
>> (undecoded) ack+ondisk+write+known_if_redirected e5722) currently
>> queued_for_pg
>>
>> Btw, how can I get more human-friendly client information from a log entry
>> like the one above?
>>
>> Current pg_num/pgp_num
>>
>> pool 3 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool
>> stripe_width 0 application rbd
>> removed_snaps [1~3]
>> pool 14 'default.rgw.buckets.index' replicated size 3 min_size 2
>> crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags
>> hashpspool stripe_width 0 application rgw
>> pool 20 'default.rgw.buckets.data' erasure size 9 min_size 6 crush_rule 1
>> object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags
>> hashpspool stripe_width 4224 application rgw
>>
>> Usage
>>
>> GLOBAL:
>>     SIZE     AVAIL     RAW USED     %RAW USED     OBJECTS
>>     385T      144T         241T         62.54      31023k
>> POOLS:
>>     NAME                          ID  QUOTA OBJECTS  QUOTA BYTES  USED    %USED  MAX AVAIL  OBJECTS   DIRTY   READ    WRITE   RAW USED
>>     volumes                        3  N/A            N/A          40351G  70.91  16557G     10352314  10109k  2130M   2520M   118T
>>     default.rgw.buckets.index     14  N/A            N/A          0       0      16557G     205       205     160M    27945k  0
>>     default.rgw.buckets.data      20  N/A            N/A          79190G  70.51  33115G     20865953  20376k  122M    113M    116T
>>
>>
>>
>> # ceph osd pool ls detail
>> pool 0 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 4502 flags hashpspool stripe_width
>> 0 application rbd
>> pool 1 'vms' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool
>> stripe_width 0 application rbd
>> pool 2 'images' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 512 pgp_num 512 last_change 5175 flags hashpspool
>> stripe_width 0 application rbd
>> removed_snaps [1~7,14~2]
>> pool 3 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool
>> stripe_width 0 application rbd
>> removed_snaps [1~3]
>> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0
>> application rgw
>> pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 6 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 7 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 8 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 9 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 10 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 11 'default.rgw.users.email' replicated size 3 min_size 2 crush_rule
>> 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 owner
>> 18446744073709551615 flags hashpspool stripe_width 0 application rgw
>> pool 12 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 owner
>> 18446744073709551615 flags hashpspool stripe_width 0 application rgw
>> pool 13 'default.rgw.users.swift' replicated size 3 min_size 2 crush_rule
>> 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 14 'default.rgw.buckets.index' replicated size 3 min_size 2
>> crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags
>> hashpspool stripe_width 0 application rgw
>> pool 15 'default.rgw.buckets.data.old' replicated size 3 min_size 2
>> crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 4502
>> flags hashpspool stripe_width 0 application rgw
>> pool 16 'default.rgw.buckets.non-ec' replicated size 3 min_size 2
>> crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags
>> hashpspool stripe_width 0 application rgw
>> pool 17 'default.rgw.buckets.extra' replicated size 3 min_size 2
>> crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags
>> hashpspool stripe_width 0 application rgw
>> pool 18 '.rgw.buckets.extra' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 20 'default.rgw.buckets.data' erasure size 9 min_size 6 crush_rule 1
>> object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags
>> hashpspool stripe_width 4224 application rgw
>> pool 21 'benchmark_replicated' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 4550 flags
>> hashpspool stripe_width 0 application rbd
>> removed_snaps [1~3]
>> pool 22 'benchmark_erasure_coded' erasure size 9 min_size 7 crush_rule 1
>> object_hash rjenkins pg_num 32 pgp_num 32 last_change 4552 flags hashpspool
>> stripe_width 24576 application rbd
>> removed_snaps [1~3]
>>
>>
>> # ceph df detail
>> GLOBAL:
>>     SIZE     AVAIL     RAW USED     %RAW USED     OBJECTS
>>     385T      144T         241T         62.54      31023k
>> POOLS:
>>     NAME                          ID  QUOTA OBJECTS  QUOTA BYTES  USED    %USED  MAX AVAIL  OBJECTS   DIRTY   READ    WRITE   RAW USED
>>     rbd                            0  N/A            N/A          0       0      16557G     0         0       1       134k    0
>>     vms                            1  N/A            N/A          0       0      16557G     0         0       0       0       0
>>     images                         2  N/A            N/A          7659M   0.05   16557G     1022      1022    51247   5668    22977M
>>     volumes                        3  N/A            N/A          40351G  70.91  16557G     10352314  10109k  2130M   2520M   118T
>>     .rgw.root                      4  N/A            N/A          1588    0      16557G     4         4       90      4       4764
>>     default.rgw.control            5  N/A            N/A          0       0      16557G     8         8       0       0       0
>>     default.rgw.data.root          6  N/A            N/A          93943   0      16557G     336       336     239k    6393    275k
>>     default.rgw.gc                 7  N/A            N/A          0       0      16557G     32        32      1773M   5281k   0
>>     default.rgw.log                8  N/A            N/A          0       0      16557G     185       185     22404k  14936k  0
>>     default.rgw.users.uid          9  N/A            N/A          3815    0      16557G     15        15      187k    53303   11445
>>     default.rgw.usage             10  N/A            N/A          0       0      16557G     7         7       278k    556k    0
>>     default.rgw.users.email       11  N/A            N/A          58      0      16557G     3         3       0       3       174
>>     default.rgw.users.keys        12  N/A            N/A          177     0      16557G     10        10      262     22      531
>>     default.rgw.users.swift       13  N/A            N/A          40      0      16557G     3         3       0       3       120
>>     default.rgw.buckets.index     14  N/A            N/A          0       0      16557G     205       205     160M    27945k  0
>>     default.rgw.buckets.data.old  15  N/A            N/A          668G    3.88   16557G     180867    176k    707k    2318k   2004G
>>     default.rgw.buckets.non-ec    16  N/A            N/A          0       0      16557G     114       114     17960   12024   0
>>     default.rgw.buckets.extra     17  N/A            N/A          0       0      16557G     0         0       0       0       0
>>     .rgw.buckets.extra            18  N/A            N/A          0       0      16557G     0         0       0       0       0
>>     default.rgw.buckets.data      20  N/A            N/A          79190G  70.51  33115G     20865953  20376k  122M    113M    116T
>>     benchmark_replicated          21  N/A            N/A          1415G   7.88   16557G     363800    355k    1338k   1251k   4247G
>>     benchmark_erasure_coded       22  N/A            N/A          11057M  0.03   33115G     2761      2761    398     5520    16586M
>>
>>
>> Thanks
>> Jakub
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


