Hi Luis,
Thanks for your comment. I see high %util for a few HDDs on each Ceph node, but there is actually very low traffic from clients.
iostat -xd shows ongoing operations:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 1,60 0,00 3,00 0,00 18,40 12,27 0,00 0,00 0,00 0,00 0,00 0,00
sdb 0,00 0,20 0,60 8,00 14,40 488,10 116,86 0,00 0,56 8,00 0,00 0,56 0,48
sdc 0,00 0,00 153,80 1,80 25304,00 31,30 325,65 1,12 7,20 7,28 0,00 4,21 65,52
sdd 0,00 5,40 406,80 44,00 102275,20 3295,60 468,37 1,85 4,12 4,29 2,53 2,11 95,12
sde 0,00 0,60 3,20 12,00 51,20 2461,50 330,62 0,07 4,32 12,25 2,20 2,63 4,00
sdf 0,00 0,40 1,40 8,20 44,00 1424,90 306,02 0,01 1,50 10,29 0,00 1,50 1,44
sdg 0,00 0,60 92,80 19,00 5483,20 2998,90 151,74 0,98 8,74 10,36 0,84 7,40 82,72
sdh 0,00 0,00 154,40 1,40 25299,20 74,20 325,72 1,07 6,88 6,94 0,00 4,07 63,44
sdi 0,00 0,00 0,20 7,80 12,80 397,50 102,58 0,00 0,30 12,00 0,00 0,30 0,24
sdj 0,00 0,20 0,00 4,00 0,00 645,60 322,80 0,00 0,00 0,00 0,00 0,00 0,00
sdk 0,00 0,20 1,40 15,60 32,00 1956,50 233,94 0,02 1,08 13,14 0,00 1,08 1,84
sdl 0,00 0,40 0,60 4,00 16,80 447,00 201,65 0,02 4,35 20,00 2,00 2,78 1,28
sdm 0,00 0,00 10,00 4,40 232,00 521,80 104,69 0,08 5,89 8,48 0,00 4,89 7,04
dm-0 0,00 0,00 0,00 4,60 0,00 18,40 8,00 0,00 0,00 0,00 0,00 0,00 0,00
nvme0n1 0,00 0,00 0,00 124,80 0,00 10366,40 166,13 0,01 0,12 0,00 0,12 0,03 0,32
while ceph -s shows low client traffic:
# ceph -s
  cluster:
    id:     1023c49f-3b10-42de-9f62-9b122db32a9a
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum host01,host02,host03
    mgr: host02(active), standbys: host01, host03
    osd: 108 osds: 106 up, 106 in
    rgw: 3 daemons active

  data:
    pools:   22 pools, 4880 pgs
    objects: 31121k objects, 119 TB
    usage:   241 TB used, 143 TB / 385 TB avail
    pgs:     4875 active+clean
             3    active+clean+scrubbing+deep
             2    active+clean+scrubbing

  io:
    client: 17646 B/s rd, 19038 kB/s wr, 4 op/s rd, 175 op/s wr
Are there any background tasks running and utilizing the disks? Is it the scrubbing that is generating this load?
3 active+clean+scrubbing+deep
2 active+clean+scrubbing
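For reference, my plan to confirm that is to map the scrubbing PGs to their
OSDs and hosts, and then pause deep scrubs for a moment to compare %util.
Roughly like this (osd.59 is only an example ID, I have not run it yet):
# ceph pg dump pgs_brief 2>/dev/null | grep scrubbing   # PG IDs plus their acting OSD sets
# ceph osd find 59                                      # map an OSD ID to its host
# ceph osd set nodeep-scrub                             # temporarily stop scheduling new deep scrubs
# ceph osd unset nodeep-scrub                           # re-enable them afterwards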
Thanks
Jakub
On Wed, Jan 31, 2018 at 3:59 PM, Luis Periquito <periquito@xxxxxxxxx> wrote:
On a cursory look at the information, it seems the cluster is
overloaded with requests.
Just a guess, but if you look at the IO usage on those spindles they'll be
at or around 100% most of the time.
If that is the case, then increasing pg_num and pgp_num won't help,
and in the short term it will make things worse.
Metadata pools (like default.rgw.buckets.index) really excel on an SSD
pool, even a small one. I carved a small OSD out of the journal SSDs for
those kinds of workloads.
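Something along these lines should do it, assuming Luminous device classes
and that your SSD OSDs are tagged with the ssd class (the rule name is just
an example):
# ceph osd crush rule create-replicated replicated-ssd default host ssd
# ceph osd pool set default.rgw.buckets.index crush_rule replicated-ssd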
On Wed, Jan 31, 2018 at 2:26 PM, Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx> wrote:
> Is it safe to increase pg_num and pgp_num from 1024 up to 2048 for the volumes
> and default.rgw.buckets.data pools?
> How will it impact cluster behavior? I guess cluster rebalancing will occur
> and will take a long time considering the amount of data we have on it?
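> If it helps, the change I had in mind is simply the following, run per pool
> and letting the cluster settle in between (volumes shown as an example):
> # ceph osd pool set volumes pg_num 2048
> # ceph osd pool set volumes pgp_num 2048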
>
> Regards
> Jakub
>
>
>
> On Wed, Jan 31, 2018 at 1:37 PM, Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx>
> wrote:
>>
>> Hi,
>>
>> I'm wondering why slow requests are being reported mainly when the request
>> has been put into the queue for processing by its PG (queued_for_pg,
>> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#debugging-slow-request).
>> Could it be due to too low pg_num/pgp_num?
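>> For reference, to judge whether pg_num is low I was going to look at the
>> per-OSD PG count, i.e. the PGS column of (assuming that column is available
>> in this release):
>> # ceph osd df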
>>
>> It looks like the slow requests mainly go to
>> default.rgw.buckets.data (pool id 20), volumes (pool id 3) and
>> default.rgw.buckets.index (pool id 14).
>>
>> 2018-01-31 12:06:55.899557 osd.59 osd.59 10.212.32.22:6806/4413 38 :
>> cluster [WRN] slow request 30.125793 seconds old, received at 2018-01-31
>> 12:06:25.773675: osd_op(client.857003.0:126171692 3.a4fec1ad 3.a4fec1ad
>> (undecoded) ack+ondisk+write+known_if_redirected e5722) currently
>> queued_for_pg
>>
>> Btw, how can I get more human-friendly client information from a log entry
>> like the one above?
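>> The only idea I have so far is to grep the client id out of the mon session
>> list on one of the monitor hosts, e.g. (host01 is just an example; this has
>> to run locally on that mon):
>> # ceph daemon mon.host01 sessions | grep client.857003
>> Is there a better way?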
>>
>> Current pg_num/pgp_num
>>
>> pool 3 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool
>> stripe_width 0 application rbd
>> removed_snaps [1~3]
>> pool 14 'default.rgw.buckets.index' replicated size 3 min_size 2
>> crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags
>> hashpspool stripe_width 0 application rgw
>> pool 20 'default.rgw.buckets.data' erasure size 9 min_size 6 crush_rule 1
>> object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags
>> hashpspool stripe_width 4224 application rgw
>>
>> Usage
>>
>> GLOBAL:
>>     SIZE   AVAIL  RAW USED  %RAW USED  OBJECTS
>>     385T   144T   241T      62.54      31023k
>> POOLS:
>>     NAME                          ID  QUOTA OBJECTS  QUOTA BYTES  USED    %USED  MAX AVAIL  OBJECTS   DIRTY   READ    WRITE   RAW USED
>>     volumes                       3   N/A            N/A          40351G  70.91  16557G     10352314  10109k  2130M   2520M   118T
>>     default.rgw.buckets.index     14  N/A            N/A          0       0      16557G     205       205     160M    27945k  0
>>     default.rgw.buckets.data      20  N/A            N/A          79190G  70.51  33115G     20865953  20376k  122M    113M    116T
>>
>>
>>
>> # ceph osd pool ls detail
>> pool 0 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 4502 flags hashpspool stripe_width
>> 0 application rbd
>> pool 1 'vms' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool
>> stripe_width 0 application rbd
>> pool 2 'images' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 512 pgp_num 512 last_change 5175 flags hashpspool
>> stripe_width 0 application rbd
>> removed_snaps [1~7,14~2]
>> pool 3 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool
>> stripe_width 0 application rbd
>> removed_snaps [1~3]
>> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0
>> application rgw
>> pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 6 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 7 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 8 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 9 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 10 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 11 'default.rgw.users.email' replicated size 3 min_size 2 crush_rule
>> 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 owner
>> 18446744073709551615 flags hashpspool stripe_width 0 application rgw
>> pool 12 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 owner
>> 18446744073709551615 flags hashpspool stripe_width 0 application rgw
>> pool 13 'default.rgw.users.swift' replicated size 3 min_size 2 crush_rule
>> 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 14 'default.rgw.buckets.index' replicated size 3 min_size 2
>> crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags
>> hashpspool stripe_width 0 application rgw
>> pool 15 'default.rgw.buckets.data.old' replicated size 3 min_size 2
>> crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 4502
>> flags hashpspool stripe_width 0 application rgw
>> pool 16 'default.rgw.buckets.non-ec' replicated size 3 min_size 2
>> crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags
>> hashpspool stripe_width 0 application rgw
>> pool 17 'default.rgw.buckets.extra' replicated size 3 min_size 2
>> crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags
>> hashpspool stripe_width 0 application rgw
>> pool 18 '.rgw.buckets.extra' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
>> stripe_width 0 application rgw
>> pool 20 'default.rgw.buckets.data' erasure size 9 min_size 6 crush_rule 1
>> object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags
>> hashpspool stripe_width 4224 application rgw
>> pool 21 'benchmark_replicated' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 4550 flags
>> hashpspool stripe_width 0 application rbd
>> removed_snaps [1~3]
>> pool 22 'benchmark_erasure_coded' erasure size 9 min_size 7 crush_rule 1
>> object_hash rjenkins pg_num 32 pgp_num 32 last_change 4552 flags hashpspool
>> stripe_width 24576 application rbd
>> removed_snaps [1~3]
>>
>>
>> # ceph df detail
>> GLOBAL:
>>     SIZE   AVAIL  RAW USED  %RAW USED  OBJECTS
>>     385T   144T   241T      62.54      31023k
>> POOLS:
>>     NAME                          ID  QUOTA OBJECTS  QUOTA BYTES  USED    %USED  MAX AVAIL  OBJECTS   DIRTY   READ    WRITE   RAW USED
>>     rbd                           0   N/A            N/A          0       0      16557G     0         0       1       134k    0
>>     vms                           1   N/A            N/A          0       0      16557G     0         0       0       0       0
>>     images                        2   N/A            N/A          7659M   0.05   16557G     1022      1022    51247   5668    22977M
>>     volumes                       3   N/A            N/A          40351G  70.91  16557G     10352314  10109k  2130M   2520M   118T
>>     .rgw.root                     4   N/A            N/A          1588    0      16557G     4         4       90      4       4764
>>     default.rgw.control           5   N/A            N/A          0       0      16557G     8         8       0       0       0
>>     default.rgw.data.root         6   N/A            N/A          93943   0      16557G     336       336     239k    6393    275k
>>     default.rgw.gc                7   N/A            N/A          0       0      16557G     32        32      1773M   5281k   0
>>     default.rgw.log               8   N/A            N/A          0       0      16557G     185       185     22404k  14936k  0
>>     default.rgw.users.uid         9   N/A            N/A          3815    0      16557G     15        15      187k    53303   11445
>>     default.rgw.usage             10  N/A            N/A          0       0      16557G     7         7       278k    556k    0
>>     default.rgw.users.email       11  N/A            N/A          58      0      16557G     3         3       0       3       174
>>     default.rgw.users.keys        12  N/A            N/A          177     0      16557G     10        10      262     22      531
>>     default.rgw.users.swift       13  N/A            N/A          40      0      16557G     3         3       0       3       120
>>     default.rgw.buckets.index     14  N/A            N/A          0       0      16557G     205       205     160M    27945k  0
>>     default.rgw.buckets.data.old  15  N/A            N/A          668G    3.88   16557G     180867    176k    707k    2318k   2004G
>>     default.rgw.buckets.non-ec    16  N/A            N/A          0       0      16557G     114       114     17960   12024   0
>>     default.rgw.buckets.extra     17  N/A            N/A          0       0      16557G     0         0       0       0       0
>>     .rgw.buckets.extra            18  N/A            N/A          0       0      16557G     0         0       0       0       0
>>     default.rgw.buckets.data      20  N/A            N/A          79190G  70.51  33115G     20865953  20376k  122M    113M    116T
>>     benchmark_replicated          21  N/A            N/A          1415G   7.88   16557G     363800    355k    1338k   1251k   4247G
>>     benchmark_erasure_coded       22  N/A            N/A          11057M  0.03   33115G     2761      2761    398     5520    16586M
>>
>>
>> Thanks
>> Jakub
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com