On a cursory look at the information, it seems the cluster is overloaded with requests. Just a guess, but if you look at IO usage on those spindles they'll be at or around 100% utilization most of the time. If that is the case, then increasing pg_num and pgp_num won't help and, in the short term, will make it worse. Metadata pools (like default.rgw.buckets.index) really excel on an SSD pool, even a small one. I carved a small OSD out of the journal SSDs for those kinds of workloads.
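To make that concrete, here is a minimal sketch of how I'd confirm the spindles are saturated; osd.59 is only used as an example because it appears in the slow-request line quoted below, so adjust to your own hosts:

  # on each OSD host: per-disk utilization; %util pinned near 100 on the
  # data spindles means the drives themselves are the bottleneck
  iostat -x 5

  # from any admin node: per-OSD commit/apply latency as seen by the cluster
  ceph osd perf

  # on the OSD host itself: the slowest recent ops on that daemon
  ceph daemon osd.59 dump_historic_ops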
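And roughly what I mean by giving the index pool SSDs. This is only a sketch: it assumes a Luminous-style device-class setup, and both the rule name (rgw-index-ssd) and the OSD id are made up for illustration:

  # tag the SSD-backed OSDs with the ssd device class, if they are not already
  ceph osd crush set-device-class ssd osd.120

  # a replicated rule that only picks class-ssd OSDs, one replica per host
  ceph osd crush rule create-replicated rgw-index-ssd default host ssd

  # point the index pool at it; the (small) pool migrates onto the SSD OSDs
  ceph osd pool set default.rgw.buckets.index crush_rule rgw-index-ssd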
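On the pg_num question quoted below: the commands themselves are simple, but I would only run them once the disks have headroom, nothing actually moves until pgp_num is raised to match, and you cannot shrink pg_num again afterwards. A sketch, using the volumes pool as the example:

  # split the PGs (metadata only at this point)
  ceph osd pool set volumes pg_num 2048

  # raise pgp_num to match; this is what triggers the rebalance/backfill
  ceph osd pool set volumes pgp_num 2048

  # optionally throttle the resulting backfill so client IO is not starved
  ceph tell osd.* injectargs '--osd_max_backfills 1'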
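As for turning client.857003 into something more human-friendly: the monitors know which address each client session belongs to, and the OSD admin socket shows the full op. A sketch, assuming the mon id matches the short hostname on the mon host:

  # on a monitor host: find the session for that client id
  ceph daemon mon.$(hostname -s) sessions | grep client.857003

  # on the OSD named in the log: in-flight ops, including the full client op
  ceph daemon osd.59 dump_ops_in_flight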
On Wed, Jan 31, 2018 at 2:26 PM, Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx> wrote:
> Is it safe to increase pg_num and pgp_num from 1024 up to 2048 for the volumes
> and default.rgw.buckets.data pools?
> How will it impact cluster behavior? I guess cluster rebalancing will occur
> and will take a long time considering the amount of data we have on it?
>
> Regards
> Jakub
>
> On Wed, Jan 31, 2018 at 1:37 PM, Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx> wrote:
>>
>> Hi,
>>
>> I'm wondering why slow requests are being reported mainly when the request
>> has been put into the queue for processing by its PG (queued_for_pg, see
>> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#debugging-slow-request).
>> Could it be due to too low pg_num/pgp_num?
>>
>> It looks like slow requests are mainly addressed to
>> default.rgw.buckets.data (pool id 20), volumes (pool id 3) and
>> default.rgw.buckets.index (pool id 14).
>>
>> 2018-01-31 12:06:55.899557 osd.59 osd.59 10.212.32.22:6806/4413 38 : cluster [WRN]
>> slow request 30.125793 seconds old, received at 2018-01-31 12:06:25.773675:
>> osd_op(client.857003.0:126171692 3.a4fec1ad 3.a4fec1ad (undecoded)
>> ack+ondisk+write+known_if_redirected e5722) currently queued_for_pg
>>
>> Btw, how can I get more human-friendly client information from a log entry
>> like the one above?
>>
>> Current pg_num/pgp_num:
>>
>> pool 3 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool stripe_width 0 application rbd
>>     removed_snaps [1~3]
>> pool 14 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0 application rgw
>> pool 20 'default.rgw.buckets.data' erasure size 9 min_size 6 crush_rule 1 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool stripe_width 4224 application rgw
>>
>> Usage:
>>
>> GLOBAL:
>>     SIZE     AVAIL     RAW USED     %RAW USED     OBJECTS
>>     385T     144T      241T         62.54         31023k
>> POOLS:
>>     NAME                          ID    QUOTA OBJECTS    QUOTA BYTES    USED      %USED    MAX AVAIL    OBJECTS     DIRTY     READ     WRITE     RAW USED
>>     volumes                       3     N/A              N/A            40351G    70.91    16557G       10352314    10109k    2130M    2520M     118T
>>     default.rgw.buckets.index     14    N/A              N/A            0         0        16557G       205         205       160M     27945k    0
>>     default.rgw.buckets.data      20    N/A              N/A            79190G    70.51    33115G       20865953    20376k    122M     113M      116T
>>
>> # ceph osd pool ls detail
>> pool 0 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 4502 flags hashpspool stripe_width 0 application rbd
>> pool 1 'vms' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool stripe_width 0 application rbd
>> pool 2 'images' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 5175 flags hashpspool stripe_width 0 application rbd
>>     removed_snaps [1~7,14~2]
>> pool 3 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool stripe_width 0 application rbd
>>     removed_snaps [1~3]
>> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0 application rgw
>> pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0 application rgw
>> pool 6 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0 application rgw
>> pool 7 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0 application rgw
>> pool 8 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0 application rgw
>> pool 9 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0 application rgw
>> pool 10 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0 application rgw
>> pool 11 'default.rgw.users.email' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
>> pool 12 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
>> pool 13 'default.rgw.users.swift' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0 application rgw
>> pool 14 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0 application rgw
>> pool 15 'default.rgw.buckets.data.old' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 4502 flags hashpspool stripe_width 0 application rgw
>> pool 16 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0 application rgw
>> pool 17 'default.rgw.buckets.extra' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0 application rgw
>> pool 18 '.rgw.buckets.extra' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width 0 application rgw
>> pool 20 'default.rgw.buckets.data' erasure size 9 min_size 6 crush_rule 1 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool stripe_width 4224 application rgw
>> pool 21 'benchmark_replicated' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 4550 flags hashpspool stripe_width 0 application rbd
>>     removed_snaps [1~3]
>> pool 22 'benchmark_erasure_coded' erasure size 9 min_size 7 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 last_change 4552 flags hashpspool stripe_width 24576 application rbd
>>     removed_snaps [1~3]
>>
>> # ceph df detail
>> GLOBAL:
>>     SIZE     AVAIL     RAW USED     %RAW USED     OBJECTS
>>     385T     144T      241T         62.54         31023k
>> POOLS:
>>     NAME                            ID    QUOTA OBJECTS    QUOTA BYTES    USED      %USED    MAX AVAIL    OBJECTS     DIRTY     READ      WRITE     RAW USED
>>     rbd                             0     N/A              N/A            0         0        16557G       0           0         1         134k      0
>>     vms                             1     N/A              N/A            0         0        16557G       0           0         0         0         0
>>     images                          2     N/A              N/A            7659M     0.05     16557G       1022        1022      51247     5668      22977M
>>     volumes                         3     N/A              N/A            40351G    70.91    16557G       10352314    10109k    2130M     2520M     118T
>>     .rgw.root                       4     N/A              N/A            1588      0        16557G       4           4         90        4         4764
>>     default.rgw.control             5     N/A              N/A            0         0        16557G       8           8         0         0         0
>>     default.rgw.data.root           6     N/A              N/A            93943     0        16557G       336         336       239k      6393      275k
>>     default.rgw.gc                  7     N/A              N/A            0         0        16557G       32          32        1773M     5281k     0
>>     default.rgw.log                 8     N/A              N/A            0         0        16557G       185         185       22404k    14936k    0
>>     default.rgw.users.uid           9     N/A              N/A            3815      0        16557G       15          15        187k      53303     11445
>>     default.rgw.usage               10    N/A              N/A            0         0        16557G       7           7         278k      556k      0
>>     default.rgw.users.email         11    N/A              N/A            58        0        16557G       3           3         0         3         174
>>     default.rgw.users.keys          12    N/A              N/A            177       0        16557G       10          10        262       22        531
>>     default.rgw.users.swift         13    N/A              N/A            40        0        16557G       3           3         0         3         120
>>     default.rgw.buckets.index       14    N/A              N/A            0         0        16557G       205         205       160M      27945k    0
>>     default.rgw.buckets.data.old    15    N/A              N/A            668G      3.88     16557G       180867      176k      707k      2318k     2004G
>>     default.rgw.buckets.non-ec      16    N/A              N/A            0         0        16557G       114         114       17960     12024     0
>>     default.rgw.buckets.extra       17    N/A              N/A            0         0        16557G       0           0         0         0         0
>>     .rgw.buckets.extra              18    N/A              N/A            0         0        16557G       0           0         0         0         0
>>     default.rgw.buckets.data        20    N/A              N/A            79190G    70.51    33115G       20865953    20376k    122M      113M      116T
>>     benchmark_replicated            21    N/A              N/A            1415G     7.88     16557G       363800      355k      1338k     1251k     4247G
>>     benchmark_erasure_coded         22    N/A              N/A            11057M    0.03     33115G       2761        2761      398       5520      16586M
>>
>> Thanks
>> Jakub

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com