Hello Wido and Shinobu,

On 20/01/2017 19:54, Shinobu Kinjo wrote:
> What does `ceph -s` say?

HEALTH_OK; this was not the cause, thanks though.

> On Sat, Jan 21, 2017 at 3:39 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>>
>>> On 20 January 2017 at 17:17, Kai Storbeck <kai@xxxxxxxxxx> wrote:
>>>
>>> My graphs of several counters in our Ceph cluster are showing abnormal
>>> behaviour after changing the pg_num and pgp_num respectively.
>>
>> What counters exactly? Like pg information? It could be that it needs a
>> scrub on all PGs before that information is corrected. This scrub will
>> trigger automatically.

The global "pool" counters that had a value > 0 for that pool; the gauges
were unaffected. I saved two outputs of "ceph pg dump pools
--format=json-pretty"; the diff between them is below. The gauges were
going upward properly, but the counters (from num_read onward) were
seeing dips of around 15k.

> mon1.ceph1:~ $ diff -u evidence.{1,2}
> --- evidence.1  2017-01-23 12:26:57.686325559 +0100
> +++ evidence.2  2017-01-23 12:27:03.294312056 +0100
> @@ -1,32 +1,32 @@
>  {
>      "poolid": 4,
>      "stat_sum": {
> -        "num_bytes": 602382731805,
> -        "num_objects": 145200,
> +        "num_bytes": 602524072732,
> +        "num_objects": 145235,
>          "num_object_clones": 0,
> -        "num_object_copies": 435600,
> +        "num_object_copies": 435705,
>          "num_objects_missing_on_primary": 0,
>          "num_objects_degraded": 0,
>          "num_objects_misplaced": 0,
>          "num_objects_unfound": 0,
> -        "num_objects_dirty": 145200,
> +        "num_objects_dirty": 145235,
>          "num_whiteouts": 0,
> -        "num_read": 2543730,
> -        "num_read_kb": 34612661,
> -        "num_write": 67930863,
> -        "num_write_kb": 5296056074,
> +        "num_read": 2472053,
> +        "num_read_kb": 33442430,
> +        "num_write": 64743968,
> +        "num_write_kb": 5138061258,
>          "num_scrub_errors": 0,
>          "num_shallow_scrub_errors": 0,
>          "num_deep_scrub_errors": 0,
> -        "num_objects_recovered": 42352,
> -        "num_bytes_recovered": 175259565950,
> -        "num_keys_recovered": 155,
> +        "num_objects_recovered": 40789,
> +        "num_bytes_recovered": 168786003824,
> +        "num_keys_recovered": 149,
>          "num_objects_omap": 61,
>          "num_objects_hit_set_archive": 0,
>          "num_bytes_hit_set_archive": 0
>      },
> -    "log_size": 4052481,
> -    "ondisk_log_size": 4052481,
> +    "log_size": 4052612,
> +    "ondisk_log_size": 4052612,
>      "up": 6144,
>      "acting": 6144
> }

Your answer about the scrubs is probably right, as the issue has settled
now, but I couldn't find any references to this behaviour.

Still, it doesn't quite make sense to me: these counters shouldn't
decrease at the global scope, so it is odd that changing pg_num
influences them at all.

Thanks for the answers; the counters are going upwards again. Might I
suggest that the documentation for changing pg_num contain a small note
about this :-).

Regards,
Kai
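P.S. In case it's useful to anyone watching for the same symptom: below is a
small Python sketch of how I compared the two "ceph pg dump pools
--format=json-pretty" snapshots. The list of monotonic counter names is my
own selection from the fields in the diff above, and the embedded sample
data uses the same numbers; adjust both for your own pools.

```python
import json

# Counters that should be monotonically non-decreasing on a healthy pool.
# (A subset of the stat_sum fields shown in the diff above.)
MONOTONIC = [
    "num_read", "num_read_kb", "num_write", "num_write_kb",
    "num_objects_recovered", "num_bytes_recovered", "num_keys_recovered",
]

def counter_dips(before_json, after_json, poolid):
    """Return {counter: (old, new)} for counters that went backwards
    between two `ceph pg dump pools --format=json-pretty` snapshots."""
    def stat_sum(doc):
        for pool in json.loads(doc):
            if pool["poolid"] == poolid:
                return pool["stat_sum"]
        raise KeyError("pool %d not found" % poolid)
    old, new = stat_sum(before_json), stat_sum(after_json)
    return {k: (old[k], new[k]) for k in MONOTONIC if new[k] < old[k]}

# Sample data in the same shape as the diff above (values taken from it):
before = json.dumps([{"poolid": 4, "stat_sum": {
    "num_read": 2543730, "num_read_kb": 34612661,
    "num_write": 67930863, "num_write_kb": 5296056074,
    "num_objects_recovered": 42352, "num_bytes_recovered": 175259565950,
    "num_keys_recovered": 155}}])
after = json.dumps([{"poolid": 4, "stat_sum": {
    "num_read": 2472053, "num_read_kb": 33442430,
    "num_write": 64743968, "num_write_kb": 5138061258,
    "num_objects_recovered": 40789, "num_bytes_recovered": 168786003824,
    "num_keys_recovered": 149}}])

for name, (old, new) in counter_dips(before, after, 4).items():
    print("%s: %d -> %d (dip of %d)" % (name, old, new, old - new))
```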
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com