Hello Wido and Shinobu,

On 20/01/2017 19:54, Shinobu Kinjo wrote:
> What does `ceph -s` say?

HEALTH_OK; this was not the cause, thanks though.

> On Sat, Jan 21, 2017 at 3:39 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>>
>>> On 20 January 2017 at 17:17, Kai Storbeck <kai@xxxxxxxxxx> wrote:
>>>
>>> My graphs of several counters in our Ceph cluster are showing abnormal
>>> behaviour after changing the pg_num and pgp_num respectively.
>>
>> What counters exactly? Like pg information? It could be that it needs a
>> scrub on all PGs before that information is corrected. This scrub will
>> trigger automatically.

The global "pool" counters that had a value > 0 for that pool; the gauges
were unaffected. I saved two outputs of "ceph pg dump pools
--format=json-pretty"; the diff between them is below. The gauges were
going upward properly, but the counters (from num_read onward) were
seeing dips of around 15k.

> mon1.ceph1:~ $ diff -u evidence.{1,2}
> --- evidence.1  2017-01-23 12:26:57.686325559 +0100
> +++ evidence.2  2017-01-23 12:27:03.294312056 +0100
> @@ -1,32 +1,32 @@
>  {
>      "poolid": 4,
>      "stat_sum": {
> -        "num_bytes": 602382731805,
> -        "num_objects": 145200,
> +        "num_bytes": 602524072732,
> +        "num_objects": 145235,
>          "num_object_clones": 0,
> -        "num_object_copies": 435600,
> +        "num_object_copies": 435705,
>          "num_objects_missing_on_primary": 0,
>          "num_objects_degraded": 0,
>          "num_objects_misplaced": 0,
>          "num_objects_unfound": 0,
> -        "num_objects_dirty": 145200,
> +        "num_objects_dirty": 145235,
>          "num_whiteouts": 0,
> -        "num_read": 2543730,
> -        "num_read_kb": 34612661,
> -        "num_write": 67930863,
> -        "num_write_kb": 5296056074,
> +        "num_read": 2472053,
> +        "num_read_kb": 33442430,
> +        "num_write": 64743968,
> +        "num_write_kb": 5138061258,
>          "num_scrub_errors": 0,
>          "num_shallow_scrub_errors": 0,
>          "num_deep_scrub_errors": 0,
> -        "num_objects_recovered": 42352,
> -        "num_bytes_recovered": 175259565950,
> -        "num_keys_recovered": 155,
> +        "num_objects_recovered": 40789,
> +        "num_bytes_recovered": 168786003824,
> +        "num_keys_recovered": 149,
>          "num_objects_omap": 61,
>          "num_objects_hit_set_archive": 0,
>          "num_bytes_hit_set_archive": 0
>      },
> -    "log_size": 4052481,
> -    "ondisk_log_size": 4052481,
> +    "log_size": 4052612,
> +    "ondisk_log_size": 4052612,
>      "up": 6144,
>      "acting": 6144
> }

Your answer about the scrubs is probably right, as the issue has settled
now, but I couldn't find any references to this behaviour.

Still, it doesn't quite make sense to me: these counters shouldn't
decrease at the global scope, so it is odd that changing pg_num
influences them at all.

Thanks for the answers; the counters are going upwards again. Might I
suggest that the documentation for changing pg_num contain a small note
about this :-).

Regards,
Kai
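P.S. In case it's useful to anyone watching for the same symptom: below is a
small Python sketch of how I compared the two "ceph pg dump pools
--format=json-pretty" snapshots. The list of monotonic counter names is my
own selection from the fields in the diff above, and the embedded sample
data uses the same numbers; adjust both for your own pools.

```python
import json

# Counters that should be monotonically non-decreasing on a healthy pool.
# (A subset of the stat_sum fields shown in the diff above.)
MONOTONIC = [
    "num_read", "num_read_kb", "num_write", "num_write_kb",
    "num_objects_recovered", "num_bytes_recovered", "num_keys_recovered",
]

def counter_dips(before_json, after_json, poolid):
    """Return {counter: (old, new)} for counters that went backwards
    between two `ceph pg dump pools --format=json-pretty` snapshots."""
    def stat_sum(doc):
        for pool in json.loads(doc):
            if pool["poolid"] == poolid:
                return pool["stat_sum"]
        raise KeyError("pool %d not found" % poolid)
    old, new = stat_sum(before_json), stat_sum(after_json)
    return {k: (old[k], new[k]) for k in MONOTONIC if new[k] < old[k]}

# Sample data in the same shape as the diff above (values taken from it):
before = json.dumps([{"poolid": 4, "stat_sum": {
    "num_read": 2543730, "num_read_kb": 34612661,
    "num_write": 67930863, "num_write_kb": 5296056074,
    "num_objects_recovered": 42352, "num_bytes_recovered": 175259565950,
    "num_keys_recovered": 155}}])
after = json.dumps([{"poolid": 4, "stat_sum": {
    "num_read": 2472053, "num_read_kb": 33442430,
    "num_write": 64743968, "num_write_kb": 5138061258,
    "num_objects_recovered": 40789, "num_bytes_recovered": 168786003824,
    "num_keys_recovered": 149}}])

for name, (old, new) in counter_dips(before, after, 4).items():
    print("%s: %d -> %d (dip of %d)" % (name, old, new, old - new))
```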
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com