Re: radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"

Brad Hubbard <bhubbard@xxxxxxxxxx> · Mon, 15 Jan 2018 12:09:53 +1000

On Mon, Jan 15, 2018 at 11:38 AM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> On Mon, Jan 15, 2018 at 10:38 AM, Alexander Peters
> <alexander.peters@xxxxxxxxx> wrote:
>> Thanks for the reply; unfortunately the link you sent is behind a paywall, so
>> at least for now I can't read it.
>
> That's why I provided the cause as laid out in that article (pgp num > pg num).
>
> Do you have any settings in ceph.conf related to pg_num or pgp_num?
>
> If not, please add your details to http://tracker.ceph.com/issues/22351
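One way to answer the ceph.conf question is a small shell helper. `find_pg_settings` below is hypothetical, not a Ceph tool, and it assumes the default path /etc/ceph/ceph.conf unless another file is given. It prints the uncommented lines that set a pg_num or pgp_num value, such as `osd pool default pg num` or `osd_pool_default_pgp_num`. Ceph accepts option names written with either spaces or underscores, so the pattern matches both spellings (and dashes, just in case).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list ceph.conf lines that set pg_num / pgp_num options.
# Assumption: the config lives at /etc/ceph/ceph.conf unless a path is passed.
find_pg_settings() {
    local conf="${1:-/etc/ceph/ceph.conf}"
    # Skip lines where '#' or ';' (comment markers) appear before the option;
    # the separator inside the name may be a space, underscore or dash.
    grep -niE '^[[:space:]]*[^#;]*pgp?[ _-]num' "$conf"
}
```

For example, `find_pg_settings` with no argument checks the default file; `-n` prefixes each hit with its line number in ceph.conf.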

RADOS can return ERANGE (34) from multiple places, so identifying where it
comes from might be a big step towards working this out.

$ ltrace -fo /tmp/ltrace.out /usr/bin/radosgw --cluster ceph \
    --name client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d

The objective is to find which function(s) return 34.
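A minimal sketch for searching that log, written as a hypothetical shell helper (`find_erange` is illustrative and not part of ltrace). It assumes each completed call in the ltrace output ends in `= <return value>`. Depending on whether ltrace knows a function's prototype, the value may be printed in decimal or in hex, so the pattern also accepts 0x22 (34) and 0xffffffde / 0xffffffffffffffde (-34 as 32-bit and 64-bit two's complement).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every ltrace log line whose call returned 34 or -34
# (ERANGE). Assumes each finished call ends in "= <value>" at end of line.
find_erange() {
    local log="$1"
    # 0x22 = 34; 0xffffffde / 0xffffffffffffffde = -34 in 32/64-bit two's complement.
    # The trailing $ keeps values like 134 or 340 from matching.
    grep -nE '= (-?34|0x22|0xffffffde|0xffffffffffffffde)$' "$log"
}
```

For example, `find_erange /tmp/ltrace.out` after the run above; since `-n` prefixes each hit with its line number, the calls leading up to it (same PID) are easy to inspect in the full log.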

>
>>
>> The output of ceph osd dump shows that pgp_num == pg_num:
>>
>> [root@ctrl01 ~]# ceph osd dump
>> epoch 142
>> fsid 0e2d841f-68fd-4629-9813-ab083e8c0f10
>> created 2017-12-20 23:04:59.781525
>> modified 2018-01-14 21:30:57.528682
>> flags sortbitwise,recovery_deletes,purged_snapdirs
>> crush_version 6
>> full_ratio 0.95
>> backfillfull_ratio 0.9
>> nearfull_ratio 0.85
>> require_min_compat_client jewel
>> min_compat_client jewel
>> require_osd_release luminous
>> pool 1 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 119 flags hashpspool stripe_width 0 application rbd
>>         removed_snaps [1~3]
>> pool 2 'cinder-2' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 120 flags hashpspool stripe_width 0 application rbd
>>         removed_snaps [1~3]
>> pool 3 'cinder-3' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 121 flags hashpspool stripe_width 0 application rbd
>>         removed_snaps [1~3]
>> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 94 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
>> max_osd 3
>> osd.0 up   in  weight 1 up_from 82 up_thru 140 down_at 79 last_clean_interval [23,78) 10.16.0.11:6800/1795 10.16.0.11:6801/1795 10.16.0.11:6802/1795 10.16.0.11:6803/1795 exists,up abe33844-6d98-4ede-81a8-a8bdc92dada8
>> osd.1 up   in  weight 1 up_from 73 up_thru 140 down_at 71 last_clean_interval [55,72) 10.16.0.13:6800/1756 10.16.0.13:6804/1001756 10.16.0.13:6805/1001756 10.16.0.13:6806/1001756 exists,up 0dab9372-6ffe-4a23-a8b7-4edca3745a2a
>> osd.2 up   in  weight 1 up_from 140 up_thru 140 down_at 133 last_clean_interval [31,132) 10.16.0.12:6800/1749 10.16.0.12:6801/1749 10.16.0.12:6802/1749 10.16.0.12:6803/1749 exists,up 220bba17-8119-4035-9e43-5b8eaa27562f
>>
>>
>> On 15.01.2018 at 01:33, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>
>> On Mon, Jan 15, 2018 at 8:34 AM, Alexander Peters
>> <alexander.peters@xxxxxxxxx> wrote:
>>
>> Hello
>>
>> I am currently experiencing a strange issue with my radosgw. It fails to
>> start, and all it says is:
>> [root@ctrl02 ~]# /usr/bin/radosgw --cluster ceph --name client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
>> 2018-01-14 21:30:57.132007 7f44ddd18e00  0 deferred set uid:gid to 167:167 (ceph:ceph)
>> 2018-01-14 21:30:57.132161 7f44ddd18e00  0 ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process (unknown), pid 13928
>> 2018-01-14 21:30:57.556672 7f44ddd18e00 -1 ERROR: failed to initialize watch: (34) Numerical result out of range
>> 2018-01-14 21:30:57.558752 7f44ddd18e00 -1 Couldn't init storage provider (RADOS)
>>
>> (When started via systemctl, it writes the same lines to the log file.)
>>
>> The strange thing is that it works in another environment that was installed
>> with the same set of Ansible playbooks.
>> The OS is CentOS Linux release 7.4.1708 (Core).
>>
>> Ceph is up and running (I am currently using it to store volumes and
>> images from OpenStack).
>>
>> Does anyone have an idea how to debug this?
>>
>>
>> According to https://access.redhat.com/solutions/2778161, this can
>> happen if your pgp_num is higher than your pg_num.
>>
>> Check the "ceph osd dump" output for that possibility.
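A sketch of that check as a hypothetical shell helper (`check_pgp` is illustrative, not a Ceph command). It assumes each pool appears on a single `pool ...` line containing `pg_num N pgp_num M`, as `ceph osd dump` prints it before any mail wrapping. It prints every pool whose pgp_num exceeds its pg_num and exits non-zero if it finds one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ceph osd dump` output on stdin and report every
# pool whose pgp_num is greater than its pg_num. Exits 1 if any are found.
check_pgp() {
    awk '
        BEGIN { found = 0 }
        $1 == "pool" {
            pg = ""; pgp = ""
            # Scan "key value" pairs; stop one short of NF so $(i + 1) exists.
            for (i = 3; i < NF; i++) {
                if ($i == "pg_num")  pg  = $(i + 1)
                if ($i == "pgp_num") pgp = $(i + 1)
            }
            if (pg != "" && pgp != "" && pgp + 0 > pg + 0) {
                printf "pool %s %s: pgp_num %d > pg_num %d\n", $2, $3, pgp, pg
                found = 1
            }
        }
        END { exit found }
    '
}
```

Usage: `ceph osd dump | check_pgp`. Against the dump quoted earlier in this thread it should print nothing, since every pool there has pgp_num equal to pg_num.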
>>
>>
>> Best Regards
>> Alexander
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> --
>> Cheers,
>> Brad

-- 
Cheers,
Brad