See http://tracker.ceph.com/issues/22351#note-11

On Wed, Jan 17, 2018 at 10:09 AM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> On Wed, Jan 17, 2018 at 5:41 AM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>> On Wed, Jan 17, 2018 at 2:20 AM, Nikos Kormpakis <nkorb@xxxxxxxxxxxx> wrote:
>>> On 01/16/2018 12:53 AM, Brad Hubbard wrote:
>>>> On Tue, Jan 16, 2018 at 1:35 AM, Alexander Peters <apeters@xxxxxxxxx> wrote:
>>>>> I created the dump output, but it looks very cryptic to me, so I can't really make much sense of it. Is there anything to look for in particular?
>>>>
>>>> Yes, basically we are looking for any line that ends in "= 34". You
>>>> might also find piping it through c++filt helps.
>>>>
>>>> Something like...
>>>>
>>>> $ c++filt </tmp/ltrace.out | grep "= 34"
>>>
>>> Hello,
>>> we're facing the exact same issue. I added some more info about
>>> our cluster and output from ltrace in [1].
>>
>> Unfortunately, the strlen lines in that output are expected.
>>
>> Is it possible for me to access the ltrace output file somehow
>> (you could email it directly or use ceph-post-file perhaps)?
>
> Ah, nm, my bad.
>
> It turns out what we need is the hexadecimal int representation of '-34'.
>
> $ c++filt </tmp/ltrace.out | grep "0xffffffde"
>
> I'll update the tracker accordingly.
>
>>
>>>
>>> Best regards,
>>> Nikos.
>>>
>>> [1] http://tracker.ceph.com/issues/22351
>>>
>>>>>
>>>>> I think I am going to read up on how to interpret ltrace output...
>>>>>
>>>>> BR
>>>>> Alex
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>>>>> To: "Alexander Peters" <alexander.peters@xxxxxxxxx>
>>>>> CC: "Ceph Users" <ceph-users@xxxxxxxxxxxxxx>
>>>>> Sent: Monday, January 15, 2018 03:09:53
>>>>> Subject: Re: radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"
>>>>>
>>>>> On Mon, Jan 15, 2018 at 11:38 AM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>>>>> On Mon, Jan 15, 2018 at 10:38 AM, Alexander Peters
>>>>>> <alexander.peters@xxxxxxxxx> wrote:
>>>>>>> Thanks for the reply - unfortunately the link you sent is behind a paywall,
>>>>>>> so at least for now I can't read it.
>>>>>>
>>>>>> That's why I provided the cause as laid out in that article (pgp_num > pg_num).
>>>>>>
>>>>>> Do you have any settings in ceph.conf related to pg_num or pgp_num?
>>>>>>
>>>>>> If not, please add your details to http://tracker.ceph.com/issues/22351
>>>>>
>>>>> RADOS can return ERANGE (34) in multiple places, so identifying where
>>>>> might be a big step towards working this out.
>>>>>
>>>>> $ ltrace -fo /tmp/ltrace.out /usr/bin/radosgw --cluster ceph --name client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
>>>>>
>>>>> The objective is to find which function(s) return 34.
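For anyone following along, the reason the hex pattern works: librados hands ERANGE back as a negative return value, and -34 as a 32-bit two's-complement word is 0xffffffde, which is how it shows up in the trace. A minimal sketch combining the check and the search, assuming the same /tmp/ltrace.out path as in the ltrace command above:

$ printf '0x%x\n' $(( -34 & 0xffffffff ))        # prints 0xffffffde
$ c++filt </tmp/ltrace.out | grep "0xffffffde"   # demangled lines where a call returned -ERANGE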
>>>>>
>>>>>>
>>>>>>> Output of ceph osd dump shows that pgp_num == pg_num:
>>>>>>>
>>>>>>> [root@ctrl01 ~]# ceph osd dump
>>>>>>> epoch 142
>>>>>>> fsid 0e2d841f-68fd-4629-9813-ab083e8c0f10
>>>>>>> created 2017-12-20 23:04:59.781525
>>>>>>> modified 2018-01-14 21:30:57.528682
>>>>>>> flags sortbitwise,recovery_deletes,purged_snapdirs
>>>>>>> crush_version 6
>>>>>>> full_ratio 0.95
>>>>>>> backfillfull_ratio 0.9
>>>>>>> nearfull_ratio 0.85
>>>>>>> require_min_compat_client jewel
>>>>>>> min_compat_client jewel
>>>>>>> require_osd_release luminous
>>>>>>> pool 1 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 119 flags hashpspool stripe_width 0 application rbd
>>>>>>> removed_snaps [1~3]
>>>>>>> pool 2 'cinder-2' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 120 flags hashpspool stripe_width 0 application rbd
>>>>>>> removed_snaps [1~3]
>>>>>>> pool 3 'cinder-3' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 121 flags hashpspool stripe_width 0 application rbd
>>>>>>> removed_snaps [1~3]
>>>>>>> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 94 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
>>>>>>> max_osd 3
>>>>>>> osd.0 up in weight 1 up_from 82 up_thru 140 down_at 79 last_clean_interval [23,78) 10.16.0.11:6800/1795 10.16.0.11:6801/1795 10.16.0.11:6802/1795 10.16.0.11:6803/1795 exists,up abe33844-6d98-4ede-81a8-a8bdc92dada8
>>>>>>> osd.1 up in weight 1 up_from 73 up_thru 140 down_at 71 last_clean_interval [55,72) 10.16.0.13:6800/1756 10.16.0.13:6804/1001756 10.16.0.13:6805/1001756 10.16.0.13:6806/1001756 exists,up 0dab9372-6ffe-4a23-a8b7-4edca3745a2a
>>>>>>> osd.2 up in weight 1 up_from 140 up_thru 140 down_at 133 last_clean_interval [31,132) 10.16.0.12:6800/1749 10.16.0.12:6801/1749 10.16.0.12:6802/1749 10.16.0.12:6803/1749 exists,up 220bba17-8119-4035-9e43-5b8eaa27562f
>>>>>>>
>>>>>>>
>>>>>>> On 15.01.2018 at 01:33, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Mon, Jan 15, 2018 at 8:34 AM, Alexander Peters
>>>>>>> <alexander.peters@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hello
>>>>>>>
>>>>>>> I am currently experiencing a strange issue with my radosgw. It fails to
>>>>>>> start, and all it says is:
>>>>>>> [root@ctrl02 ~]# /usr/bin/radosgw --cluster ceph --name client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
>>>>>>> 2018-01-14 21:30:57.132007 7f44ddd18e00 0 deferred set uid:gid to 167:167 (ceph:ceph)
>>>>>>> 2018-01-14 21:30:57.132161 7f44ddd18e00 0 ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process (unknown), pid 13928
>>>>>>> 2018-01-14 21:30:57.556672 7f44ddd18e00 -1 ERROR: failed to initialize watch: (34) Numerical result out of range
>>>>>>> 2018-01-14 21:30:57.558752 7f44ddd18e00 -1 Couldn't init storage provider (RADOS)
>>>>>>>
>>>>>>> (When started via systemctl it writes the same lines to the logfile.)
>>>>>>>
>>>>>>> The strange thing is that it is working on another env that was installed
>>>>>>> with the same set of Ansible playbooks.
>>>>>>> OS is CentOS Linux release 7.4.1708 (Core)
>>>>>>>
>>>>>>> Ceph is up and running (I am currently using it for storing volumes and
>>>>>>> images from OpenStack).
>>>>>>>
>>>>>>> Does anyone have an idea how to debug this?
>>>>>>>
>>>>>>>
>>>>>>> According to https://access.redhat.com/solutions/2778161 this can
>>>>>>> happen if your pgp_num is higher than the pg_num.
>>>>>>>
>>>>>>> Check "ceph osd dump" output for that possibility.
>>>>>>>
>>>>>>>
>>>>>>> Best Regards
>>>>>>> Alexander
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> ceph-users mailing list
>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Cheers,
>>>>>>> Brad
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Cheers,
>>>>>> Brad
>>>>>
>>>>>
>>>>> --
>>>>> Cheers,
>>>>> Brad
>>>>
>>>>
>>>
>>>
>>> --
>>> Nikos Kormpakis - nkorb@xxxxxxxxxxxx
>>> Network Operations Center, Greek Research & Technology Network
>>> Tel: +30 210 7475712 - http://www.grnet.gr
>>> 7, Kifisias Av., 115 23 Athens, Greece
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> --
>> Cheers,
>> Brad
>
>
> --
> Cheers,
> Brad

--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
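For completeness, a quick way to test the pgp_num vs pg_num theory from the Red Hat article quoted above on a live cluster. This is only a minimal sketch using standard ceph CLI commands; the <pool> in the commented fix is a placeholder for whichever pool turns out to be inconsistent:

# Compare pg_num and pgp_num for every pool; a healthy cluster reports matching values.
for pool in $(ceph osd pool ls); do
    printf '%s: ' "$pool"
    echo "$(ceph osd pool get "$pool" pg_num), $(ceph osd pool get "$pool" pgp_num)"
done

# If a pool shows pgp_num out of step with pg_num, bring them back in line, e.g.:
# ceph osd pool set <pool> pgp_num <value-of-pg_num>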