> On 6 November 2017 at 20:17, Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
>
>
> On Mon, Nov 6, 2017 at 7:29 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> > Hi,
> >
> > On a Ceph Luminous (12.2.1) environment I'm seeing RGWs stall, and around the same time I see these errors in the RGW logs:
> >
> > 2017-11-06 15:50:24.859919 7f8f5fa1a700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.XXXXX:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.20
> > 2017-11-06 15:50:41.768881 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:XXXXX
> > 2017-11-06 15:55:15.781739 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.meta:.meta:bucket.instance:XXXXX:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.32:_XK5LExyXXXXX6EEIXxCD5Cws:1
> > 2017-11-06 15:55:25.784404 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.XXXXX:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.32
> >
> > I see one message from a year ago: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-June/010531.html
> >
> > The setup has two RGWs running:
> >
> > - ceph-rgw1
> > - ceph-rgw2
> >
> > While trying to figure this out I noticed that a "radosgw-admin period pull" hangs forever.
> >
> > I don't know if that is related, but it's something I've noticed.
> >
> > Mainly I see that at random times the RGW stalls for about 30 seconds, and while that happens these messages show up in the RGW's log.
> >
>
> Do you happen to know if dynamic resharding is happening? Dynamic
> resharding should only affect writes to the specific bucket, and
> should not affect cache distribution, though. Originally I thought it
> could be a HUP-signal-related issue, but that seems to be fixed in
> 12.2.1.
>

No, it doesn't seem to be that way:

$ radosgw-admin reshard list

That's empty.
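To narrow down whether the hang is specific to "period pull", the locally stored period/realm state can be read without contacting any peer. This is a diagnostic sketch using standard radosgw-admin subcommands; depending on the multisite setup, "period pull" may additionally need --url and credentials:

```shell
# These read local metadata and should return promptly
# even when 'period pull' itself hangs.
radosgw-admin period get
radosgw-admin realm list

# Bound a potentially hanging pull so it can't stall a shell or script.
timeout 30 radosgw-admin period pull || echo "period pull timed out or failed"
```

If "period get" returns instantly while "period pull" times out, the problem is in talking to the peer/master, not in the local metadata.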
Looking at the logs I see this happening:

2017-11-07 09:45:12.147335 7f985b34f700 10 cache put: name=gn1-pf.rgw.data.root++.bucket.meta.XXX-mon-bucket:eb32b1ca-807a-4867-aea5-ff43ef7647c6.14977556.9 info.flags=0x17
2017-11-07 09:45:12.147357 7f985b34f700 10 adding gn1-pf.rgw.data.root++.bucket.meta.XXX-mon-bucket:eb32b1ca-807a-4867-aea5-ff43ef7647c6.14977556.9 to cache LRU end
2017-11-07 09:45:12.147364 7f985b34f700 10 updating xattr: name=user.rgw.acl bl.length()=155
2017-11-07 09:45:12.147376 7f985b34f700 10 distributing notification oid=notify.6 bl.length()=708
2017-11-07 09:45:22.148361 7f985b34f700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.XXX-mon-bucket:eb32b1ca-807a-4867-aea5-ff43ef7647c6.14977556.9
2017-11-07 09:45:22.150273 7f985b34f700 10 cache put: name=gn1-pf.rgw.meta++.meta:bucket:XXX-mon-bucket:_iaUdq4vufCpgnMlapZCm169:1 info.flags=0x17
2017-11-07 09:45:22.150283 7f985b34f700 10 adding gn1-pf.rgw.meta++.meta:bucket:XXX-mon-bucket:_iaUdq4vufCpgnMlapZCm169:1 to cache LRU end
2017-11-07 09:45:22.150291 7f985b34f700 10 distributing notification oid=notify.1 bl.length()=407
2017-11-07 09:45:31.881703 7f985b34f700 10 cache put: name=gn1-pf.rgw.data.root++XXX-mon-bucket info.flags=0x17
2017-11-07 09:45:31.881720 7f985b34f700 10 moving gn1-pf.rgw.data.root++XXX-mon-bucket to cache LRU end
2017-11-07 09:45:31.881733 7f985b34f700 10 distributing notification oid=notify.1 bl.length()=372

As you can see, for OID 'gn1-pf.rgw.data.root++.bucket.meta.XXX-mon-bucket:eb32b1ca-807a-4867-aea5-ff43ef7647c6.14977556.9' the cache notify (sent via notify.6) failed, but for 'gn1-pf.rgw.data.root++XXX-mon-bucket' (via notify.1) it went just fine.

Skimming through the logs I see that notifies fail when one of these objects is used:

- notify.4
- notify.6

In total there are 8 notify objects in the 'control' pool:

- notify.0
- notify.1
- notify.2
- notify.3
- notify.4
- notify.5
- notify.6
- notify.7

I don't know if that's something which might relate to it.
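Since only notify.4 and notify.6 seem affected, one thing worth checking is which watchers are registered on each notify object; each running RGW should hold a watch on every one of them, and a notify object with a missing or stale watcher would explain timeouts on exactly that object. A sketch using the rados CLI (the pool name 'gn1-pf.rgw.control' is an assumption based on the zone prefix in the logs; adjust to the actual control pool):

```shell
# List the watchers registered on each of the 8 notify objects.
# Assumed pool name: gn1-pf.rgw.control (derived from the 'gn1-pf' zone
# prefix seen in the log lines above).
for i in $(seq 0 7); do
    echo "== notify.$i =="
    rados -p gn1-pf.rgw.control listwatchers notify.$i
done
```

If notify.4 and notify.6 show a watcher entry that the other objects don't (or are missing one of the two RGWs), that would point at a stuck watch as the cause of the failed cache distribution.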
I created this issue in the tracker: http://tracker.ceph.com/issues/22060

Wido

> Yehuda
>
> >
> > Is anybody else running into this issue?
> >
> > Wido
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com