On 11/04/2015 09:49 AM, Daniel Schneller wrote:
> We had a similar issue in Firefly, where we had a very large number
> (about 1,500,000) of buckets for a single RGW user. We observed a number
> of slow requests in day-to-day use, but did not think much of it at the
> time.
>
> At one point the primary OSD managing the list of buckets for that user
> crashed and could not restart, because processing the tremendous amount
> of buckets on startup - which also seemed to be single-threaded, judging
> by the 100% CPU usage we could see - took longer than the suicide
> timeout. That led to this OSD crashing again and again. Eventually, it
> would be marked out and the secondary tried to process the list with the
> same result, leading to a cascading failure.
>
> While I am quite certain it is a different code path in your case (you
> speak about a handful of buckets), it certainly sounds like a very
> similar issue. Do you have lots of objects in those few buckets, or are
> they few, but large in size to reach the 30 TB? Worst case you might be
> in for a similar procedure to the one we had to take: take load off the
> cluster, increase the timeouts to ridiculous levels and copy the data
> over into a more evenly distributed set of buckets (users in our case).
> Fortunately, as long as we did not try to write to the problematic
> buckets, we could still read from them.
>

If you have a large number of objects in a bucket you might want to give
the new bucket index sharding feature a try. The bucket index objects can
become very large otherwise. Sharding has been available since Hammer.
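For example - untested here, and as far as I know it only affects buckets
created after the change, so existing buckets would still need their data
copied over - something along these lines in ceph.conf on the radosgw node
should give every new bucket a sharded index (the section name
"client.radosgw.gateway" and the shard count of 8 are just placeholders
for whatever matches your setup):

    [client.radosgw.gateway]
    # split each new bucket index across 8 RADOS objects
    # (default is 0, i.e. one unsharded index object per bucket)
    rgw override bucket index max shards = 8

Restart the radosgw after setting it, and check the documentation for your
exact version before relying on it.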
Wido

> Please note that this is only a guess, I could be completely wrong.
>
> Daniel
>
> On 2015-11-03 13:33:19 +0000, Gerd Jakobovitsch said:
>
>> Dear all,
>>
>> I have a cluster running Hammer (0.94.5), with 5 nodes. The main usage
>> is S3-compatible object storage.
>>
>> I am running into a very troublesome problem on this cluster. A single
>> object in .rgw.buckets.index is not responding to requests and takes a
>> very long time to recover after an OSD restart. During this time, the
>> OSDs this object is mapped to get heavily loaded, with high CPU as well
>> as memory usage. At the same time, the directory
>> /var/lib/ceph/osd/ceph-XX/current/omap accumulates a large number of
>> entries (> 10,000) that won't decrease.
>>
>> Very frequently I get more than 100 blocked requests for this object,
>> and the main OSD that stores it ends up accepting no other requests.
>> Very frequently the OSD ends up crashing due to the filestore timeout,
>> and getting it up again is very troublesome - it usually has to run
>> alone in the node for a long time, until the object somehow gets
>> recovered.
>>
>> In the OSD logs there are several entries like these:
>>
>> -7051> 2015-11-03 10:46:08.339283 7f776974f700 10 log_client logged
>> 2015-11-03 10:46:02.942023 osd.63 10.17.0.9:6857/2002 41 : cluster [WRN]
>> slow request 120.003081 seconds old, received at 2015-11-03 10:43:56.472825:
>> osd_repop(osd.53.236531:7 34.7 8a7482ff/.dir.default.198764998.1/head//34
>> v 236984'22) currently commit_sent
>>
>> 2015-11-03 10:28:32.405265 7f0035982700  0 log_channel(cluster) log [WRN] :
>> 97 slow requests, 1 included below; oldest blocked for > 2046.502848 secs
>> 2015-11-03 10:28:32.405269 7f0035982700  0 log_channel(cluster) log [WRN] :
>> slow request 1920.676998 seconds old, received at 2015-11-03 09:56:31.728224:
>> osd_op(client.210508702.0:14696798 .dir.default.198764998.1
>> [call rgw.bucket_prepare_op] 15.8a7482ff ondisk+write+known_if_redirected
>> e236956) currently waiting for blocked object
>>
>> Is there any way to dig deeper into this problem, or to rebuild the
>> .rgw index without losing data? I currently have 30 TB of data in the
>> cluster - most of it concentrated in a handful of buckets - that I
>> can't lose.
>>
>> Regards.

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com