On Mon, Dec 21, 2015 at 4:15 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
>
>
> On Mon, Dec 21, 2015 at 10:55 PM, Florian Haas <florian@xxxxxxxxxxx> wrote:
>>
>> On Mon, Dec 21, 2015 at 3:35 PM, Haomai Wang <haomai@xxxxxxxx> wrote:
>> >
>> >
>> > On Fri, Dec 18, 2015 at 1:16 AM, Florian Haas <florian@xxxxxxxxxxx>
>> > wrote:
>> >>
>> >> Hey everyone,
>> >>
>> >> I recently got my hands on a cluster that has been underperforming in
>> >> terms of radosgw throughput, averaging about 60 PUTs/s with 70K
>> >> objects where a freshly-installed cluster with near-identical
>> >> configuration would do about 250 PUTs/s. (Neither of these values is
>> >> what I'd consider high throughput, but this is just to give you a feel
>> >> about the relative performance hit.)
>> >>
>> >> Some digging turned up that of the less than 200 buckets in the
>> >> cluster, about 40 held in excess of a million objects (1-4M), with
>> >> one bucket being an outlier with 45M objects. All buckets were created
>> >> post-Hammer, and use 64 index shards. The total number of objects in
>> >> radosgw is approx. 160M.
>> >>
>> >> Now this isn't a large cluster in terms of OSD distribution; there are
>> >> only 12 OSDs (after all, we're only talking double-digit terabytes
>> >> here). In almost all of these OSDs, the LevelDB omap directory has
>> >> grown to a size of 10-20 GB.
>> >>
>> >> So I have several questions on this:
>> >>
>> >> - Is it correct to assume that such a large LevelDB would be quite
>> >> detrimental to radosgw performance overall?
>> >>
>> >> - If so, would clearing that one large bucket and distributing the
>> >> data over several new buckets reduce the LevelDB size at all?
>> >>
>> >> - Is there even something akin to "ceph mon compact" for OSDs?
>> >>
>> >> - Are these large LevelDB databases a simple consequence of having a
>> >> combination of many radosgw objects and few OSDs, with the
>> >> distribution per-bucket being comparatively irrelevant?
>> >>
>> >> I do understand that the 45M object bucket itself would have been a
>> >> problem pre-Hammer, with no index sharding available. But with what
>> >> others have shared here, a rule of thumb of one index shard per
>> >> million objects should be a good one to follow, so 64 shards for 45M
>> >> objects doesn't strike me as totally off the mark. That's why I think
>> >> LevelDB I/O is actually the issue here. But I might be totally wrong;
>> >> all insights appreciated. :)
>> >
>> >
>> > Do you enable bucket index sharding?
>>
>> As stated above, yes. 64 shards.
>>
>> > I'm not sure what the bottleneck in your cluster is. I guess you could
>> > disable leveldb compression to test whether that reduces the impact of
>> > compaction.
>>
>> Hmmm, you mean with "leveldb_compression = false"? Could you explain
>> why exactly *disabling* compression would help with large omaps?
>>
>> Also, would "osd_compact_leveldb_on_mount" (undocumented) help here?
>> It looks to me like that is an option with no actual implementing
>> code, but I may be missing something.
>>
>> The similarly named leveldb_compact_on_mount seems to only compact
>> LevelDB data in LevelDBStore. But I may be mistaken there too, as that
>> option also seems to be undocumented. Would configuring an osd with
>> leveldb_compact_on_mount=true do omap compaction on OSD daemon
>> startup, in a FileStore OSD?
>
>
> I don't have enough information to be sure this is the problem in your
> case. I ran into this problem before, because leveldb has a single
> compaction thread, which spends a lot of time compressing and
> uncompressing data during compaction.
>
> What's your version? I guess "leveldb_compression" or
> "osd_leveldb_compression" can help.

This is on Hammer. Could you please clarify the semantics of
leveldb_compact_on_mount and leveldb_compression for OSDs though? Like
I said, it looks like neither of those options is documented anywhere.
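
For reference, the shard arithmetic behind the rule of thumb quoted
above can be checked with a minimal sketch like the one below. The
function names and the 1M-objects-per-shard default are purely
illustrative assumptions on my part; nothing here is a radosgw API, and
it only uses the figures already mentioned in this thread (45M objects,
64 shards).

```python
import math


def objects_per_shard(num_objects: int, num_shards: int) -> float:
    """Average number of index entries each bucket index shard holds."""
    return num_objects / num_shards


def min_shards(num_objects: int, per_shard_limit: int = 1_000_000) -> int:
    """Smallest shard count that keeps each shard at or under the limit.

    per_shard_limit defaults to the "one shard per million objects" rule
    of thumb quoted in this thread; it is an assumption, not a hard limit.
    """
    return max(1, math.ceil(num_objects / per_shard_limit))


if __name__ == "__main__":
    largest_bucket = 45_000_000  # the outlier bucket from this thread
    shards = 64                  # index shards configured per bucket

    print(objects_per_shard(largest_bucket, shards))  # 703125.0
    print(min_shards(largest_bucket))                 # 45
```

So by that rule of thumb the 45M bucket sits at roughly 0.7M entries
per shard, which is why I don't think the shard count alone explains
the slowdown.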

Cheers,
Florian