On Mon, 18 Apr 2016 11:46:18 -0700 Gregory Farnum wrote:

> On Sun, Apr 17, 2016 at 9:05 PM, Christian Balzer <chibi@xxxxxxx> wrote:
> >
> > Hello,
> >
> > On Fri, 15 Apr 2016 08:20:45 +0200 Michael Metz-Martini | SpeedPartner
> > GmbH wrote:
> >
> >> Hi,
> >>
> >> Am 15.04.2016 um 07:43 schrieb Christian Balzer:
> >> > On Fri, 15 Apr 2016 07:02:13 +0200 Michael Metz-Martini |
> >> > SpeedPartner GmbH wrote:
> >> >> Am 15.04.2016 um 03:07 schrieb Christian Balzer:
> >> >>>> We thought this was a good idea so that we can set the
> >> >>>> replication size differently for doc_root and raw-data if we like.
> >> >>>> Seems this was a bad idea for all objects.
> >> [...]
> >> >>> If nobody else has anything to say about this, I'd consider
> >> >>> filing a bug report.
> >> >> I must admit that we're currently using 0.87 (Giant) and haven't
> >> >> upgraded so far. Would be nice to know if an upgrade would "clean"
> >> >> this state or whether we should better start with a new cluster ... :(
> >
> > Actually, I ran some more tests, with larger and differing data sets.
> >
> > I can now replicate this behavior here. Before:
> > ---
> > NAME          ID     USED      %USED     MAX AVAIL     OBJECTS
> > data          0      6224M      0.11         1175G        1870
> > metadata      1     18996k         0         1175G          24
> > filegoats     10      468M         0         1175G        1346
> > ---
> >
> > And after copying /usr/ from the client where that CephFS is mounted to
> > the directory mapped to "filegoats":
> > ---
> > NAME          ID     USED      %USED     MAX AVAIL     OBJECTS
> > data          0      6224M      0.11         1173G       47274
> > metadata      1     42311k         0         1173G        4057
> > filegoats     10     1642M      0.03         1173G       43496
> > ---
> >
> > So not a "bug" per se, but not exactly elegant when considering the
> > object overhead.
> > This feels a lot like how cache-tiering is implemented as well (evicted
> > objects get zero'd, not deleted).
> >
> > I guess the best strategy here is to have the vast majority of data
> > in "data" and only special cases in other pools (like SSD-based ones).
> >
> > Would be nice if somebody from the devs or RH could pipe up and the
> > documentation be updated to reflect this.
>
> It's not really clear to me what test you're running here.

Create an FS with the default metadata and data pools.
Add another data pool (filegoats).
Map (set the layout of) a subdirectory to that data pool.
Copy lots of data (files) there.
(A rough command-level sketch of this is appended below my signature.)

Then find all those empty objects in "data", matching up with the actual
data-holding objects in "filegoats".

> But if
> you're talking about lots of empty RADOS objects, you're probably
> running into the backtraces. Objects store (often stale) backtraces of
> their directory path in an xattr for disaster recovery and lookup. But
> to facilitate that lookup, they need to be visible without knowing
> anything about the data placement, so if you have a bunch of files
> elsewhere it still puts a pointer backtrace in the default file data
> pool.

That's obviously what's happening here.

> Although I think we've talked about ways to avoid that and maybe did
> something to improve it by Jewel, but I don't remember for certain.
>
Michael would probably be most interested in that, with 2.2 billion of
those empty objects that are significantly impacting performance.

Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
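
P.S. In case anyone wants to reproduce this or poke at the empty objects
themselves, here is a rough sketch of the steps above and of how to look at
the backtrace xattr. The mount point (/mnt/cephfs), directory name (goats)
and PG count are made up for this example, <some-object> stands for any
object name listed in the "data" pool, and the add_data_pool syntax differs
between releases (mds vs. fs subcommand), so treat this as a guideline
rather than copy-paste material:
---
# Create the extra data pool and attach it to CephFS
# (Giant: "ceph mds add_data_pool <pool>"; newer: "ceph fs add_data_pool <fsname> <pool>")
ceph osd pool create filegoats 128
ceph mds add_data_pool filegoats

# Point a subdirectory at the new pool via the layout vxattr
mkdir /mnt/cephfs/goats
setfattr -n ceph.dir.layout.pool -v filegoats /mnt/cephfs/goats

# Copy some data and compare pool usage / object counts
cp -a /usr/. /mnt/cephfs/goats/
ceph df

# The zero-size objects left in "data" carry the backtrace in the "parent" xattr
rados -p data ls | head -5
rados -p data stat <some-object>                         # shows size 0
rados -p data getxattr <some-object> parent > parent.bin
ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json
---
The dump_json output should show the ancestor dentries of the file, which is
all those "empty" objects are there for.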