On Mon, 18 Apr 2016 11:46:18 -0700 Gregory Farnum wrote:

> On Sun, Apr 17, 2016 at 9:05 PM, Christian Balzer <chibi@xxxxxxx> wrote:
> >
> > Hello,
> >
> > On Fri, 15 Apr 2016 08:20:45 +0200 Michael Metz-Martini | SpeedPartner
> > GmbH wrote:
> >
> >> Hi,
> >>
> >> Am 15.04.2016 um 07:43 schrieb Christian Balzer:
> >> > On Fri, 15 Apr 2016 07:02:13 +0200 Michael Metz-Martini |
> >> > SpeedPartner GmbH wrote:
> >> >> Am 15.04.2016 um 03:07 schrieb Christian Balzer:
> >> >>>> We thought this was a good idea so that we can set the
> >> >>>> replication size differently for doc_root and raw-data if we like.
> >> >>>> Seems this was a bad idea for all objects.
> >> [...]
> >> >>> If nobody else has anything to say about this, I'd consider
> >> >>> filing a bug report.
> >> >> I must admit that we're currently using 0.87 (Giant) and haven't
> >> >> upgraded so far. Would be nice to know if an upgrade would "clean"
> >> >> this state or whether we should better start with a new cluster ... :(
> >
> > Actually, I ran some more tests, with larger and differing data sets.
> >
> > I can now replicate this behavior here. Before:
> > ---
> > NAME          ID     USED      %USED     MAX AVAIL     OBJECTS
> > data          0      6224M      0.11         1175G        1870
> > metadata      1     18996k         0         1175G          24
> > filegoats     10      468M         0         1175G        1346
> > ---
> >
> > And after copying /usr/ from the client where that CephFS is mounted to
> > the directory mapped to "filegoats":
> > ---
> > NAME          ID     USED      %USED     MAX AVAIL     OBJECTS
> > data          0      6224M      0.11         1173G       47274
> > metadata      1     42311k         0         1173G        4057
> > filegoats     10     1642M      0.03         1173G       43496
> > ---
> >
> > So not a "bug" per se, but not exactly elegant when considering the
> > object overhead.
> > This feels a lot like how cache-tiering is implemented as well (evicted
> > objects get zero'd, not deleted).
> >
> > I guess the best strategy here is to have the vast majority of data
> > in "data" and only special cases in other pools (like SSD-based ones).
> >
> > Would be nice if somebody from the devs or RH could pipe up and the
> > documentation be updated to reflect this.
>
> It's not really clear to me what test you're running here.

Create an FS with the default metadata and data pools.
Add another data pool (filegoats).
Map (set the layout of) a subdirectory to that data pool.
Copy lots of data (files) there.
(A rough command-level sketch of this is appended below my signature.)

Then find all those empty objects in "data", matching up with the actual
data-holding objects in "filegoats".

> But if
> you're talking about lots of empty RADOS objects, you're probably
> running into the backtraces. Objects store (often stale) backtraces of
> their directory path in an xattr for disaster recovery and lookup. But
> to facilitate that lookup, they need to be visible without knowing
> anything about the data placement, so if you have a bunch of files
> elsewhere it still puts a pointer backtrace in the default file data
> pool.

That's obviously what's happening here.

> Although I think we've talked about ways to avoid that and maybe did
> something to improve it by Jewel, but I don't remember for certain.
>
Michael would probably be most interested in that, with 2.2 billion of
those empty objects that are significantly impacting performance.

Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
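
P.S. In case anyone wants to reproduce this or poke at the empty objects
themselves, here is a rough sketch of the steps above and of how to look at
the backtrace xattr. The mount point (/mnt/cephfs), directory name (goats)
and PG count are made up for this example, <some-object> stands for any
object name listed in the "data" pool, and the add_data_pool syntax differs
between releases (mds vs. fs subcommand), so treat this as a guideline
rather than copy-paste material:
---
# Create the extra data pool and attach it to CephFS
# (Giant: "ceph mds add_data_pool <pool>"; newer: "ceph fs add_data_pool <fsname> <pool>")
ceph osd pool create filegoats 128
ceph mds add_data_pool filegoats

# Point a subdirectory at the new pool via the layout vxattr
mkdir /mnt/cephfs/goats
setfattr -n ceph.dir.layout.pool -v filegoats /mnt/cephfs/goats

# Copy some data and compare pool usage / object counts
cp -a /usr/. /mnt/cephfs/goats/
ceph df

# The zero-size objects left in "data" carry the backtrace in the "parent" xattr
rados -p data ls | head -5
rados -p data stat <some-object>                         # shows size 0
rados -p data getxattr <some-object> parent > parent.bin
ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json
---
The dump_json output should show the ancestor dentries of the file, which is
all those "empty" objects are there for.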