Dear Konstantin and Patrick, thanks!

I started migrating a ceph fs with a 2-pool layout (replicated meta data pool, EC default data pool) to a 3-pool layout (replicated meta data pool, replicated default data pool, EC data pool set as the file layout at "/") and use sub-directory mounts for the data migration. So far, everything works as it should. Rough sketches of the commands are at the end of this mail.

Maybe some background info for everyone who is reading this. The reason for migrating is the change in best practices for cephfs; compare these two:

https://docs.ceph.com/docs/mimic/cephfs/createfs/#creating-pools
https://docs.ceph.com/docs/master/cephfs/createfs/#creating-pools

The 3-pool layout was never mentioned in the RH ceph course I took, nor by any of the ceph consultants we hired before deploying ceph. However, it seems really important to know about it. In a meta data + data pool layout, some meta data is written to the default data pool, so an EC default data pool seems a bad idea most of the time. I see a lot of size-0 objects that only store rados meta data:

POOLS:
    NAME              ID     USED        %USED     MAX AVAIL     OBJECTS
    con-fs2-meta1     12      256 MiB     0.02       1.1 TiB       410910
    con-fs2-meta2     13          0 B        0       355 TiB      5217644
    con-fs2-data      14       50 TiB     5.53       852 TiB     17943209

con-fs2-meta2 is the default data pool. This is probably the worst possible workload for an EC pool. On our file system I have regularly seen "one MDS reports slow meta-data IOs" and was always wondering where this came from. The meta data pool is on SSDs, so this warning simply didn't make any sense. Now I know. Having a small replicated default data pool not only resolves this issue, it also speeds up file create/delete and hard-link operations dramatically; I guess anything that modifies an inode. I never tested these operations in my benchmarks, but they are important. Compiling and installing packages and anything else with a heavy create/modify/delete workload will profit, as will cluster health.

Fortunately, I had an opportunity to migrate the ceph fs. For anyone who starts from scratch, I would recommend having the 3-pool layout right from the beginning. Never use an EC pool as the default data pool. I would even make this statement a bit stronger in the ceph documentation, from

    If erasure-coded pools are planned for the file system, it is usually better to use a replicated pool for the default data pool ...

to, for example,

    If erasure-coded pools are planned for the file system, it is strongly recommended to use a replicated pool for the default data pool ...
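For reference, here is a rough, untested sketch of the kind of commands involved in setting up such a 3-pool layout. Pool names, PG counts, the EC profile "my_ec_profile" and the admin mount point /mnt/cephfs are made-up examples, not our actual values:

    # replicated pools for meta data and a small replicated default data pool
    ceph osd pool create cephfs_metadata 64 64 replicated
    ceph osd pool create cephfs_data 64 64 replicated
    ceph fs new cephfs cephfs_metadata cephfs_data

    # EC pool that will hold the actual file data
    ceph osd pool create cephfs_data_ec 256 256 erasure my_ec_profile
    ceph osd pool set cephfs_data_ec allow_ec_overwrites true
    ceph fs add_data_pool cephfs cephfs_data_ec

    # on an admin mount of the root, point the file layout of "/" at the EC pool
    setfattr -n ceph.dir.layout.pool -v cephfs_data_ec /mnt/cephfs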
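And a sketch of the restricted sub-directory mount used for the data migration; client name, monitor address and paths are again only examples. As Patrick confirms in the quoted mail below, files the client creates under "/a" still inherit the layout set at "/", even though the client cannot see "/" at all:

    # sub-directory created on the admin mount of the root
    mkdir /mnt/cephfs/a

    # client key whose access is restricted to /a
    ceph fs authorize cephfs client.migration /a rw

    # on the client, mount only the sub-directory
    mount -t ceph mon1:6789:/a /mnt/a -o name=migration,secretfile=/etc/ceph/migration.secret

    # quick check from the admin mount: a file created by the client points at the EC pool
    getfattr -n ceph.file.layout /mnt/cephfs/a/some_new_file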
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
Sent: 02 February 2020 12:41
To: Frank Schilder
Cc: ceph-users
Subject: Re: ceph fs dir-layouts and sub-directory mounts

On Wed, Jan 29, 2020 at 3:04 AM Frank Schilder <frans@xxxxxx> wrote:
>
> I would like to (in this order)
>
> - set the data pool for the root "/" of a ceph-fs to a custom value, say "P" (not the initial data pool used in fs new)
> - create a sub-directory of "/", for example "/a"
> - mount the sub-directory "/a" with a client key with access restricted to "/a"
>
> The client will not be able to see the dir layout attribute set at "/", its not mounted.

The client gets the file layout information when the file is created (i.e. the RPC response from the MDS). It doesn't have _any_ access to "/". It can't even stat "/".

> Will the data of this client still go to the pool "P", that is, does "/a" inherit the dir layout transparently to the client when following the steps above?

Yes.

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx