Dear Konstantin and Patrick, thanks!

I started migrating a ceph fs with a 2-pool layout (replicated meta data pool, EC default data pool) to a 3-pool layout (replicated meta data pool, replicated default data pool, EC data pool set as the file layout at "/") and use sub-directory mounts for the data migration. So far, everything works as it should. Rough sketches of the commands are at the end of this mail.

Maybe some background info for everyone who is reading this. The reason for migrating is the change in best practices for cephfs; compare these two:

https://docs.ceph.com/docs/mimic/cephfs/createfs/#creating-pools
https://docs.ceph.com/docs/master/cephfs/createfs/#creating-pools

The 3-pool layout was never mentioned in the RH ceph course I took, nor by any of the ceph consultants we hired before deploying ceph. However, it seems really important to know about it. In a meta data + data pool layout, some meta data is written to the default data pool, so an EC default data pool seems a bad idea most of the time. I see a lot of size-0 objects that only store rados meta data:

POOLS:
    NAME              ID     USED        %USED     MAX AVAIL     OBJECTS
    con-fs2-meta1     12      256 MiB     0.02       1.1 TiB       410910
    con-fs2-meta2     13          0 B        0       355 TiB      5217644
    con-fs2-data      14       50 TiB     5.53       852 TiB     17943209

con-fs2-meta2 is the default data pool. This is probably the worst possible workload for an EC pool. On our file system I have regularly seen "one MDS reports slow meta-data IOs" and was always wondering where this came from. The meta data pool is on SSDs, so this warning simply didn't make any sense. Now I know. Having a small replicated default data pool not only resolves this issue, it also speeds up file create/delete and hard-link operations dramatically; I guess anything that modifies an inode. I never tested these operations in my benchmarks, but they are important. Compiling and installing packages and anything else with a heavy create/modify/delete workload will profit, as will cluster health.

Fortunately, I had an opportunity to migrate the ceph fs. For anyone who starts from scratch, I would recommend having the 3-pool layout right from the beginning. Never use an EC pool as the default data pool. I would even make this statement a bit stronger in the ceph documentation, from

    If erasure-coded pools are planned for the file system, it is usually better to use a replicated pool for the default data pool ...

to, for example,

    If erasure-coded pools are planned for the file system, it is strongly recommended to use a replicated pool for the default data pool ...
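For reference, here is a rough, untested sketch of the kind of commands involved in setting up such a 3-pool layout. Pool names, PG counts, the EC profile "my_ec_profile" and the admin mount point /mnt/cephfs are made-up examples, not our actual values:

    # replicated pools for meta data and a small replicated default data pool
    ceph osd pool create cephfs_metadata 64 64 replicated
    ceph osd pool create cephfs_data 64 64 replicated
    ceph fs new cephfs cephfs_metadata cephfs_data

    # EC pool that will hold the actual file data
    ceph osd pool create cephfs_data_ec 256 256 erasure my_ec_profile
    ceph osd pool set cephfs_data_ec allow_ec_overwrites true
    ceph fs add_data_pool cephfs cephfs_data_ec

    # on an admin mount of the root, point the file layout of "/" at the EC pool
    setfattr -n ceph.dir.layout.pool -v cephfs_data_ec /mnt/cephfs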
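And a sketch of the restricted sub-directory mount used for the data migration; client name, monitor address and paths are again only examples. As Patrick confirms in the quoted mail below, files the client creates under "/a" still inherit the layout set at "/", even though the client cannot see "/" at all:

    # sub-directory created on the admin mount of the root
    mkdir /mnt/cephfs/a

    # client key whose access is restricted to /a
    ceph fs authorize cephfs client.migration /a rw

    # on the client, mount only the sub-directory
    mount -t ceph mon1:6789:/a /mnt/a -o name=migration,secretfile=/etc/ceph/migration.secret

    # quick check from the admin mount: a file created by the client points at the EC pool
    getfattr -n ceph.file.layout /mnt/cephfs/a/some_new_file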
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
Sent: 02 February 2020 12:41
To: Frank Schilder
Cc: ceph-users
Subject: Re: ceph fs dir-layouts and sub-directory mounts

On Wed, Jan 29, 2020 at 3:04 AM Frank Schilder <frans@xxxxxx> wrote:
>
> I would like to (in this order)
>
> - set the data pool for the root "/" of a ceph-fs to a custom value, say "P" (not the initial data pool used in fs new)
> - create a sub-directory of "/", for example "/a"
> - mount the sub-directory "/a" with a client key with access restricted to "/a"
>
> The client will not be able to see the dir layout attribute set at "/", its not mounted.

The client gets the file layout information when the file is created (i.e. the RPC response from the MDS). It doesn't have _any_ access to "/". It can't even stat "/".

> Will the data of this client still go to the pool "P", that is, does "/a" inherit the dir layout transparently to the client when following the steps above?

Yes.

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx