Re: ceph fs dir-layouts and sub-directory mounts

errata: con-fs2-meta2 is the default data pool.

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder
Sent: 03 February 2020 10:08
To: Patrick Donnelly; Konstantin Shalygin
Cc: ceph-users
Subject: Re:  ceph fs dir-layouts and sub-directory mounts

Dear Konstantin and Patrick,

thanks!

I started migrating a 2-pool ceph fs layout (replicated meta data, EC default data) to a 3-pool layout (replicated meta data, replicated default data, EC data pool assigned via the directory layout at "/") and use sub-directory mounts for the data migration. So far, everything works as it should.
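
For reference, this is roughly what such a setup looks like on the command line. Pool names, the EC profile and the mount details below are made-up examples for illustration, not our actual configuration:

    # replicated meta-data pool and a small replicated default data pool
    ceph osd pool create cephfs-meta 64
    ceph osd pool create cephfs-default-data 64

    # EC pool for the bulk data; overwrites must be enabled for cephfs use
    ceph osd erasure-code-profile set cephfs-ec-profile k=4 m=2
    ceph osd pool create cephfs-ec-data 256 256 erasure cephfs-ec-profile
    ceph osd pool set cephfs-ec-data allow_ec_overwrites true

    # the replicated pool becomes the default data pool, the EC pool is added on top
    ceph fs new cephfs cephfs-meta cephfs-default-data
    ceph fs add_data_pool cephfs cephfs-ec-data

    # on a mounted client: send all file data below "/" to the EC pool
    setfattr -n ceph.dir.layout.pool -v cephfs-ec-data /mnt/cephfs

    # sub-directory mount, e.g. for copying data into the new tree
    mount -t ceph mon1:6789:/some-subdir /mnt/migration -o name=admin,secretfile=/etc/ceph/admin.secret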

Maybe some background info for everyone who is reading this. The reason for migrating is the change in best practices for cephfs; compare these two:

https://docs.ceph.com/docs/mimic/cephfs/createfs/#creating-pools
https://docs.ceph.com/docs/master/cephfs/createfs/#creating-pools

The 3-pool layout was never mentioned in the RH ceph-course I took, nor by any of the ceph consultants we hired before deploying ceph. However, it seems really important to know about it.

For a meta data + data pool layout, an EC default data pool seems like a bad idea most of the time, because some meta data is written to the default data pool. I see a lot of size-0 objects that only store rados meta data:

POOLS:
    NAME                     ID     USED        %USED     MAX AVAIL     OBJECTS
    con-fs2-meta1            12     256 MiB      0.02       1.1 TiB       410910
    con-fs2-meta2            13         0 B         0       355 TiB      5217644
    con-fs2-data             14      50 TiB      5.53       852 TiB     17943209

con-fs2-meta2 is the default data pool. This is probably the worst workload for an EC pool.
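
You can look at these objects directly with rados if you are curious. The object name below is just an example (cephfs names them after the file's inode number); the point is that the object itself is empty and only carries the "parent" backtrace xattr:

    rados -p con-fs2-meta2 ls | head
    rados -p con-fs2-meta2 stat 10000000001.00000000        # reports size 0
    rados -p con-fs2-meta2 listxattr 10000000001.00000000   # typically just "parent"

    # decode the backtrace to see which file the object belongs to (needs ceph-dencoder)
    rados -p con-fs2-meta2 getxattr 10000000001.00000000 parent > /tmp/parent
    ceph-dencoder type inode_backtrace_t import /tmp/parent decode dump_json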

On our file system I have regularly seen the warning "one MDS reports slow meta-data IOs" and was always wondering where it came from. I have the meta-data pool on SSDs, so this warning simply didn't make any sense to me. Now I know.

Having a small replicated default data pool not only resolves this issue, it also speeds up file create/delete and hard-link operations dramatically; I guess anything that modifies an inode benefits. I never tested these operations in my benchmarks, but they are important. Compiling and installing packages, and anything else with a heavy create/modify/delete workload, will profit, as will overall cluster health.
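
I have not turned this into a proper benchmark, but a crude way to see the difference on a client mount is a small create/hard-link/delete loop (the path is just an example):

    # metadata-heavy toy workload on a cephfs mount; only the relative
    # numbers between the old and the new layout are meaningful
    mkdir -p /mnt/cephfs/mdtest && cd /mnt/cephfs/mdtest
    time bash -c 'for i in $(seq 1 10000); do touch f.$i; done'
    time bash -c 'for i in $(seq 1 10000); do ln f.$i h.$i; done'
    time bash -c 'rm -f f.* h.*'

Dedicated tools like mdtest do this more systematically, of course.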

Fortunately, I had an opportunity to migrate the ceph fs. For anyone starting fresh, I would recommend setting up the 3-pool layout right from the beginning. Never use an EC pool as the default data pool. I would even make the corresponding statement in the ceph documentation a bit stronger, from:

    If erasure-coded pools are planned for the file system, it is usually better to use a replicated pool for the default data pool ...

to, for example,

    If erasure-coded pools are planned for the file system, it is strongly recommended to use a replicated pool for the default data pool ...
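
Independent of the exact wording, it is easy to check what a file system currently uses as its default data pool and where a directory's data actually goes; the file system name and mount point below are just examples:

    # the first pool listed under "data_pools" is the default data pool
    ceph fs get cephfs | grep -E 'metadata_pool|data_pools'
    ceph fs ls

    # directory layout explicitly set at the root of a mounted client
    # (only present on directories where a layout was actually set)
    getfattr -n ceph.dir.layout /mnt/cephfs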

Best regards,

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



