Dear Gregory and Philip,

I'm also experimenting with a replicated primary data pool and an erasure-coded secondary data pool. I make the same observation as Philip with regard to objects and activity. However, it does seem to make a difference. If I run a very aggressive fio test such as

    fio --ioengine=libaio --direct=1 --name=test --filename=test --bs=4k --size=100G --runtime=5m --readwrite=randwrite --iodepth=4096

or with an even higher iodepth, I observe "slow metadata IOs" on a file system with metadata on a replicated SSD pool and just a primary EC data pool. On the other hand, I do not observe "slow metadata IOs" on a file system with the three-pool layout. In both cases I do observe "slow ops", though. This would indicate that the replicated primary data pool in front of the EC secondary data pool does indeed have an effect. Strangely, though, I cannot see any activity on this pool with pool stats, and neither are there any objects. Is there any way to check if anything is on this pool and how much storage it uses? "ceph df" is not helping and neither is "rados ls", which is a bit of an issue when it comes to sizing.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Gregory Farnum <gfarnum@xxxxxxxxxx>
Sent: 28 January 2020 18:13:29
To: CASS Philip
Cc: ceph-users@xxxxxxx
Subject: Re: CephFS - objects in default data pool

On Tue, Jan 28, 2020 at 4:26 PM CASS Philip <p.cass@xxxxxxxxxxxxx> wrote:

I have a query about https://docs.ceph.com/docs/master/cephfs/createfs/:

“The data pool used to create the file system is the “default” data pool and the location for storing all inode backtrace information, used for hard link management and disaster recovery. For this reason, all inodes created in CephFS have at least one object in the default data pool.”

This does not match my experience (Nautilus servers, Nautilus FUSE client or CentOS 7 kernel client). I have a CephFS with a replicated top-level pool and a directory set to use erasure coding with setfattr, though I also did the same test using the subvolume commands, with the same result. "ceph df detail" shows no objects used in the top-level pool, as shown in https://gist.github.com/pcass-epcc/af24081cf014a66809e801f33bcb535b (also displayed in-line below).

Hmm, I think this is tripping over the longstanding issue that omap data is not reflected in the pool stats (although I would expect it to still show up as objects, but perhaps the "ceph df" view has a different reporting chain? Or else I'm confused somehow.) But anyway...

It would be useful if indeed clients didn’t have to write to the top-level pool, since that would mean we could give different clients permission only to pool-associated subdirectories without giving everyone write access to a pool with data structures shared between all users of the filesystem.

*Clients* don't need write permission to the default data pool unless you want them to write files there. The backtraces are maintained by the MDS. :)
-Greg
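
For illustration, a minimal sketch of what that separation can look like, assuming a hypothetical client name (client.labA) and a hypothetical /ec subdirectory, with the file-system and pool names taken from Philip's listing below; the OSD cap only names the EC data pool, so the client gets no write access to the shared top-level pool:

    # Hypothetical client that may only use the /ec subtree, whose file
    # layout points at the EC data pool; per Greg's note, the MDS writes
    # the backtraces into the default data pool on the client's behalf.
    ceph auth get-or-create client.labA \
        mon 'allow r' \
        mds 'allow rw path=/ec' \
        osd 'allow rw pool=cephfs.fs1-ec.data'

    # Inspect the caps that were created:
    ceph auth get client.labA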
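
And coming back to the question above of how to tell whether anything actually lives in the seemingly empty primary data pool: one way is to ask rados directly. A sketch, assuming the pool name cephfs.fs1-replicated.data and one of the test files from Philip's listing below, and assuming (as I understand it) that the backtraces end up as a "parent" xattr on per-inode objects named <inode-in-hex>.00000000:

    # Hex inode number of a test file (path taken from the listing below):
    ino_hex=$(printf '%x' "$(stat -c %i /test-fs/ec/new-file)")

    # List objects in the default data pool across all namespaces and
    # check the pool usage as rados reports it:
    rados -p cephfs.fs1-replicated.data ls --all
    rados df | grep cephfs.fs1-replicated.data

    # If a backtrace object exists for that inode, it should be visible
    # together with its xattrs:
    rados -p cephfs.fs1-replicated.data stat "${ino_hex}.00000000"
    rados -p cephfs.fs1-replicated.data listxattr "${ino_hex}.00000000"
    rados -p cephfs.fs1-replicated.data getxattr "${ino_hex}.00000000" parent | hexdump -C

If those come back empty as well, that would at least narrow down whether the pool is truly unused or whether "ceph df" is simply not accounting for what is stored there.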

[root@hdr-admon01 ec]# ceph df detail; ceph fs ls; ceph fs status
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       3.3 PiB     3.3 PiB      32 TiB       32 TiB          0.95
    nvme      2.9 TiB     2.9 TiB     504 MiB      2.5 GiB          0.08
    TOTAL     3.3 PiB     3.3 PiB      32 TiB       32 TiB          0.95

POOLS:
    POOL                           ID     STORED      OBJECTS     USED        %USED     MAX AVAIL     QUOTA OBJECTS     QUOTA BYTES     DIRTY     USED COMPR     UNDER COMPR
    cephfs.fs1.metadata             5     162 MiB          63     324 MiB      0.01       1.4 TiB     N/A               N/A                63     0 B            0 B
    cephfs.fs1-replicated.data      6     0 B               0     0 B             0       1.0 PiB     N/A               N/A                 0     0 B            0 B
    cephfs.fs1-ec.data              7     8.0 GiB       2.05k     11 GiB          0       2.4 PiB     N/A               N/A             2.05k     0 B            0 B

name: fs1, metadata pool: cephfs.fs1.metadata, data pools: [cephfs.fs1-replicated.data cephfs.fs1-ec.data ]

fs1 - 4 clients
===
+------+--------+------------+---------------+------+------+
| Rank | State  |    MDS     |    Activity   | dns  | inos |
+------+--------+------------+---------------+------+------+
|  0   | active | hdr-meta02 | Reqs:    0 /s |  29  |  16  |
+------+--------+------------+---------------+------+------+
+----------------------------+----------+-------+-------+
|            Pool            |   type   |  used | avail |
+----------------------------+----------+-------+-------+
|    cephfs.fs1.metadata     | metadata |  324M | 1414G |
| cephfs.fs1-replicated.data |   data   |    0  | 1063T |
|     cephfs.fs1-ec.data     |   data   | 11.4G | 2505T |
+----------------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|  hdr-meta01 |
+-------------+
MDS version: ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable)

[root@hdr-admon01 ec]# ll /test-fs/ec/
total 12582912
-rw-r--r--. 1 root root 4294967296 Jan 27 22:26 new-file
-rw-r--r--. 2 root root 4294967296 Jan 28 14:06 new-file2
-rw-r--r--. 2 root root 4294967296 Jan 28 14:06 new-file-same-inode-as-newfile2

Regards,
Phil

_________________________________________
Philip Cass
HPC Systems Specialist – Senior Systems Administrator
EPCC
Advanced Computing Facility
Bush Estate
Penicuik
Tel: +44 (0)131 4457815
Email: p.cass@xxxxxxxxxxxxx
_________________________________________
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. The information contained in this e-mail (including any attachments) is confidential and is intended for the use of the addressee only. If you have received this message in error, please delete it and notify the originator immediately. Please consider the environment before printing this email.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx