Re: CephFS - objects in default data pool

Dear Gregory and Philip,

I'm also experimenting with a replicated primary data pool and an erasure-coded secondary data pool. I make the same observation as Philip with regard to objects and activity. However, the layout does seem to make a difference. If I run a very aggressive fio test such as:

fio --ioengine=libaio --direct=1 --name=test --filename=test --bs=4k --size=100G --runtime=5m --readwrite=randwrite --iodepth=4096

or with iodepth even higher, I observe "slow metadata IOs" on a file system with metadata on a replicated SSD pool and only a primary EC data pool. On the other hand, I do not observe "slow metadata IOs" on a file system with the three-pool layout. In both cases I do observe "slow ops", though.

This result would indicate that the replicated primary data pool in front of the EC secondary data pool does indeed have an effect. Strangely, though, I cannot see any activity on this pool in the pool stats, nor are any objects reported for it.

Is there any way to check whether anything is on this pool and how much storage it uses? "ceph df" is not helping and neither is "rados ls", which is a bit of an issue when it comes to sizing.
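
For illustration, these are the kinds of checks I have in mind (the pool name is taken from Philip's listing further down; the file path and <inode-in-hex> are just placeholders):

rados df | grep fs1-replicated
rados -p cephfs.fs1-replicated.data ls | head
# look up a file's inode number in hex and probe for its backtrace object:
printf '%x\n' $(stat -c %i /mnt/cephfs/some-ec-dir/some-file)
rados -p cephfs.fs1-replicated.data stat <inode-in-hex>.00000000
rados -p cephfs.fs1-replicated.data listxattr <inode-in-hex>.00000000

I'm not sure, though, whether per-object checks like these would even account for the backtrace data if it only lives in xattrs/omap.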

Best regards,

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Gregory Farnum <gfarnum@xxxxxxxxxx>
Sent: 28 January 2020 18:13:29
To: CASS Philip
Cc: ceph-users@xxxxxxx
Subject:  Re: CephFS - objects in default data pool

On Tue, Jan 28, 2020 at 4:26 PM CASS Philip <p.cass@xxxxxxxxxxxxx> wrote:
I have a query about https://docs.ceph.com/docs/master/cephfs/createfs/:

“The data pool used to create the file system is the “default” data pool and the location for storing all inode backtrace information, used for hard link management and disaster recovery. For this reason, all inodes created in CephFS have at least one object in the default data pool.”

This does not match my experience (Nautilus servers, Nautilus FUSE client or CentOS 7 kernel client). I have a CephFS with a replicated top-level pool and a directory set to use erasure coding via setfattr; I also ran the same test using the subvolume commands with the same result. "ceph df detail" shows no objects in the top-level pool, as shown in https://gist.github.com/pcass-epcc/af24081cf014a66809e801f33bcb535b (also displayed in-line below).
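
(For context, the EC directory layout was set with roughly the following, using the pool and path names that appear in the listing below; the getfattr line just confirms the layout took effect:)

setfattr -n ceph.dir.layout.pool -v cephfs.fs1-ec.data /test-fs/ec
getfattr -n ceph.dir.layout /test-fs/ec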

Hmm, I think this is tripping over the longstanding issue that omap data is not reflected in the pool stats (although I would expect it to still show up as objects; perhaps the "ceph df" view has a different reporting chain, or else I'm confused somehow).
But anyway...


It would be useful if indeed clients didn’t have to write to the top-level pool, since that would mean we could give different clients permission only to pool-associated subdirectories without giving everyone write access to a pool with data structures shared between all users of the filesystem.

*Clients* don't need write permission to the default data pool unless you want them to write files there. The backtraces are maintained by the MDS. :)
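
For example, something along these lines (a rough sketch; adjust the client name and the in-filesystem path to your setup, the pool name is taken from your listing below):

ceph auth get-or-create client.ecuser \
    mon 'allow r' \
    mds 'allow rw path=/ec' \
    osd 'allow rw pool=cephfs.fs1-ec.data'

A key like this can only write file data to the EC pool; the backtrace objects in the default data pool are written by the MDS, not by the client.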
-Greg


[root@hdr-admon01 ec]# ceph df detail; ceph fs ls; ceph fs status
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       3.3 PiB     3.3 PiB      32 TiB       32 TiB          0.95
    nvme      2.9 TiB     2.9 TiB     504 MiB      2.5 GiB          0.08
    TOTAL     3.3 PiB     3.3 PiB      32 TiB       32 TiB          0.95

POOLS:
    POOL                           ID     STORED      OBJECTS     USED        %USED     MAX AVAIL     QUOTA OBJECTS     QUOTA BYTES     DIRTY     USED COMPR     UNDER COMPR
    cephfs.fs1.metadata             5     162 MiB          63     324 MiB      0.01       1.4 TiB     N/A               N/A                63            0 B             0 B
    cephfs.fs1-replicated.data      6         0 B           0         0 B         0       1.0 PiB     N/A               N/A                 0            0 B             0 B
    cephfs.fs1-ec.data              7     8.0 GiB       2.05k      11 GiB         0       2.4 PiB     N/A               N/A             2.05k            0 B             0 B
name: fs1, metadata pool: cephfs.fs1.metadata, data pools: [cephfs.fs1-replicated.data cephfs.fs1-ec.data ]
fs1 - 4 clients
===
+------+--------+------------+---------------+-------+-------+
| Rank | State  |    MDS     |    Activity   |  dns  |  inos |
+------+--------+------------+---------------+-------+-------+
|  0   | active | hdr-meta02 | Reqs:    0 /s |   29  |   16  |
+------+--------+------------+---------------+-------+-------+
+----------------------------+----------+-------+-------+
|            Pool            |   type   |  used | avail |
+----------------------------+----------+-------+-------+
|    cephfs.fs1.metadata     | metadata |  324M | 1414G |
| cephfs.fs1-replicated.data |   data   |    0  | 1063T |
|     cephfs.fs1-ec.data     |   data   | 11.4G | 2505T |
+----------------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|  hdr-meta01 |
+-------------+
MDS version: ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable)

[root@hdr-admon01 ec]# ll /test-fs/ec/
total 12582912
-rw-r--r--. 1 root root 4294967296 Jan 27 22:26 new-file
-rw-r--r--. 2 root root 4294967296 Jan 28 14:06 new-file2
-rw-r--r--. 2 root root 4294967296 Jan 28 14:06 new-file-same-inode-as-newfile2

Regards,
Phil
_________________________________________
Philip Cass
HPC Systems Specialist – Senior Systems Administrator
EPCC

Advanced Computing Facility
Bush Estate
Penicuik

Tel:   +44 (0)131 4457815
Email: p.cass@xxxxxxxxxxxxx

_________________________________________

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
The information contained in this e-mail (including any attachments) is confidential and is intended for the use of the addressee only.  If you have received this message in error, please delete it and notify the originator immediately.
Please consider the environment before printing this email.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



