I experienced the same issue and was able to reduce metadata writes by raising mds_log_events_per_segment to several times its default value.
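For example, something along these lines — a sketch only; the default of 1024 and the 4x multiplier are assumptions, so check the current value on your own cluster first:

```shell
# Check the current value on a running MDS via its admin socket
# (mds_log_events_per_segment defaults to 1024 in Luminous)
ceph daemon mds.<id> config get mds_log_events_per_segment

# Raise it at runtime on all MDS daemons, e.g. to 4x the default
ceph tell mds.* injectargs '--mds_log_events_per_segment=4096'
```

To make the change persist across restarts, also set it under the [mds] section of ceph.conf on the MDS hosts.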
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Nicolas Huillard <nhuillard@xxxxxxxxxxx>
Sent: Monday, March 19, 2018 12:01:09 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Huge amount of cephfs metadata writes while only reading data (rsync from storage, to single disk)
Hi all,
I'm experimenting with a new little storage cluster. I wanted to take
advantage of the week-end to copy all data (1TB, 10M objects) from the
cluster to a single SATA disk. I expected to saturate the SATA disk
while writing to it, but the storage cluster actually saturates its
network links, while barely writing to the destination disk (63GB
written in 20h, that's less than 1MBps).
Setup : 2 datacenters × 3 storage servers × 2 disks/OSD each, Luminous
12.2.4 on Debian stretch, 1Gbps shared network, 200Mbps fibre link
between datacenters (12ms latency). 4 clients using a single cephfs
storing data + metadata on the same spinning disks with bluestore.
Test : I'm using a single rsync on one of the client servers (the other
3 are just sitting there). rsync is local to the client, copying from
the cephfs mount (kernel client on 4.14 from stretch-backports, just to
use a potentially more recent cephfs client than on stock 4.9), to the
SATA disk. The rsync'ed tree consists of lots of tiny files (1-3kB) on
deep directory branches, along with some large files (10-100MB) in a
few directories. There is no other activity on the cluster.
Observations : I initially saw write rates on the destination disk
ranging from a few hundred kBps (while exploring branches of tiny
files) to a few tens of MBps (while copying large files); the file
names scrolled past at a roughly fixed rate, unrelated to their
individual size.
After 5 hours, the fibre link started to saturate at 200Mbps, while
destination disk writes are down to a few tens of kBps.
Using the dashboard, I see lots of metadata writes, at 30MBps rate on
the metadata pool, which correlates to the 200Mbps link rate.
It also shows regular "Health check failed: 1 MDSs behind on trimming
(MDS_TRIM)" / "MDS health message (mds.2): Behind on trimming (64/30)".
I wonder why cephfs would write anything to the metadata pool (I'm
mounting on the clients with "noatime") while I'm only reading data
from it...
What could I tune to reduce this write load on a read-only workload?
--
Nicolas Huillard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com