I experienced the same issue and was able to reduce metadata writes by raising mds_log_events_per_segment to several times its default value.
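For example, something along these lines — a sketch only; the default of 1024 and the 4x multiplier are assumptions, so check the current value on your own cluster first:

```shell
# Check the current value on a running MDS via its admin socket
# (mds_log_events_per_segment defaults to 1024 in Luminous)
ceph daemon mds.<id> config get mds_log_events_per_segment

# Raise it at runtime on all MDS daemons, e.g. to 4x the default
ceph tell mds.* injectargs '--mds_log_events_per_segment=4096'
```

To make the change persist across restarts, also set it under the [mds] section of ceph.conf on the MDS hosts.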
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Nicolas Huillard <nhuillard@xxxxxxxxxxx>
Sent: Monday, March 19, 2018 12:01:09 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Huge amount of cephfs metadata writes while only reading data (rsync from storage, to single disk)
Hi all,
I'm experimenting with a new little storage cluster. I wanted to take
advantage of the week-end to copy all data (1TB, 10M objects) from the
cluster to a single SATA disk. I expected to saturate the SATA disk
while writing to it, but the storage cluster actually saturates its
network links, while barely writing to the destination disk (63GB
written in 20h, that's less than 1MBps).
Setup : 2 datacenters × 3 storage servers × 2 disks/OSD each, Luminous
12.2.4 on Debian stretch, 1Gbps shared network, 200Mbps fibre link
between datacenters (12ms latency). 4 clients using a single cephfs
storing data + metadata on the same spinning disks with bluestore.
Test : I'm using a single rsync on one of the client servers (the other
3 are just sitting there). rsync is local to the client, copying from
the cephfs mount (kernel client on 4.14 from stretch-backports, just to
use a potentially more recent cephfs client than on stock 4.9), to the
SATA disk. The rsync'ed tree consists of lots of tiny files (1-3kB) on
deep directory branches, along with some large files (10-100MB) in a
few directories. There is no other activity on the cluster.
Observations : I initially saw write rates on the destination disk
ranging from a few hundred kBps (while exploring branches of tiny
files) to a few tens of MBps (while copying large files); the file
names scrolled past at a roughly fixed rate, unrelated to their
individual size.
After 5 hours, the fibre link started to saturate at 200Mbps, while
destination disk writes are down to a few tens of kBps.
Using the dashboard, I see lots of metadata writes, at 30MBps rate on
the metadata pool, which correlates to the 200Mbps link rate.
It also shows regular "Health check failed: 1 MDSs behind on trimming
(MDS_TRIM)" / "MDS health message (mds.2): Behind on trimming (64/30)".
I wonder why cephfs would write anything to the metadata pool (I'm
mounting on the clients with "noatime") while I'm only reading data
from it...
What could I tune to reduce this write load on a read-only workload?
--
Nicolas Huillard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com