Re: Huge amount of cephfs metadata writes while only reading data (rsync from storage, to single disk)

Nicolas Huillard <nhuillard@xxxxxxxxxxx> · Mon, 19 Mar 2018 13:20:54 +0100

On Monday, 19 March 2018 at 10:01 +0000, Sergey Malinin wrote:
> I experienced the same issue and was able to reduce metadata writes
> by raising mds_log_events_per_segment to
> its original value multiplied several times.

I changed it from 1024 to 4096 :
* rsync status (1 line per file) scrolls much quicker
* OSD writes on the dashboard are now much lower than reads (they were
much higher before)
* metadata pool write rate is now in the 20-800kBps range, while metadata
reads are in the 20-80kBps range
* data pool reads are in the hundreds of kBps, which still seems very
low
* destination disk write rate is a bit larger than the data pool read
rate (expected for btrfs), but still low
* inter-DC network load is now 1-50Mbps
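
The figures above mix bytes per second (kBps, MBps) and bits per second (Mbps), so a small conversion helper makes them easier to compare. This is only a sketch assuming decimal units (1 kB = 1000 B, 1 GB = 1000 MB); the function names are made up for illustration. The sample values are the ones quoted in this thread.

```python
def kbytes_to_mbits(kilobytes_per_s: float) -> float:
    """Convert kB/s to Mbit/s, assuming decimal units (1 kB = 1000 B)."""
    return kilobytes_per_s * 8 / 1000


def average_mbytes_per_s(gigabytes: float, hours: float) -> float:
    """Average rate in MB/s for a transfer of `gigabytes` over `hours`."""
    return gigabytes * 1000 / (hours * 3600)


if __name__ == "__main__":
    # Metadata pool writes after the change: 20-800 kBps.
    print(kbytes_to_mbits(20), kbytes_to_mbits(800))
    # Metadata pool writes before the change: 30 MBps = 30000 kBps.
    print(kbytes_to_mbits(30000))
    # Initial copy: 63 GB written in 20 h.
    print(average_mbytes_per_s(63, 20))
```

With these assumptions, 800kBps of metadata writes is about 6.4Mbps, the earlier 30MBps is about 240Mbps (the same order as the 200Mbps fibre link), and 63GB in 20h averages a little under 0.9MBps.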

I'll monitor the Munin graphs in the long run.

I can't find any documentation about the mds_log_events_per_segment
setting, especially on how to choose a good value.
Can you elaborate on "original value multiplied several times" ?
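
For anyone wanting to try the same change, here is a minimal sketch of how the value could be applied at runtime through the MDS admin socket. It assumes the `ceph daemon mds.<name> config set` form is available on the MDS host; the MDS name "a" is hypothetical, and the helper only builds the command unless asked to run it. A runtime change made this way is not expected to survive an MDS restart, so a persistent value would normally also go in the [mds] section of ceph.conf.

```python
import subprocess


def mds_config_set_cmd(mds_name: str, option: str, value) -> list:
    """Build the admin-socket command that sets one option on one MDS.

    `mds_name` is the daemon name without the "mds." prefix (hypothetical
    here); the command has to be run on the host where that MDS runs.
    """
    return ["ceph", "daemon", f"mds.{mds_name}", "config", "set",
            option, str(value)]


def apply(cmd: list, run: bool = False) -> str:
    """Return the command line; execute it only when run=True."""
    if run:
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


if __name__ == "__main__":
    cmd = mds_config_set_cmd("a", "mds_log_events_per_segment", 4096)
    print(apply(cmd))  # dry run: only prints the command
```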

I'm just seeing more MDS_TRIM warnings now. Maybe restarting the MDSs
just delayed re-emergence of the initial problem.
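
To put numbers on the trimming backlog, the health message can be parsed and scaled by the segment size. This sketch assumes that the two figures in "Behind on trimming (64/30)" are the current number of journal segments and the configured mds_log_max_segments; that reading is an assumption, as are the helper names.

```python
import re

# Matches e.g. "Behind on trimming (64/30)".
TRIM_RE = re.compile(r"Behind on trimming \((\d+)/(\d+)\)")


def parse_trim_backlog(message: str):
    """Return (current, limit) from an MDS_TRIM message, or None."""
    match = TRIM_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def journal_events(segments: int, events_per_segment: int) -> int:
    """Upper bound on journal events held in `segments` segments."""
    return segments * events_per_segment


if __name__ == "__main__":
    msg = "MDS health message (mds.2): Behind on trimming (64/30)"
    current, limit = parse_trim_backlog(msg)
    for per_segment in (1024, 4096):
        print(per_segment,
              journal_events(limit, per_segment),
              journal_events(current, per_segment))
```

Under that assumption, 30 segments hold at most 30720 events at 1024 events per segment and 122880 at 4096, so the same segment limit allows a journal four times larger before the warning appears.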

> ________________________________
> From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of
> Nicolas Huillard <nhuillard@xxxxxxxxxxx>
> Sent: Monday, March 19, 2018 12:01:09 PM
> To: ceph-users@xxxxxxxxxxxxxx
> Subject:  Huge amount of cephfs metadata writes while
> only reading data (rsync from storage, to single disk)
> 
> Hi all,
> 
> I'm experimenting with a new little storage cluster. I wanted to take
> advantage of the week-end to copy all data (1TB, 10M objects) from
> the
> cluster to a single SATA disk. I expected to saturate the SATA disk
> while writing to it, but the storage cluster actually saturates its
> network links, while barely writing to the destination disk (63GB
> written in 20h, that's less than 1MBps).
> 
> Setup : 2 datacenters × 3 storage servers × 2 disks/OSD each,
> Luminous
> 12.2.4 on Debian stretch, 1Gbps shared network, 200Mbps fibre link
> between datacenters (12ms latency). 4 clients using a single cephfs
> storing data + metadata on the same spinning disks with bluestore.
> 
> Test : I'm using a single rsync on one of the client servers (the
> other
> 3 are just sitting there). rsync is local to the client, copying from
> the cephfs mount (kernel client on 4.14 from stretch-backports, just
> to
> use a potentially more recent cephfs client than on stock 4.9), to
> the
> SATA disk. The rsync'ed tree consists of lots of tiny files (1-3kB) on
> deep directory branches, along with some large files (10-100MB) in a
> few directories. There is no other activity on the cluster.
> 
> Observations : I initially saw write performance on the destination
> disk from a few 100kBps (during exploration of branches with tiny
> files)
> to a few 10MBps (while copying large files), essentially seeing the
> file names scrolling at a relatively fixed rate, unrelated to their
> individual size.
> After 5 hours, the fibre link started to saturate at 200Mbps, while
> destination disk writes are down to a few 10kBps.
> 
> Using the dashboard, I see lots of metadata writes, at 30MBps rate on
> the metadata pool, which correlates to the 200Mbps link rate.
> It also shows regular "Health check failed: 1 MDSs behind on trimming
> (MDS_TRIM)" / "MDS health message (mds.2): Behind on trimming
> (64/30)".
> 
> I wonder why cephfs would write anything to the metadata (I'm
> mounting
> on the clients with "noatime"), while I'm just reading data from
> it...
> What could I tune to reduce that write-load-while-reading-only ?
> 
> --
> Nicolas Huillard
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-- 
Nicolas Huillard
Founding partner - Technical Director - Dolomède

nhuillard@xxxxxxxxxxx
Landline: +33 9 52 31 06 10
Mobile : +33 6 50 27 69 08
http://www.dolomede.fr/

https://reseauactionclimat.org/planetman/
http://climat-2020.eu/
http://www.350.org/