Hi *,
while tracking down a different performance issue with CephFS
(creating tarballs from CephFS-based directories takes several times
as long as backing up the same data from local disks, i.e. 56 hours
instead of 7), we had a look at CephFS performance in relation to the
size of the MDS process.
Our Ceph cluster (Luminous 12.2.1) is using file-based (FileStore)
OSDs; the CephFS data pool is on SAS HDDs, the metadata pool on SAS
SSDs.
We suspected that MDS memory consumption might be causing the delays
with "tar". The results below don't confirm this (they rather suggest
that MDS memory size does not affect CephFS read speed as long as the
cache is sufficiently warm), but they do show an almost 30%
performance drop when the cache is filled with the wrong entries.
After a fresh process start, our MDS is at about 450 MB virtual size,
with 56 MB resident. I then start a tar run over 36 GB of small files
(which I had also run a few minutes before the MDS restart, to warm up
the disk caches):
--- cut here ---
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1233 ceph 20 0 446584 56000 15908 S 3.960 0.085 0:01.08 ceph-mds
server01:~ # date; tar -C /srv/cephfs/prod/fileshare/stuff/ -cf- . | wc -c; date
Wed Nov 29 17:38:21 CET 2017
38245529600
Wed Nov 29 17:44:27 CET 2017
server01:~ #
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1233 ceph 20 0 485760 109156 16148 S 0.331 0.166 0:10.76 ceph-mds
--- cut here ---
As you can see, there's only a small growth in the MDS virtual size.
The job took 366 seconds, that's an average of about 100 MB/s.
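(In case anyone wants to double-check the MB/s values: they're just
the byte count reported by "wc -c" divided by the wall-clock seconds
between the two "date" calls, e.g. for this first run - a quick sketch
using bc, any calculator will do:)
--- cut here ---
$ echo "scale=1; 38245529600 / 366 / 1024 / 1024" | bc
99.6
--- cut here ---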
I repeat that job a few minutes later, to get numbers with a
previously active MDS (the MDS cache should be warmed up now):
--- cut here ---
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1233 ceph 20 0 494976 118404 16148 S 2.961 0.180 0:16.21 ceph-mds
server01:~ # date; tar -C /srv/cephfs/prod/fileshare/stuff/ -cf- . | wc -c; date
Wed Nov 29 17:53:09 CET 2017
38245529600
Wed Nov 29 17:58:53 CET 2017
server01:~ #
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1233 ceph 20 0 508288 131368 16148 S 1.980 0.200 0:25.45 ceph-mds
--- cut here ---
The job took 344 seconds, that's an average of about 106 MB/s. With
only a single run per situation, these numbers are no more than a
rough estimate, of course.
At 18:00:00, a file-based incremental backup job kicks in, which
reads through most of the files on the CephFS but only backs up those
that have changed since the last run. This has nothing to do with our
"tar" runs and is running on a different node, where CephFS is
kernel-mounted as well. That backup job makes the MDS cache grow
drastically; you can see the MDS at more than 8 GB below.
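(Side note: the memory numbers quoted here are from "top" on the MDS
host. The MDS's own view can be queried through its admin socket - a
sketch, assuming the daemon is reachable as mds.<name>, and for
"cache status" assuming your Luminous build already ships that asok
command:)
--- cut here ---
# per-counter statistics, including cached inode/dentry counts:
ceph daemon mds.<name> perf dump
# cache memory usage, if the "cache status" asok command is available:
ceph daemon mds.<name> cache status
--- cut here ---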
We then start another tar job (or rather two, to account for MDS
caching), as before:
--- cut here ---
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1233 ceph 20 0 8644776 7.750g 16184 S 0.990 12.39 6:45.24 ceph-mds
server01:~ # date; tar -C /srv/cephfs/prod/fileshare/stuff/ -cf- . | wc -c; date
Wed Nov 29 18:13:20 CET 2017
38245529600
Wed Nov 29 18:21:50 CET 2017
server01:~ # date; tar -C /srv/cephfs/prod/fileshare/stuff/ -cf- . | wc -c; date
Wed Nov 29 18:22:52 CET 2017
38245529600
Wed Nov 29 18:28:28 CET 2017
server01:~ #
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1233 ceph 20 0 8761512 7.642g 16184 S 3.300 12.22 7:03.52 ceph-mds
--- cut here ---
The second run is even a bit quicker than the "warmed-up" run with the
only partially filled cache (336 seconds, that's 108,5 MB/s).
But the run against the filled-up MDS cache, where most (if not all)
entries are no match for our tar lookups, took 510 seconds - that 71,5
MB/s, instead of the roughly 100 MB/s when the cache was empty.
This is by no means a precise benchmark, of course. But it at least
seems to be an indicator that MDS cache misses are costly. (During
the tests, only small amounts of changes to CephFS were likely,
especially compared to the number of reads and metadata lookups.)
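(If someone wants to pin this down further, the MDS perf counters
look like a good place to start. As far as I can tell, the Luminous
MDS exposes path-traversal counters named "traverse", "traverse_hit"
and "traverse_dir_fetch" - names may differ in other releases - which
could be sampled before and after a tar run:)
--- cut here ---
# a growing gap between "traverse" and "traverse_hit", together with a
# growing "traverse_dir_fetch", would point at lookups that had to go
# to the metadata pool instead of being served from the MDS cache
ceph daemon mds.<name> perf dump | grep '"traverse'
--- cut here ---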
Regards,
Jens
PS: Why so much memory for the MDS in the first place? Because during
those (hourly) incremental backup runs, we got a large number of MDS
health warnings about clients failing to respond to cache pressure.
Increasing the MDS cache size helped to avoid these.
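(For completeness: on Luminous the cache size is controlled via
"mds cache memory limit", in bytes. Just as a sketch of what
"increasing the MDS cache size" means - the 4 GB below is an example
value, not necessarily what we run in production:)
--- cut here ---
# ceph.conf on the MDS host(s):
[mds]
    # 4294967296 bytes = 4 GB
    mds cache memory limit = 4294967296

# or injected at runtime, without a restart:
ceph tell mds.<name> injectargs '--mds_cache_memory_limit=4294967296'
--- cut here ---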