Hello Nikita,

A valid workaround is to export both instances of CephFS via NFS-Ganesha
and run rsync over NFS, not on CephFS directly.
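For reference, a minimal ganesha.conf export for the old filesystem might look
roughly like the sketch below. The Export_Id, Pseudo path, protocol settings and
the secret are placeholders (the "admin" cephx user is taken from your existing
mount options), and the parameter names are as I recall them from the FSAL_CEPH
sample configs; double-check them against your Ganesha version:

```
# Hypothetical /etc/ganesha/ganesha.conf fragment for the old (source) cluster.
# Export_Id, Pseudo and the secret are placeholders; "admin" matches the cephx
# user already used in the kernel mounts.
EXPORT
{
    Export_Id = 1;
    Path = "/";                      # directory inside CephFS to export
    Pseudo = "/cephfs-old";          # NFSv4 pseudo path seen by NFS clients
    Access_Type = RW;
    Squash = No_Root_Squash;
    Protocols = 4;
    Transports = TCP;

    FSAL {
        Name = CEPH;                 # serve the export through libcephfs
        User_Id = "admin";
        Secret_Access_Key = "<hidden>";
    }
}
```

The new cluster gets an analogous export (e.g. Pseudo = "/cephfs-new"),
typically from a second Ganesha instance pointed at that cluster's ceph.conf.
Both exports are then mounted with plain NFS (mount -t nfs4) and rsync runs
between the two NFS mount points, the idea being that the CephFS caps are held
by the Ganesha daemons' libcephfs clients rather than by the kernel client on
the rsync host.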
On Tue, Mar 26, 2024 at 10:15 PM Nikita Borisenkov
<n.borisenkov@xxxxxxxxxxxxxx> wrote:
>
> We are transferring data (300 million small files) with rsync between
> CephFS clusters, from version 12.2.13 to 18.2.1. After roughly the same
> amount of time each run (about 7 hours in this case), copying stalls
> for about a minute:
>
> ```
> health: HEALTH_WARN
>     1 clients failing to advance oldest client/flush tid
>     1 MDSs report slow metadata IOs
>     1 MDSs behind on trimming
> ```
>
> The only interesting messages in the logs are:
> ```
> ceph-mds[640738]: mds.beacon.cephfs.X missed beacon ack from the monitors
> ```
>
> I watched the debug statistics on the (destination) client via
> watch -n1 /sys/kernel/debug/ceph/451eea44-d7a0-11ee-9117-b496914b4c02.client32497
> ```
> item                            total
> ------------------------------------------
> opened files  / total inodes    1 / 866757
> pinned i_caps / total inodes    866757 / 866757
> opened inodes / total inodes    1 / 866757
>
> item       total      avg_lat(us)  min_lat(us)  max_lat(us)  stdev(us)
> -----------------------------------------------------------------------
> read       0          0            0            0            0
> write      5129252    194689       9080         54693415     1023
> metadata   29045143   670          161          87794124     369
>
> item       total      miss       hit
> -------------------------------------------------
> d_lease    5361       3026       315488442
> caps       866757     11         483177478
> ```
> During the copy, "pinned i_caps / total inodes" gradually increases
> until it reaches "mds_max_caps_per_client" (default: 1Mi). Then
> "pinned i_caps / total inodes" drops back to almost 0; at that moment
> the HEALTH_WARN appears and the transfer stops, while "op/s wr" jumps
> from 200 to 1.5k. Then total inodes starts growing again, copying
> resumes, and the cluster returns to the HEALTHY state.
>
> Mount options:
> /mnt/cephfs-old 10.77.12.90:6789,10.77.12.91:6789,10.77.12.92:6789:/ ceph rw,noatime,nodiratime,name=admin,secret=<hidden>,acl
> /mnt/cephfs-new 10.77.12.139:6789,10.77.12.140:6789,10.77.12.141:6789:/ ceph rw,noatime,nodiratime,name=admin,secret=<hidden>,acl,caps_max=10000
>
> Client session properties on the MDS server (irrelevant fields removed):
> ceph daemon mds.cephfs.X client ls 32497
> [
>     {
>         ...
>         "id": 32497,
>         "state": "open",
>         "num_leases": 0,
>         "num_caps": 980679,
>         "request_load_avg": 7913,
>         "requests_in_flight": 466,
>         "num_completed_flushes": 464,
>         "recall_caps": {
>             "value": 0,
>             "halflife": 60
>         },
>         "release_caps": {
>             "value": 1732.2552002208533,
>             "halflife": 60
>         },
>         "recall_caps_throttle": {
>             "value": 0,
>             "halflife": 1.35000000000000001
>         },
>         "recall_caps_throttle2o": {
>             "value": 0,
>             "halflife": 0.5
>         },
>         "session_cache_liveness": {
>             "value": 42186.620275326415,
>             "halflife": 300
>         },
>         "cap_acquisition": {
>             "value": 0,
>             "halflife": 30
>         },
>         ...
>     }
> ]
>
> ceph daemonperf mds.cephfs.X
> ```
> -------------------------------------------mds------------------------------------- --mds_cache--- ------mds_log------------ -mds_mem- -------mds_server------- mds_ -----objecter------ purg
> req rlat slr fwd inos caps exi imi hifc crev cgra ctru cfsa cfa hcc hccd hccr prcr|stry recy recd|subm evts segs repl|ino dn |hcr hcs hsr cre cat |sess|actv rd wr rdwr|purg|
> 114 0 0 0 1.9M 438k 0 0 0 0 253 0 0 0 0 0 0 59 | 0 0 0 |128 123k 129 0 |1.3M 1.9M|114 0 0 0 0 | 3 | 0 0 440 0 | 0
> 101 0 0 0 1.9M 438k 0 0 0 0 0 0 0 0 0 0 0 53 | 0 0 0 |106 123k 129 0 |1.3M 1.9M|101 0 0 0 0 | 3 | 0 0 0 0 | 0
> ...
> ```
> From this output it is clear that the client does not send cap releases
> at all (column hccr; the column meanings are in "ceph daemon mds.X perf
> schema").
>
> After that I found the right search terms and came across similar reports:
> https://www.spinics.net/lists/ceph-users/msg50573.html
> https://ceph-users.ceph.narkive.com/mcyPtEyz/rsync-kernel-client-cepfs-mkstemp-no-space-left-on-device
> https://www.spinics.net/lists/ceph-users/msg50158.html
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/B7K6B5VXM3I7TODM4GRF3N7S254O5ETY/
>
> So it turns out the problem is in rsync, in the way it works?
>
> The only "solution" is to run "echo 2 > /proc/sys/vm/drop_caches" on the
> client on a schedule (or once a certain number of held caps is reached).
> After this command the CephFS client releases the cached caps, and if
> there were a lot of them, the MDS becomes slow again.
>
> We also tried mounting CephFS with the option "caps_max=10000" so that
> the client would force a release when that value is reached, but it did
> not help.
>
> We could limit mds_max_caps_per_client (not tested), but that affects
> all clients at once.
>
> The command "ceph daemon mds.cephfs.X cache drop" (with or without the
> additional parameter) does not help either.
>
> Tested with client-side Linux kernels 5.10 and 6.1.
>
> Did I understand everything correctly? Is this the expected behavior
> when running rsync?
>
>
> And one more problem (I don't know whether it is related): when rsync
> finishes copying, all caps are released except the last two
> ("pinned i_caps / total inodes 2 / 2").
>
> At that moment a warning appears (or remains after the release of a
> large number of caps): 1 clients failing to advance oldest client/flush tid
> But then it does not disappear; I waited 12 hours.
> The warning disappears only after running the "sync" command on the
> client, after which the client metrics show "pinned i_caps / total
> inodes 1 / 1".
>
> Note: running "echo 2 > /proc/sys/vm/drop_caches" does not help in this
> case.
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx


-- 
Alexander E. Patrakov
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx