cephfs client not releasing caps when running rsync

We are transferring data (300 million small files) with rsync between two cephfs clusters, from version 12.2.13 to 18.2.1. After roughly the same amount of time each run (about 7 hours in this case), copying stops for a minute and the cluster reports:

```
    health: HEALTH_WARN
            1 clients failing to advance oldest client/flush tid
            1 MDSs report slow metadata IOs
            1 MDSs behind on trimming
```

The only interesting messages in the logs are:
```
ceph-mds[640738]: mds.beacon.cephfs.X missed beacon ack from the monitors
```

I watched the client-side (destination) debug metrics via watch -n1 on
/sys/kernel/debug/ceph/451eea44-d7a0-11ee-9117-b496914b4c02.client32497:
```
item                            total
------------------------------------------
opened files  / total inodes    1 / 866757
pinned i_caps / total inodes    866757 / 866757
opened inodes / total inodes    1 / 866757

item      total     avg_lat(us)  min_lat(us)  max_lat(us)  stdev(us)
---------------------------------------------------------------------
read      0         0            0            0            0
write     5129252   194689       9080         54693415     1023
metadata  29045143  670          161          87794124     369

item      total    miss   hit
-------------------------------------------------
d_lease   5361     3026   315488442
caps      866757   11     483177478
```
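For reference, the metrics above can be watched with something like the following; the exact file name under the client's debugfs directory is an assumption on my part and may differ between kernel versions:
```
# watch the kernel client's cephfs metrics (file layout varies by kernel version)
watch -n1 cat /sys/kernel/debug/ceph/451eea44-d7a0-11ee-9117-b496914b4c02.client32497/metrics
```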
During the copy, "pinned i_caps / total inodes" gradually increases until it reaches mds_max_caps_per_client (default: 1Mi). Then "pinned i_caps / total inodes" drops to almost 0; at that moment HEALTH_WARN appears and the transfer stops, while "op/s wr" jumps from 200 to 1.5k. After that, total inodes start growing again, copying resumes, and the cluster returns to HEALTH_OK.
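To confirm the limit being hit, the MDS setting can be queried like this (a sketch; the daemon name follows the one used elsewhere in this message):
```
# cluster-wide value for the mds section
ceph config get mds mds_max_caps_per_client
# or ask a specific MDS over its admin socket
ceph daemon mds.cephfs.X config get mds_max_caps_per_client
```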

Mount options:
```
/mnt/cephfs-old 10.77.12.90:6789,10.77.12.91:6789,10.77.12.92:6789:/    ceph rw,noatime,nodiratime,name=admin,secret=<hidden>,acl
/mnt/cephfs-new 10.77.12.139:6789,10.77.12.140:6789,10.77.12.141:6789:/ ceph rw,noatime,nodiratime,name=admin,secret=<hidden>,acl,caps_max=10000
```
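For completeness, the destination mount corresponds to roughly this mount(8) invocation (a sketch reconstructed from the line above, not the exact command that was used):
```
mount -t ceph 10.77.12.139:6789,10.77.12.140:6789,10.77.12.141:6789:/ /mnt/cephfs-new \
    -o name=admin,secret=<hidden>,noatime,nodiratime,acl,caps_max=10000
```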

Client session properties on the MDS (irrelevant fields removed):
ceph daemon mds.cephfs.X client ls 32497
```
[
     {
     ...
         "id": 32497,
         "state": "open",
         "num_leases": 0,
         "num_caps": 980679,
         "request_load_avg": 7913,
         "requests_in_flight": 466,
         "num_completed_flushes": 464,
         "recall_caps": {
             "value": 0,
             "halflife": 60
         },
         "release_caps": {
             "value": 1732.2552002208533,
             "halflife": 60
         },
         "recall_caps_throttle": {
             "value": 0,
             "halflife": 1.35000000000000001
         },
         "recall_caps_throttle2o": {
             "value": 0,
             "halflife": 0.5
         },
         "session_cache_liveness": {
             "value": 42186.620275326415,
             "halflife": 300
         },
         "cap_acquisition": {
             "value": 0,
             "halflife": 30
         },
         ...
     }
]
```

ceph daemonperf mds.cephfs.X
```
---------------------------------mds--------------------------------- --mds_cache-- -----mds_log------ -mds_mem- -----mds_server----- mds_ ---objecter---- purg
req  rlat slr fwd inos caps exi imi hifc crev cgra ctru cfsa cfa hcc hccd hccr prcr|stry recy recd|subm evts segs repl|ino  dn  |hcr hcs hsr cre cat|sess|actv rd  wr  rdwr|purg
114  0    0   0   1.9M 438k 0   0   0    0    253  0    0    0   0   0    0    59  |0    0    0   |128  123k 129  0   |1.3M 1.9M|114 0   0   0   0  |3   |0    0   440 0   |0
101  0    0   0   1.9M 438k 0   0   0    0    0    0    0    0   0   0    0    53  |0    0    0   |106  123k 129  0   |1.3M 1.9M|101 0   0   0   0  |3   |0    0   0   0   |0
...
```
From this output it is clear that the client is not sending cap releases at all (column hccr; the columns can be decoded with "ceph daemon mds.X perf schema").
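For anyone following along, the column abbreviations can be looked up by dumping the counter schema from the MDS admin socket; the grep pattern below is just a convenience, not an official filter:
```
# dump counter names/descriptions and show the cap-handling ones
ceph daemon mds.cephfs.X perf schema | grep -i -B1 -A5 cap
```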

I was eventually able to find the right search query and turned up similar reports:
https://www.spinics.net/lists/ceph-users/msg50573.html
https://ceph-users.ceph.narkive.com/mcyPtEyz/rsync-kernel-client-cepfs-mkstemp-no-space-left-on-device
https://www.spinics.net/lists/ceph-users/msg50158.html
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/B7K6B5VXM3I7TODM4GRF3N7S254O5ETY/

So it turns out that the problem lies in the way rsync works?

The only "solution" is to do it on the client according to a schedule (or upon reaching a certain number of open caps) “echo 2 > /proc/sys/vm/drop_caches”. After this command, the cephfs client releases the cached caps. And if there were a lot of them, then MDS becomes slow again.

We also tried to mount cephfs with the option "caps_max=10000" so that the client would do a forced release when the specified value is reached, but this did not help.

We could lower mds_max_caps_per_client (not tested), but that affects all clients at once.
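If we did go that route, it would be a single cluster-wide setting, something like (untested, the value is only an example):
```
# applies to every client session on every MDS
ceph config set mds mds_max_caps_per_client 500000
```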

The command "ceph daemon mds.cephfs.X cache drop" (with or without an additional parameter) does not help

Tested on Linux kernels (client side): 5.10 and 6.1

Did I understand everything correctly? Is this the expected behavior when running rsync?


And one more problem (I don't know whether it is related): when rsync finishes copying, all caps are released except the last two ("pinned i_caps / total inodes" shows 2 / 2).

At this moment the warning "1 clients failing to advance oldest client/flush tid" appears (or remains from the earlier release of a large number of caps).
It then never goes away on its own; I waited 12 hours.
The warning disappears only after running "sync" on the client, after which the client metrics show "pinned i_caps / total inodes 1 / 1".

Note: running "echo 2 > /proc/sys/vm/drop_caches" does not help in this case.
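In case it is useful, the sync can presumably be limited to just the cephfs mount instead of the whole system (assuming coreutils 8.24+ with the -f/--file-system option):
```
# syncfs() only the filesystem containing the given path
sync -f /mnt/cephfs-new
```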
