Hello Nikita,

A valid workaround is to export both instances of CephFS via NFS-Ganesha
and run rsync over NFS, not on CephFS directly.
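For reference, a minimal ganesha.conf export for the old filesystem might look
roughly like the sketch below. The Export_Id, Pseudo path, protocol settings and
the secret are placeholders (the "admin" cephx user is taken from your existing
mount options), and the parameter names are as I recall them from the FSAL_CEPH
sample configs; double-check them against your Ganesha version:

```
# Hypothetical /etc/ganesha/ganesha.conf fragment for the old (source) cluster.
# Export_Id, Pseudo and the secret are placeholders; "admin" matches the cephx
# user already used in the kernel mounts.
EXPORT
{
    Export_Id = 1;
    Path = "/";                      # directory inside CephFS to export
    Pseudo = "/cephfs-old";          # NFSv4 pseudo path seen by NFS clients
    Access_Type = RW;
    Squash = No_Root_Squash;
    Protocols = 4;
    Transports = TCP;

    FSAL {
        Name = CEPH;                 # serve the export through libcephfs
        User_Id = "admin";
        Secret_Access_Key = "<hidden>";
    }
}
```

The new cluster gets an analogous export (e.g. Pseudo = "/cephfs-new"),
typically from a second Ganesha instance pointed at that cluster's ceph.conf.
Both exports are then mounted with plain NFS (mount -t nfs4) and rsync runs
between the two NFS mount points, the idea being that the CephFS caps are held
by the Ganesha daemons' libcephfs clients rather than by the kernel client on
the rsync host.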
On Tue, Mar 26, 2024 at 10:15 PM Nikita Borisenkov
<n.borisenkov@xxxxxxxxxxxxxx> wrote:
>
> We are transferring data (300 million small files) with rsync between
> CephFS clusters, from version 12.2.13 to 18.2.1. After roughly the same
> amount of time each run (about 7 hours in this case), copying stalls
> for about a minute:
>
> ```
> health: HEALTH_WARN
>     1 clients failing to advance oldest client/flush tid
>     1 MDSs report slow metadata IOs
>     1 MDSs behind on trimming
> ```
>
> The only interesting messages in the logs are:
> ```
> ceph-mds[640738]: mds.beacon.cephfs.X missed beacon ack from the monitors
> ```
>
> I watched the debug statistics on the (destination) client via
> watch -n1 /sys/kernel/debug/ceph/451eea44-d7a0-11ee-9117-b496914b4c02.client32497
> ```
> item                            total
> ------------------------------------------
> opened files  / total inodes    1 / 866757
> pinned i_caps / total inodes    866757 / 866757
> opened inodes / total inodes    1 / 866757
>
> item       total      avg_lat(us)  min_lat(us)  max_lat(us)  stdev(us)
> -----------------------------------------------------------------------
> read       0          0            0            0            0
> write      5129252    194689       9080         54693415     1023
> metadata   29045143   670          161          87794124     369
>
> item       total      miss       hit
> -------------------------------------------------
> d_lease    5361       3026       315488442
> caps       866757     11         483177478
> ```
> During the copy, "pinned i_caps / total inodes" gradually increases
> until it reaches "mds_max_caps_per_client" (default: 1Mi). Then
> "pinned i_caps / total inodes" drops back to almost 0; at that moment
> the HEALTH_WARN appears and the transfer stops, while "op/s wr" jumps
> from 200 to 1.5k. Then total inodes starts growing again, copying
> resumes, and the cluster returns to the HEALTHY state.
>
> Mount options:
> /mnt/cephfs-old 10.77.12.90:6789,10.77.12.91:6789,10.77.12.92:6789:/ ceph rw,noatime,nodiratime,name=admin,secret=<hidden>,acl
> /mnt/cephfs-new 10.77.12.139:6789,10.77.12.140:6789,10.77.12.141:6789:/ ceph rw,noatime,nodiratime,name=admin,secret=<hidden>,acl,caps_max=10000
>
> Client session properties on the MDS server (irrelevant fields removed):
> ceph daemon mds.cephfs.X client ls 32497
> [
>     {
>         ...
>         "id": 32497,
>         "state": "open",
>         "num_leases": 0,
>         "num_caps": 980679,
>         "request_load_avg": 7913,
>         "requests_in_flight": 466,
>         "num_completed_flushes": 464,
>         "recall_caps": {
>             "value": 0,
>             "halflife": 60
>         },
>         "release_caps": {
>             "value": 1732.2552002208533,
>             "halflife": 60
>         },
>         "recall_caps_throttle": {
>             "value": 0,
>             "halflife": 1.35000000000000001
>         },
>         "recall_caps_throttle2o": {
>             "value": 0,
>             "halflife": 0.5
>         },
>         "session_cache_liveness": {
>             "value": 42186.620275326415,
>             "halflife": 300
>         },
>         "cap_acquisition": {
>             "value": 0,
>             "halflife": 30
>         },
>         ...
>     }
> ]
>
> ceph daemonperf mds.cephfs.X
> ```
> -------------------------------------------mds------------------------------------- --mds_cache--- ------mds_log------------ -mds_mem- -------mds_server------- mds_ -----objecter------ purg
> req rlat slr fwd inos caps exi imi hifc crev cgra ctru cfsa cfa hcc hccd hccr prcr|stry recy recd|subm evts segs repl|ino dn |hcr hcs hsr cre cat |sess|actv rd wr rdwr|purg|
> 114 0 0 0 1.9M 438k 0 0 0 0 253 0 0 0 0 0 0 59 | 0 0 0 |128 123k 129 0 |1.3M 1.9M|114 0 0 0 0 | 3 | 0 0 440 0 | 0
> 101 0 0 0 1.9M 438k 0 0 0 0 0 0 0 0 0 0 0 53 | 0 0 0 |106 123k 129 0 |1.3M 1.9M|101 0 0 0 0 | 3 | 0 0 0 0 | 0
> ...
> ```
> From this output it is clear that the client does not send cap releases
> at all (column hccr; the column meanings are in "ceph daemon mds.X perf
> schema").
>
> After that I found the right search terms and came across similar reports:
> https://www.spinics.net/lists/ceph-users/msg50573.html
> https://ceph-users.ceph.narkive.com/mcyPtEyz/rsync-kernel-client-cepfs-mkstemp-no-space-left-on-device
> https://www.spinics.net/lists/ceph-users/msg50158.html
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/B7K6B5VXM3I7TODM4GRF3N7S254O5ETY/
>
> So it turns out the problem is in rsync, in the way it works?
>
> The only "solution" is to run "echo 2 > /proc/sys/vm/drop_caches" on the
> client on a schedule (or once a certain number of held caps is reached).
> After this command the CephFS client releases the cached caps, and if
> there were a lot of them, the MDS becomes slow again.
>
> We also tried mounting CephFS with the option "caps_max=10000" so that
> the client would force a release when that value is reached, but it did
> not help.
>
> We could limit mds_max_caps_per_client (not tested), but that affects
> all clients at once.
>
> The command "ceph daemon mds.cephfs.X cache drop" (with or without the
> additional parameter) does not help either.
>
> Tested with client-side Linux kernels 5.10 and 6.1.
>
> Did I understand everything correctly? Is this the expected behavior
> when running rsync?
>
>
> And one more problem (I don't know whether it is related): when rsync
> finishes copying, all caps are released except the last two
> ("pinned i_caps / total inodes 2 / 2").
>
> At that moment a warning appears (or remains after the release of a
> large number of caps): 1 clients failing to advance oldest client/flush tid
> But then it does not disappear; I waited 12 hours.
> The warning disappears only after running the "sync" command on the
> client, after which the client metrics show "pinned i_caps / total
> inodes 1 / 1".
>
> Note: running "echo 2 > /proc/sys/vm/drop_caches" does not help in this
> case.
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx


-- 
Alexander E. Patrakov
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx