On Tue, Aug 23, 2022 at 10:01 PM Kuhring, Mathias <mathias.kuhring@xxxxxxxxxxxxxx> wrote:
>
> Dear Ceph developers and users,
>
> We are using ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable).
> We have been using cephadm since version 15 octopus.
>
> We mirror several CephFS directories from our main cluster out to a second mirror cluster.
> In particular with bigger directories (over 900 TB and 186 M files), we noticed that mirroring is very slow.
> On the mirror, most of the time we only observe a write speed of 0 to 10 MB/s in the client IO.
> The target peer directory often doesn't show an increase in size during synchronization
> (when we check with: getfattr -n ceph.dir.rbytes).
>
> The status of the syncs is always fine, i.e. syncing and not failing:
>
> 0|0[root@osd-1 /var/run/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f]# ceph --admin-daemon ceph-client.cephfs-mirror.osd-1.ydsqsw.7.94552861013544.asok fs mirror peer status cephfs@1 c66afb80-593f-4c42-a120-dd3b6fca26bc
> {
>     "/irods/sodar": {
>         "state": "syncing",
>         "current_sycning_snap": {
>             "id": 7552,
>             "name": "scheduled-2022-08-22-13_00_00"
>         },
>         "last_synced_snap": {
>             "id": 7548,
>             "name": "scheduled-2022-08-22-12_00_00",
>             "sync_duration": 37828.164744490001,
>             "sync_time_stamp": "13240678.542916s"
>         },
>         "snaps_synced": 1,
>         "snaps_deleted": 11,
>         "snaps_renamed": 0
>     }
> }
>
> The cluster nodes (6 per cluster) are connected to the switches with dual 40G NICs.
> The connection between the switches is 2x 100G.
> Simple write operations from other clients to the mirror CephFS result in writes of e.g. 300 to 400 MB/s.
> So the network doesn't seem to be the issue here.
>
> We started to dig into the debug logs of the cephfs-mirror daemon / docker container.
> We set the debug level to 20. Otherwise there are no messages at all (so no errors).
>
> We observed a lot of messages with "need_data_sync=0, need_attr_sync=1",
> leading us to the assumption that instead of actual data, a lot of attributes are being synced.
>
> We started looking at specific examples in the logs and tried to work out from the source code which steps are happening.
> Most of the messages are coming from cephfs::mirror::PeerReplayer:
> https://github.com/ceph/ceph/blob/6fee777d603aebce492c57b41f3b5760d50ddb07/src/tools/cephfs_mirror/PeerReplayer.cc
>
> We figured out that the do_synchronize function uses should_sync_entry to check whether data (need_data_sync) or attributes (need_attr_sync) should be synchronized,
> and if necessary performs the sync using remote_file_op.
>
> should_sync_entry reports different ctimes for our examples, e.g.:
> local cur statx: mode=33152, uid=996, gid=993, size=154701172, ctime=2022-01-28T12:54:21.176004+0000, ...
> local prev statx: mode=33152, uid=996, gid=993, size=154701172, ctime=2022-08-22T11:03:18.578380+0000, ...
>
> Based on these different ctimes, should_sync_entry then decides that attributes need to be synced:
> *need_attr_sync = (cstx.stx_ctime != pstx.stx_ctime)
> https://github.com/ceph/ceph/blob/6fee777d603aebce492c57b41f3b5760d50ddb07/src/tools/cephfs_mirror/PeerReplayer.cc#L911
>
> We assume cur statx/cstx refers to the file in the snapshot currently being mirrored.
> But what exactly is prev statx/pstx? Is it the peer path or the last snapshot on the mirror peer?
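>
> To make our reading checkable, here is a condensed sketch of that decision logic as we understand it
> (paraphrased from should_sync_entry in PeerReplayer.cc; the names follow the source, but error handling,
> logging and the directory/full-sync special cases are omitted, so please treat this as our interpretation
> rather than the exact code):
>
>     // cstx: statx of the entry in the snapshot currently being synced.
>     // pstx: statx of the "previous" state (whatever fh.p_mnt/fh.p_fd
>     //       point at -- this is exactly our open question above).
>     struct ceph_statx pstx;
>     int r = ceph_statxat(fh.p_mnt, fh.p_fd, epath.c_str(), &pstx,
>                          CEPH_STATX_MODE | CEPH_STATX_UID | CEPH_STATX_GID |
>                          CEPH_STATX_SIZE | CEPH_STATX_CTIME | CEPH_STATX_MTIME,
>                          AT_SYMLINK_NOFOLLOW);
>     if (r < 0) {
>         // entry missing in the previous state: sync everything
>         *need_data_sync = true;
>         *need_attr_sync = true;
>     } else if ((cstx.stx_mode & S_IFMT) != (pstx.stx_mode & S_IFMT)) {
>         // entry changed its file type: sync everything
>         *need_data_sync = true;
>         *need_attr_sync = true;
>     } else {
>         // file data is re-copied only if size or mtime differ ...
>         *need_data_sync = (cstx.stx_size != pstx.stx_size) ||
>                           (cstx.stx_mtime != pstx.stx_mtime);
>         // ... but attributes are re-applied whenever ctime differs (L911)
>         *need_attr_sync = (cstx.stx_ctime != pstx.stx_ctime);
>     }
>
> If that reading is correct, a ctime mismatch alone is enough to schedule an attribute sync for an entry
> on every snapshot, which would match the many "need_data_sync=0, need_attr_sync=1" messages we see.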
> We can confirm that the ctimes are different on the main cluster and the mirror.
> On the main cluster, the ctimes are consistent in every snapshot (since the files didn't change).
> On the mirror, the ctimes increase with every snapshot towards more current dates.
>
> Given that the cephfs-mirror daemon writes the data to the mirror as a CephFS client,
> it seems to make sense that data on the mirror has different / more recent ctimes (from writing).
> Also, when the mirror daemon is syncing the attributes to the mirror, wouldn't this trigger a new/current ctime as well?
> So our assumption is that syncing an old ctime will actually result in a new ctime,
> and thus trigger the sync of attributes over and over again (at least with every snapshot synced).
>
> So is ctime the proper parameter to test whether attributes need to be synced?
> Or shouldn't it rather be excluded?
> I.e. is this check the right thing to do: *need_attr_sync = (cstx.stx_ctime != pstx.stx_ctime)
>
> Is it reasonable to assume that these attribute syncs are responsible for our slow mirroring?
> Or is there anything else we should look out for?
>
> And are there actually commands or logs showing us the speed of the mirroring?
> We only know about sync_duration and sync_time_stamp (as in the status above).
> But then, how can we actually determine the size of a snapshot or the difference between two snapshots,
> so one can make speed calculations for the latest sync?
>
> What is your general experience with mirroring performance?
> In particular with bigger CephFS directories towards petabytes.
>
> Mirroring (backing up) our data is a really crucial issue for us (and certainly many others).
> So we are looking forward to your input. Thanks a lot in advance.

I see you created https://tracker.ceph.com/issues/58058. I'll follow up
with you on the tracker. Thanks!

>
> Best Wishes,
> Mathias
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx