Re: CephFS Snapshot Mirroring slow due to repeating attribute sync

Hi Mathias,

(apologies for the super late reply - I was getting back from a long
vacation and missed seeing this).

I updated the tracker ticket. Let's move the discussion there...

On Mon, Nov 28, 2022 at 7:46 PM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
>
> On Tue, Aug 23, 2022 at 10:01 PM Kuhring, Mathias
> <mathias.kuhring@xxxxxxxxxxxxxx> wrote:
> >
> > Dear Ceph developers and users,
> >
> > We are using ceph version 17.2.1
> > (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable).
> > We are using cephadm since version 15 octopus.
> >
> > We mirror several CephFS directories from our main cluster out to a
> > second mirror cluster.
> > In particular with bigger directories (over 900 TB and 186 M files),
> > we noticed that mirroring is very slow.
> > On the mirror, most of the time we only observe a write speed of 0 to 10
> > MB/s in the client IO.
> > The target peer directory often doesn't show any increase in size during
> > synchronization
> > (when we check with: getfattr -n ceph.dir.rbytes).
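> >
> > For reference, these checks look roughly like this (the mount point
> > /mnt/mirror is just a placeholder for wherever the mirror file system
> > is mounted on a client):
> >
> > # size of the target peer directory on the mirror cluster
> > getfattr -n ceph.dir.rbytes /mnt/mirror/irods/sodar
> > # client write throughput as reported by the mirror cluster
> > ceph -s | grep -A 3 'io:'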
> >
> > The status of the syncs is always fine, i.e. syncing and not failing:
> >
> > 0|0[root@osd-1 /var/run/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f]# ceph
> > --admin-daemon
> > ceph-client.cephfs-mirror.osd-1.ydsqsw.7.94552861013544.asok fs mirror
> > peer status cephfs@1 c66afb80-593f-4c42-a120-dd3b6fca26bc
> > {
> >      "/irods/sodar": {
> >          "state": "syncing",
> >          "current_sycning_snap": {
> >              "id": 7552,
> >              "name": "scheduled-2022-08-22-13_00_00"
> >          },
> >          "last_synced_snap": {
> >              "id": 7548,
> >              "name": "scheduled-2022-08-22-12_00_00",
> >              "sync_duration": 37828.164744490001,
> >              "sync_time_stamp": "13240678.542916s"
> >          },
> >          "snaps_synced": 1,
> >          "snaps_deleted": 11,
> >          "snaps_renamed": 0
> >      }
> > }
> >
> > The cluster nodes (6 per cluster) are connected with Dual 40G NICs to
> > the switches.
> > The connection between the switches is 2x 100G.
> > Simple write operations from other clients to the mirror cephfs result
> > in writes of e.g. 300 to 400 MB/s.
> > So the network doesn't seem to be the issue here.
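> > For example, a plain streaming write like the following is enough to
> > show those rates (/mnt/mirror is a placeholder for a regular client
> > mount of the mirror file system):
> >
> > dd if=/dev/zero of=/mnt/mirror/write-test.img bs=4M count=1024 oflag=direct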
> >
> > We started to dig into debug logs of the cephfs-mirror daemon / docker
> > container.
> > We set the debug level to 20. Otherwise there are no messages at all (so
> > no errors).
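> >
> > For reference, the level can also be raised at runtime via the daemon's
> > admin socket (the same socket as used for the peer status above), which
> > should look roughly like this:
> >
> > ceph --admin-daemon \
> >     ceph-client.cephfs-mirror.osd-1.ydsqsw.7.94552861013544.asok \
> >     config set debug_cephfs_mirror 20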
> >
> > We observed a lot of messages with "need_data_sync=0, need_attr_sync=1",
> > leading us to the assumption that, instead of actual data, mostly
> > attributes are being synced.
> >
> > We started looking at specific examples in the logs and tried to make
> > sense from the source code of which steps are happening.
> > Most of the messages are coming from cephfs::mirror::PeerReplayer
> > https://github.com/ceph/ceph/blob/6fee777d603aebce492c57b41f3b5760d50ddb07/src/tools/cephfs_mirror/PeerReplayer.cc
> >
> > We figured out that the do_synchronize function checks via
> > should_sync_entry whether data (need_data_sync) or attributes
> > (need_attr_sync) should be synchronized,
> > and if necessary performs the sync using remote_file_op.
> >
> > should_sync_entry reports different ctimes for our examples, e.g.:
> > local cur statx: mode=33152, uid=996, gid=993, size=154701172,
> > ctime=2022-01-28T12:54:21.176004+0000, ...
> > local prev statx: mode=33152, uid=996, gid=993, size=154701172,
> > ctime=2022-08-22T11:03:18.578380+0000, ...
> >
> > Based on these different ctimes, should_sync_entry then decides that
> > attributes need to be synced:
> > *need_attr_sync = (cstx.stx_ctime != pstx.stx_ctime)
> > https://github.com/ceph/ceph/blob/6fee777d603aebce492c57b41f3b5760d50ddb07/src/tools/cephfs_mirror/PeerReplayer.cc#L911
> >
> > We assume cur statx/cstx refers to the file in the snapshot currently
> > being mirrored.
> > But what exactly is prev statx/pstx? Is it the peer path or the last
> > snapshot on the mirror peer?
> >
> > We can confirm that ctimes are different on the main cluster and the mirror.
> > On the main cluster, the ctimes are consistent in every snapshot (since
> > the files didn't change).
> > On the mirror, the ctimes increase with every snapshot towards more
> > current dates.
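> >
> > This can be checked with something like the following (mount points and
> > the file path are placeholders; %z in stat prints the ctime):
> >
> > # on a client of the main cluster
> > for s in /mnt/main/irods/sodar/.snap/scheduled-*; do
> >     stat -c "$s  ctime=%z" "$s/path/to/some/unchanged/file"
> > done
> > # run the same loop against /mnt/mirror/irods/sodar/.snap on a client
> > # of the mirror cluster and compare the ctimes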
> >
> > Given that the CephFS Mirror daemon writes the data to the mirror as a
> > CephFS client,
> > it seems to make sense that data on the mirror has different / more
> > recent ctimes (from writing).
> > Also, when the mirror daemon is syncing the attributes to the mirror,
> > wouldn't this trigger a new/current ctime as well?
> > So our assumption is that syncing an old ctime will actually result in
> > a new ctime,
> > and thus trigger the sync of attributes over and over again (at least
> > with every snapshot synced).
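> >
> > A quick way to illustrate this on any CephFS client (the file name is
> > just an example): ctime cannot be set from userspace, so restoring
> > another attribute, e.g. an old mtime, bumps the ctime to "now":
> >
> > stat -c 'before: mtime=%y  ctime=%z' somefile
> > touch -m -d '2022-01-28 12:54:21' somefile   # restore an "old" mtime
> > stat -c 'after:  mtime=%y  ctime=%z' somefile
> > # mtime now shows the old date, but ctime has jumped to the current time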
> >
> > So is ctime the proper parameter to test whether attributes need to be
> > synced? Or should it rather be excluded?
> > In other words, is this check the right thing to do: *need_attr_sync =
> > (cstx.stx_ctime != pstx.stx_ctime)
> >
> > Is it reasonable to assume that these attribute syncs are responsible
> > for our slow mirroring?
> > Or is there anything else we should look out for?
> >
> > And are there actually commands or logs showing us the speed of the
> > mirroring?
> > We only know about sync_duration and sync_time_stamp (as in the status
> > above).
> > But then, how can we actually determine the size of a snapshot or the
> > difference between snapshots,
> > so that one can make speed calculations for the latest sync?
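> >
> > A rough estimate would be to compare ceph.dir.rbytes between two
> > snapshots, e.g. like this (the mount point is a placeholder, this
> > assumes the vxattr is readable inside .snap, and it only gives the
> > net size difference, not the amount of changed data):
> >
> > SNAPDIR=/mnt/main/irods/sodar/.snap
> > OLD=$(getfattr -n ceph.dir.rbytes --only-values \
> >     "$SNAPDIR/scheduled-2022-08-22-12_00_00")
> > NEW=$(getfattr -n ceph.dir.rbytes --only-values \
> >     "$SNAPDIR/scheduled-2022-08-22-13_00_00")
> > echo "net growth: $(( (NEW - OLD) / 1024 / 1024 )) MiB"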
> >
> > What is your general experience with mirroring performance?
> > In particular with bigger CephFS directories approaching petabytes.
> >
> > Mirroring (backing up) our data is a really crucial issue for us (and
> > certainly many others).
> > So we are looking forward to your input. Thanks a lot in advance.
>
> I see you created https://tracker.ceph.com/issues/58058. I'll follow
> up with you on the tracker. Thanks!
>
> >
> > Best Wishes,
> > Mathias
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
>
> --
> Cheers,
> Venky



-- 
Cheers,
Venky

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


