Hi Anantha,

On Mon, Aug 7, 2023 at 11:52 PM Adiga, Anantha <anantha.adiga@xxxxxxxxx> wrote:
>
> Hi Venky,
>
> I tried on another secondary Quincy cluster and it is the same problem. The peer_bootstrap import command hangs.

A peer token generated on a Pacific cluster should be importable into a Quincy source cluster. Looking at the logs, I suspect that the perceived hang is the mirroring module blocked on connecting to the secondary cluster (to set the mirror info xattr). Are you able to connect to the secondary cluster from the host running ceph-mgr on the primary cluster, using its monitor address (and a key)?

> root@fl31ca104ja0201:/# ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogIjJlYWMwZWEwLTYwNDgtNDQ0Zi04NGIyLThjZWVmZWQyN2E1YiIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJzaGdSLXNpdGUiLCAia2V5IjogIkFRQ0lGdEZrSStTTE5oQUFXbWV6MkRKcEg5ZUdyYnhBOWVmZG9BPT0iLCAibW9uX2hvc3QiOiAiW3YyOjEwLjIzOS4xNTUuMTg6MzMwMC8wLHYxOjEwLjIzOS4xNTUuMTg6Njc4OS8wXSBbdjI6MTAuMjM5LjE1NS4xOTozMzAwLzAsdjE6MTAuMjM5LjE1NS4xOTo2Nzg5LzBdIFt2MjoxMC4yMzkuMTU1LjIwOjMzMDAvMCx2MToxMC4yMzkuMTU1LjIwOjY3ODkvMF0ifQ==
> ……
> …….
> ...command does not complete, waits here
> ^C to exit.
>
> Thereafter some commands do not complete:
>
> root@fl31ca104ja0201:/# ceph -s
>   cluster:
>     id:     d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
>     health: HEALTH_OK
>
>   services:
>     mon:           3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 2d)
>     mgr:           fl31ca104ja0201.kkoono(active, since 3d), standbys: fl31ca104ja0202, fl31ca104ja0203
>     mds:           1/1 daemons up, 2 standby
>     osd:           44 osds: 44 up (since 2d), 44 in (since 5w)
>     cephfs-mirror: 1 daemon active (1 hosts)
>     rgw:           3 daemons active (3 hosts, 1 zones)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   25 pools, 769 pgs
>     objects: 614.40k objects, 1.9 TiB
>     usage:   2.9 TiB used, 292 TiB / 295 TiB avail
>     pgs:     769 active+clean
>
>   io:
>     client: 32 KiB/s rd, 0 B/s wr, 33 op/s rd, 1 op/s wr
>
> root@fl31ca104ja0201:/#
> root@fl31ca104ja0201:/# ceph fs status cephfs
>
> This command also waits.
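A quick way to check that from fl31ca104ja0201 (the active mgr host on the primary) is to probe the secondary monitors directly, using the addresses and the client.mirror_remote key that are embedded in the bootstrap token. This is only a sketch; the keyring path below is something you would create for the test, it is not an existing file:

  # mon addresses are the ones carried in the token's mon_host field
  nc -vz 10.239.155.18 3300    # msgr2 port
  nc -vz 10.239.155.18 6789    # msgr1 port

  # authenticated check against the secondary cluster
  # (put the key from the token into /tmp/secondary.keyring under a [client.mirror_remote] section)
  ceph -s -m 10.239.155.18:3300,10.239.155.19:3300,10.239.155.20:3300 \
       -n client.mirror_remote -k /tmp/secondary.keyring

If either of those stalls or fails, the peer_bootstrap import will appear to hang for the same reason, since the mgr has to connect to those monitors to set the mirror info xattr as described above.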
> ……
>
> I have attached the mgr log.
>
> root@fl31ca104ja0201:/# ceph service status
> {
>     "cephfs-mirror": {
>         "5306346": {
>             "status_stamp": "2023-08-07T17:35:56.884907+0000",
>             "last_beacon": "2023-08-07T17:45:01.903540+0000",
>             "status": {
>                 "status_json": "{\"1\":{\"name\":\"cephfs\",\"directory_count\":0,\"peers\":{}}}"
>             }
>         }
>     }
> }
>
> Quincy secondary cluster:
>
> root@a001s008-zz14l47008:/# ceph mgr module enable mirroring
> root@a001s008-zz14l47008:/# ceph fs authorize cephfs client.mirror_remote / rwps
> [client.mirror_remote]
>         key = AQCIFtFkI+SLNhAAWmez2DJpH9eGrbxA9efdoA==
> root@a001s008-zz14l47008:/# ceph auth get client.mirror_remote
> [client.mirror_remote]
>         key = AQCIFtFkI+SLNhAAWmez2DJpH9eGrbxA9efdoA==
>         caps mds = "allow rwps fsname=cephfs"
>         caps mon = "allow r fsname=cephfs"
>         caps osd = "allow rw tag cephfs data=cephfs"
> root@a001s008-zz14l47008:/#
> root@a001s008-zz14l47008:/# ceph fs snapshot mirror peer_bootstrap create cephfs client.mirror_remote shgR-site
> {"token": "eyJmc2lkIjogIjJlYWMwZWEwLTYwNDgtNDQ0Zi04NGIyLThjZWVmZWQyN2E1YiIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJzaGdSLXNpdGUiLCAia2V5IjogIkFRQ0lGdEZrSStTTE5oQUFXbWV6MkRKcEg5ZUdyYnhBOWVmZG9BPT0iLCAibW9uX2hvc3QiOiAiW3YyOjEwLjIzOS4xNTUuMTg6MzMwMC8wLHYxOjEwLjIzOS4xNTUuMTg6Njc4OS8wXSBbdjI6MTAuMjM5LjE1NS4xOTozMzAwLzAsdjE6MTAuMjM5LjE1NS4xOTo2Nzg5LzBdIFt2MjoxMC4yMzkuMTU1LjIwOjMzMDAvMCx2MToxMC4yMzkuMTU1LjIwOjY3ODkvMF0ifQ=="}
> root@a001s008-zz14l47008:/#
>
> Thank you,
> Anantha
>
> From: Adiga, Anantha
> Sent: Friday, August 4, 2023 11:55 AM
> To: Venky Shankar <vshankar@xxxxxxxxxx>; ceph-users@xxxxxxx
> Subject: RE: Re: cephfs snapshot mirror peer_bootstrap import hung
>
> Hi Venky,
>
> Thank you so much for the guidance. Attached is the mgr log.
>
> Note: the 4th node in the primary cluster has smaller capacity drives; the other 3 nodes have the larger capacity drives.
> ID  CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP      META     AVAIL    %USE  VAR   PGS  STATUS  TYPE NAME
> 32  ssd    6.98630   1.00000   7.0 TiB   44 GiB   44 GiB   183 KiB  148 MiB  6.9 TiB  0.62  0.64   40  up      osd.32
> -7         76.84927  -          77 TiB  652 GiB  648 GiB    20 MiB  3.0 GiB   76 TiB  0.83  0.86    -          host fl31ca104ja0203
>  1  ssd    6.98630   1.00000   7.0 TiB   73 GiB   73 GiB   8.0 MiB  333 MiB  6.9 TiB  1.02  1.06   54  up      osd.1
>  4  ssd    6.98630   1.00000   7.0 TiB   77 GiB   77 GiB   1.1 MiB  174 MiB  6.9 TiB  1.07  1.11   55  up      osd.4
>  7  ssd    6.98630   1.00000   7.0 TiB   47 GiB   47 GiB   140 KiB  288 MiB  6.9 TiB  0.66  0.68   51  up      osd.7
> 10  ssd    6.98630   1.00000   7.0 TiB   75 GiB   75 GiB   299 KiB  278 MiB  6.9 TiB  1.05  1.09   44  up      osd.10
> 13  ssd    6.98630   1.00000   7.0 TiB   94 GiB   94 GiB  1018 KiB  291 MiB  6.9 TiB  1.31  1.36   72  up      osd.13
> 16  ssd    6.98630   1.00000   7.0 TiB   31 GiB   31 GiB   163 KiB  267 MiB  7.0 TiB  0.43  0.45   49  up      osd.16
> 19  ssd    6.98630   1.00000   7.0 TiB   14 GiB   14 GiB   756 KiB  333 MiB  7.0 TiB  0.20  0.21   50  up      osd.19
> 22  ssd    6.98630   1.00000   7.0 TiB  105 GiB  104 GiB   1.3 MiB  313 MiB  6.9 TiB  1.46  1.51   48  up      osd.22
> 25  ssd    6.98630   1.00000   7.0 TiB   17 GiB   16 GiB   257 KiB  272 MiB  7.0 TiB  0.23  0.24   45  up      osd.25
> 28  ssd    6.98630   1.00000   7.0 TiB   72 GiB   72 GiB   6.1 MiB  180 MiB  6.9 TiB  1.01  1.05   43  up      osd.28
> 31  ssd    6.98630   1.00000   7.0 TiB   47 GiB   46 GiB   592 KiB  358 MiB  6.9 TiB  0.65  0.68   56  up      osd.31
> -9         64.04089  -          64 TiB  728 GiB  726 GiB    17 MiB  1.8 GiB   63 TiB  1.11  1.15    -          host fl31ca104ja0302
> 33  ssd    5.82190   1.00000   5.8 TiB   65 GiB   65 GiB   245 KiB  144 MiB  5.8 TiB  1.09  1.13   47  up      osd.33
> 34  ssd    5.82190   1.00000   5.8 TiB   14 GiB   14 GiB   815 KiB   83 MiB  5.8 TiB  0.24  0.25   55  up      osd.34
> 35  ssd    5.82190   1.00000   5.8 TiB   77 GiB   77 GiB   224 KiB  213 MiB  5.7 TiB  1.30  1.34   44  up      osd.35
> 36  ssd    5.82190   1.00000   5.8 TiB  117 GiB  117 GiB   8.5 MiB  284 MiB  5.7 TiB  1.96  2.03   52  up      osd.36
> 37  ssd    5.82190   1.00000   5.8 TiB   58 GiB   58 GiB   501 KiB  132 MiB  5.8 TiB  0.98  1.01   40  up      osd.37
> 38  ssd    5.82190   1.00000   5.8 TiB  123 GiB  123 GiB   691 KiB  266 MiB  5.7 TiB  2.07  2.14   73  up      osd.38
> 39  ssd    5.82190   1.00000   5.8 TiB   77 GiB   77 GiB   609 KiB  193 MiB  5.7 TiB  1.30  1.34   62  up      osd.39
> 40  ssd    5.82190   1.00000   5.8 TiB   77 GiB   77 GiB   262 KiB  148 MiB  5.7 TiB  1.29  1.34   55  up      osd.40
> 41  ssd    5.82190   1.00000   5.8 TiB   44 GiB   44 GiB   4.4 MiB  140 MiB  5.8 TiB  0.75  0.77   44  up      osd.41
> 42  ssd    5.82190   1.00000   5.8 TiB   45 GiB   45 GiB   886 KiB  135 MiB  5.8 TiB  0.75  0.78   47  up      osd.42
> 43  ssd    5.82190   1.00000   5.8 TiB   28 GiB   28 GiB   187 KiB  104 MiB  5.8 TiB  0.48  0.49   58  up      osd.43
>
> [Also: yesterday I had two cephfs-mirror daemons running, one on fl31ca104ja0201 and one on fl31ca104ja0302. The one on fl31ca104ja0201 was stopped. When the import token was run on fl31ca104ja0302, the cephfs-mirror log was active. Just in case it is useful, I have attached that log (cfsmirror-container.log) as well.]
>
> How can I list the token on the target cluster after running the peer_bootstrap create command?
>
> Here is today's status with your suggestion. There is only one cephfs-mirror daemon running now, on node fl31ca104ja0201.
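On the question above about listing the token: as far as I know there is no command that re-prints a bootstrap token after `peer_bootstrap create`, but the token is just base64-encoded JSON, so you can decode the one you already have to see exactly what the importing side will try to use. On the primary, once an import has gone through, the configured peers show up via peer_list. A sketch (token abbreviated):

  echo 'eyJmc2lk...' | base64 -d | python3 -m json.tool
  # prints fsid, filesystem, user, site_name, key and mon_host of the secondary

  # on the primary cluster, after a successful import
  ceph fs snapshot mirror peer_list cephfs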
> root@fl31ca104ja0201:/# ceph -s
>   cluster:
>     id:     d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
>     health: HEALTH_OK
>
>   services:
>     mon:           3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 7m)
>     mgr:           fl31ca104ja0201.kkoono(active, since 13m), standbys: fl31ca104ja0202, fl31ca104ja0203
>     mds:           1/1 daemons up, 2 standby
>     osd:           44 osds: 44 up (since 7m), 44 in (since 4w)
>     cephfs-mirror: 1 daemon active (1 hosts)
>     rgw:           3 daemons active (3 hosts, 1 zones)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   25 pools, 769 pgs
>     objects: 614.40k objects, 1.9 TiB
>     usage:   2.8 TiB used, 292 TiB / 295 TiB avail
>     pgs:     769 active+clean
>
>   io:
>     client: 32 MiB/s rd, 0 B/s wr, 57 op/s rd, 1 op/s wr
>
> root@fl31ca104ja0201:/# ceph tell mgr.fl31ca104ja0201.kkoono config set debug_mgr 20
> {
>     "success": ""
> }
> root@fl31ca104ja0201:/# ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0=
> ^CInterrupted
>
> Ctrl-C after 15 min. Once the command is run, the health status goes to WARN.
>
> root@fl31ca104ja0201:/# ceph -s
>   cluster:
>     id:     d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
>     health: HEALTH_WARN
>             6 slow ops, oldest one blocked for 1095 sec, mon.fl31ca104ja0203 has slow ops
>
>   services:
>     mon:           3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 30m)
>     mgr:           fl31ca104ja0201.kkoono(active, since 35m), standbys: fl31ca104ja0202, fl31ca104ja0203
>     mds:           1/1 daemons up, 2 standby
>     osd:           44 osds: 44 up (since 29m), 44 in (since 4w)
>     cephfs-mirror: 1 daemon active (1 hosts)
>     rgw:           3 daemons active (3 hosts, 1 zones)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   25 pools, 769 pgs
>     objects: 614.40k objects, 1.9 TiB
>     usage:   2.8 TiB used, 292 TiB / 295 TiB avail
>     pgs:     769 active+clean
>
>   io:
>     client: 67 KiB/s rd, 0 B/s wr, 68 op/s rd, 21 op/s wr
>
> -----Original Message-----
> From: Venky Shankar <vshankar@xxxxxxxxxx>
> Sent: Thursday, August 3, 2023 11:03 PM
> To: Adiga, Anantha <anantha.adiga@xxxxxxxxx>
> Cc: ceph-users@xxxxxxx
> Subject: Re: cephfs snapshot mirror peer_bootstrap import hung
>
> Hi Anantha,
>
> On Fri, Aug 4, 2023 at 2:27 AM Adiga, Anantha <anantha.adiga@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > Could you please provide guidance on how to diagnose this issue:
> >
> > In this case, there are two Ceph clusters: cluster A with 4 nodes and cluster B with 3 nodes, in different locations. Both are already running RGW multi-site, with A as the master.
> >
> > CephFS snapshot mirroring is being configured: cluster A is the primary, cluster B is the peer. The bootstrap import step on the primary node hangs.
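Regarding the HEALTH_WARN in your latest status: while the import is wedged, it is worth dumping the ops that mon.fl31ca104ja0203 reports as slow, to see what they are actually waiting on. Roughly, assuming a cephadm/containerized deployment where the mon's admin socket lives inside its container:

  ceph health detail                          # names the mon with slow ops
  # on the host running mon.fl31ca104ja0203:
  cephadm enter --name mon.fl31ca104ja0203
  ceph daemon mon.fl31ca104ja0203 ops         # lists in-flight ops and how long they have been blocked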
> > On the target cluster:
> > ---------------------------
> >     "version": "16.2.5",
> >     "release": "pacific",
> >     "release_type": "stable"
> >
> > root@cr21meg16ba0101:/# ceph fs snapshot mirror peer_bootstrap create cephfs client.mirror_remote flex2-site
> > {"token": "eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0="}

> Seems fine up to here.

> > root@cr21meg16ba0101:/var/run/ceph#
> >
> > On the source cluster:
> > ----------------------------
> >     "version": "17.2.6",
> >     "release": "quincy",
> >     "release_type": "stable"
> >
> > root@fl31ca104ja0201:/# ceph -s
> >   cluster:
> >     id:     d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
> >     health: HEALTH_OK
> >
> >   services:
> >     mon:           3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 111m)
> >     mgr:           fl31ca104ja0201.nwpqlh(active, since 11h), standbys: fl31ca104ja0203, fl31ca104ja0202
> >     mds:           1/1 daemons up, 2 standby
> >     osd:           44 osds: 44 up (since 111m), 44 in (since 4w)
> >     cephfs-mirror: 1 daemon active (1 hosts)
> >     rgw:           3 daemons active (3 hosts, 1 zones)
> >
> >   data:
> >     volumes: 1/1 healthy
> >     pools:   25 pools, 769 pgs
> >     objects: 614.40k objects, 1.9 TiB
> >     usage:   2.8 TiB used, 292 TiB / 295 TiB avail
> >     pgs:     769 active+clean
> >
> > root@fl31ca104ja0302:/# ceph mgr module enable mirroring
> > module 'mirroring' is already enabled
> > root@fl31ca104ja0302:/# ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0=

> Going by your description, I'm guessing this is the command that hangs? If that's the case, set `debug_mgr=20`, repeat the token import step and share the ceph-mgr log.
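For capturing the mgr log with debug_mgr=20 on a cephadm deployment, something along these lines should work; treat it as a sketch, since the exact daemon and unit names depend on your deployment (the mgr name below is the active one from your latest `ceph -s`):

  ceph tell mgr.fl31ca104ja0201.kkoono config set debug_mgr 20
  # re-run the peer_bootstrap import, then pull the log:
  cephadm logs --fsid d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e --name mgr.fl31ca104ja0201.kkoono > mgr.log
  # or equivalently via journald on the mgr host:
  journalctl -u ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@mgr.fl31ca104ja0201.kkoono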
> Also note that you can check the mirror daemon status as detailed in
> https://docs.ceph.com/en/latest/dev/cephfs-mirroring/#mirror-daemon-status

> > root@fl31ca104ja0302:/var/run/ceph# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-mirror.fl31ca104ja0302.sypagt.7.94083135960976.asok status
> > {
> >     "metadata": {
> >         "ceph_sha1": "d7ff0d10654d2280e08f1ab989c7cdf3064446a5",
> >         "ceph_version": "ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)",
> >         "entity_id": "cephfs-mirror.fl31ca104ja0302.sypagt",
> >         "hostname": "fl31ca104ja0302",
> >         "pid": "7",
> >         "root": "/"
> >     },
> >     "dentry_count": 0,
> >     "dentry_pinned_count": 0,
> >     "id": 5194553,
> >     "inst": {
> >         "name": {
> >             "type": "client",
> >             "num": 5194553
> >         },
> >         "addr": {
> >             "type": "v1",
> >             "addr": "10.45.129.5:0",
> >             "nonce": 2497002034
> >         }
> >     },
> >     "addr": {
> >         "type": "v1",
> >         "addr": "10.45.129.5:0",
> >         "nonce": 2497002034
> >     },
> >     "inst_str": "client.5194553 10.45.129.5:0/2497002034",
> >     "addr_str": "10.45.129.5:0/2497002034",
> >     "inode_count": 1,
> >     "mds_epoch": 118,
> >     "osd_epoch": 6266,
> >     "osd_epoch_barrier": 0,
> >     "blocklisted": false,
> >     "fs_name": "cephfs"
> > }
> >
> > root@fl31ca104ja0302:/home/general# docker logs ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e-cephfs-mirror-fl31ca104ja0302-sypagt --tail 10
> > debug 2023-08-03T05:24:27.413+0000 7f8eb6fc0280  0 ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable), process cephfs-mirror, pid 7
> > debug 2023-08-03T05:24:27.413+0000 7f8eb6fc0280  0 pidfile_write: ignore empty --pid-file
> > debug 2023-08-03T05:24:27.445+0000 7f8eb6fc0280  1 mgrc service_daemon_register cephfs-mirror.5184622 metadata {arch=x86_64,ceph_release=quincy,ceph_version=ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable),ceph_version_short=17.2.6,container_hostname=fl31ca104ja0302,container_image=quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e,cpu=Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz,distro=centos,distro_description=CentOS Stream 8,distro_version=8,hostname=fl31ca104ja0302,id=fl31ca104ja0302.sypagt,instance_id=5184622,kernel_description=#82-Ubuntu SMP Tue Jun 6 23:10:23 UTC 2023,kernel_version=5.15.0-75-generic,mem_swap_kb=8388604,mem_total_kb=527946928,os=Linux}
> > debug 2023-08-03T05:27:10.419+0000 7f8ea1b2c700  0 client.5194553 ms_handle_reset on v2:10.45.128.141:3300/0
> > debug 2023-08-03T05:50:10.917+0000 7f8ea1b2c700  0 client.5194553 ms_handle_reset on v2:10.45.128.139:3300/0
> >
> > Thank you,
> > Anantha
>
> --
> Cheers,
> Venky

--
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx