Hi Anantha,

On Tue, Aug 8, 2023 at 1:59 AM Adiga, Anantha <anantha.adiga@xxxxxxxxx> wrote:
>
> Hi Venky,
>
> Could this be the reason that the peer-bootstrap import is hanging? How do I upgrade cephfs-mirror to Quincy?

I was on leave yesterday -- will have a look at the log and update.

> root@fl31ca104ja0201:/# cephfs-mirror --version
> ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)
> root@fl31ca104ja0201:/# ceph version
> ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
> root@fl31ca104ja0201:/#
>
> Thank you,
> Anantha
>
> From: Adiga, Anantha
> Sent: Monday, August 7, 2023 11:21 AM
> To: 'Venky Shankar' <vshankar@xxxxxxxxxx>; 'ceph-users@xxxxxxx' <ceph-users@xxxxxxx>
> Subject: RE: Re: cephfs snapshot mirror peer_bootstrap import hung
>
> Hi Venky,
>
> I tried on another secondary Quincy cluster and it is the same problem. The peer_bootstrap import command hangs.
>
> root@fl31ca104ja0201:/# ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogIjJlYWMwZWEwLTYwNDgtNDQ0Zi04NGIyLThjZWVmZWQyN2E1YiIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJzaGdSLXNpdGUiLCAia2V5IjogIkFRQ0lGdEZrSStTTE5oQUFXbWV6MkRKcEg5ZUdyYnhBOWVmZG9BPT0iLCAibW9uX2hvc3QiOiAiW3YyOjEwLjIzOS4xNTUuMTg6MzMwMC8wLHYxOjEwLjIzOS4xNTUuMTg6Njc4OS8wXSBbdjI6MTAuMjM5LjE1NS4xOTozMzAwLzAsdjE6MTAuMjM5LjE1NS4xOTo2Nzg5LzBdIFt2MjoxMC4yMzkuMTU1LjIwOjMzMDAvMCx2MToxMC4yMzkuMTU1LjIwOjY3ODkvMF0ifQ==
> ……
> …….
> ..command does not complete..waits here
> ^C to exit.
> Thereafter some commands do not complete…
>
> root@fl31ca104ja0201:/# ceph -s
>   cluster:
>     id:     d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
>     health: HEALTH_OK
>
>   services:
>     mon:           3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 2d)
>     mgr:           fl31ca104ja0201.kkoono(active, since 3d), standbys: fl31ca104ja0202, fl31ca104ja0203
>     mds:           1/1 daemons up, 2 standby
>     osd:           44 osds: 44 up (since 2d), 44 in (since 5w)
>     cephfs-mirror: 1 daemon active (1 hosts)
>     rgw:           3 daemons active (3 hosts, 1 zones)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   25 pools, 769 pgs
>     objects: 614.40k objects, 1.9 TiB
>     usage:   2.9 TiB used, 292 TiB / 295 TiB avail
>     pgs:     769 active+clean
>
>   io:
>     client:   32 KiB/s rd, 0 B/s wr, 33 op/s rd, 1 op/s wr
>
> root@fl31ca104ja0201:/#
> root@fl31ca104ja0201:/# ceph fs status cephfs
> This command also waits. ……
>
> I have attached the mgr log.
>
> root@fl31ca104ja0201:/# ceph service status
> {
>     "cephfs-mirror": {
>         "5306346": {
>             "status_stamp": "2023-08-07T17:35:56.884907+0000",
>             "last_beacon": "2023-08-07T17:45:01.903540+0000",
>             "status": {
>                 "status_json": "{\"1\":{\"name\":\"cephfs\",\"directory_count\":0,\"peers\":{}}}"
>             }
>         }
>
> Quincy secondary cluster
>
> root@a001s008-zz14l47008:/# ceph mgr module enable mirroring
> root@a001s008-zz14l47008:/# ceph fs authorize cephfs client.mirror_remote / rwps
> [client.mirror_remote]
>         key = AQCIFtFkI+SLNhAAWmez2DJpH9eGrbxA9efdoA==
> root@a001s008-zz14l47008:/# ceph auth get client.mirror_remote
> [client.mirror_remote]
>         key = AQCIFtFkI+SLNhAAWmez2DJpH9eGrbxA9efdoA==
>         caps mds = "allow rwps fsname=cephfs"
>         caps mon = "allow r fsname=cephfs"
>         caps osd = "allow rw tag cephfs data=cephfs"
> root@a001s008-zz14l47008:/#
> root@a001s008-zz14l47008:/# ceph fs snapshot mirror peer_bootstrap create cephfs client.mirror_remote shgR-site
> {"token": "eyJmc2lkIjogIjJlYWMwZWEwLTYwNDgtNDQ0Zi04NGIyLThjZWVmZWQyN2E1YiIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJzaGdSLXNpdGUiLCAia2V5IjogIkFRQ0lGdEZrSStTTE5oQUFXbWV6MkRKcEg5ZUdyYnhBOWVmZG9BPT0iLCAibW9uX2hvc3QiOiAiW3YyOjEwLjIzOS4xNTUuMTg6MzMwMC8wLHYxOjEwLjIzOS4xNTUuMTg6Njc4OS8wXSBbdjI6MTAuMjM5LjE1NS4xOTozMzAwLzAsdjE6MTAuMjM5LjE1NS4xOTo2Nzg5LzBdIFt2MjoxMC4yMzkuMTU1LjIwOjMzMDAvMCx2MToxMC4yMzkuMTU1LjIwOjY3ODkvMF0ifQ=="}
> root@a001s008-zz14l47008:/#
>
> Thank you,
> Anantha
>
> From: Adiga, Anantha
> Sent: Friday, August 4, 2023 11:55 AM
> To: Venky Shankar <vshankar@xxxxxxxxxx>; ceph-users@xxxxxxx
> Subject: RE: Re: cephfs snapshot mirror peer_bootstrap import hung
>
> Hi Venky,
>
> Thank you so much for the guidance. Attached is the mgr log.
>
> Note: the 4th node in the primary cluster has smaller-capacity drives; the other 3 nodes have the larger-capacity drives.
>
> 32 ssd 6.98630 1.00000 7.0 TiB 44 GiB 44 GiB 183 KiB 148 MiB 6.9 TiB 0.62 0.64 40 up osd.32
> -7 76.84927 - 77 TiB 652 GiB 648 GiB 20 MiB 3.0 GiB 76 TiB 0.83 0.86 - host fl31ca104ja0203
> 1 ssd 6.98630 1.00000 7.0 TiB 73 GiB 73 GiB 8.0 MiB 333 MiB 6.9 TiB 1.02 1.06 54 up osd.1
> 4 ssd 6.98630 1.00000 7.0 TiB 77 GiB 77 GiB 1.1 MiB 174 MiB 6.9 TiB 1.07 1.11 55 up osd.4
> 7 ssd 6.98630 1.00000 7.0 TiB 47 GiB 47 GiB 140 KiB 288 MiB 6.9 TiB 0.66 0.68 51 up osd.7
> 10 ssd 6.98630 1.00000 7.0 TiB 75 GiB 75 GiB 299 KiB 278 MiB 6.9 TiB 1.05 1.09 44 up osd.10
> 13 ssd 6.98630 1.00000 7.0 TiB 94 GiB 94 GiB 1018 KiB 291 MiB 6.9 TiB 1.31 1.36 72 up osd.13
> 16 ssd 6.98630 1.00000 7.0 TiB 31 GiB 31 GiB 163 KiB 267 MiB 7.0 TiB 0.43 0.45 49 up osd.16
> 19 ssd 6.98630 1.00000 7.0 TiB 14 GiB 14 GiB 756 KiB 333 MiB 7.0 TiB 0.20 0.21 50 up osd.19
> 22 ssd 6.98630 1.00000 7.0 TiB 105 GiB 104 GiB 1.3 MiB 313 MiB 6.9 TiB 1.46 1.51 48 up osd.22
> 25 ssd 6.98630 1.00000 7.0 TiB 17 GiB 16 GiB 257 KiB 272 MiB 7.0 TiB 0.23 0.24 45 up osd.25
> 28 ssd 6.98630 1.00000 7.0 TiB 72 GiB 72 GiB 6.1 MiB 180 MiB 6.9 TiB 1.01 1.05 43 up osd.28
> 31 ssd 6.98630 1.00000 7.0 TiB 47 GiB 46 GiB 592 KiB 358 MiB 6.9 TiB 0.65 0.68 56 up osd.31
> -9 64.04089 - 64 TiB 728 GiB 726 GiB 17 MiB 1.8 GiB 63 TiB 1.11 1.15 - host fl31ca104ja0302
> 33 ssd 5.82190 1.00000 5.8 TiB 65 GiB 65 GiB 245 KiB 144 MiB 5.8 TiB 1.09 1.13 47 up osd.33
> 34 ssd 5.82190 1.00000 5.8 TiB 14 GiB 14 GiB 815 KiB 83 MiB 5.8 TiB 0.24 0.25 55 up osd.34
> 35 ssd 5.82190 1.00000 5.8 TiB 77 GiB 77 GiB 224 KiB 213 MiB 5.7 TiB 1.30 1.34 44 up osd.35
> 36 ssd 5.82190 1.00000 5.8 TiB 117 GiB 117 GiB 8.5 MiB 284 MiB 5.7 TiB 1.96 2.03 52 up osd.36
> 37 ssd 5.82190 1.00000 5.8 TiB 58 GiB 58 GiB 501 KiB 132 MiB 5.8 TiB 0.98 1.01 40 up osd.37
> 38 ssd 5.82190 1.00000 5.8 TiB 123 GiB 123 GiB 691 KiB 266 MiB 5.7 TiB 2.07 2.14 73 up osd.38
> 39 ssd 5.82190 1.00000 5.8 TiB 77 GiB 77 GiB 609 KiB 193 MiB 5.7 TiB 1.30 1.34 62 up osd.39
> 40 ssd 5.82190 1.00000 5.8 TiB 77 GiB 77 GiB 262 KiB 148 MiB 5.7 TiB 1.29 1.34 55 up osd.40
> 41 ssd 5.82190 1.00000 5.8 TiB 44 GiB 44 GiB 4.4 MiB 140 MiB 5.8 TiB 0.75 0.77 44 up osd.41
> 42 ssd 5.82190 1.00000 5.8 TiB 45 GiB 45 GiB 886 KiB 135 MiB 5.8 TiB 0.75 0.78 47 up osd.42
> 43 ssd 5.82190 1.00000 5.8 TiB 28 GiB 28 GiB 187 KiB 104 MiB 5.8 TiB 0.48 0.49 58 up osd.43
>
> [Also: Yesterday I had two cfs-mirror daemons running, one on fl31ca104ja0201 and one on fl31ca104ja0302. The cfs-mirror on fl31ca104ja0201 was stopped. When the token import was run on fl31ca104ja0302, the cfs-mirror log was active. In case it is useful, I have attached that log (cfsmirror-container.log) as well.]
>
> How can I list the token on the target cluster after running the create peer_bootstrap command?
>
> Here is today’s status with your suggestion:
> There is only one cfs-mirror daemon running now. It is on the fl31ca104ja0201 node.
>
> root@fl31ca104ja0201:/# ceph -s
>   cluster:
>     id:     d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
>     health: HEALTH_OK
>
>   services:
>     mon:           3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 7m)
>     mgr:           fl31ca104ja0201.kkoono(active, since 13m), standbys: fl31ca104ja0202, fl31ca104ja0203
>     mds:           1/1 daemons up, 2 standby
>     osd:           44 osds: 44 up (since 7m), 44 in (since 4w)
>     cephfs-mirror: 1 daemon active (1 hosts)
>     rgw:           3 daemons active (3 hosts, 1 zones)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   25 pools, 769 pgs
>     objects: 614.40k objects, 1.9 TiB
>     usage:   2.8 TiB used, 292 TiB / 295 TiB avail
>     pgs:     769 active+clean
>
>   io:
>     client:   32 MiB/s rd, 0 B/s wr, 57 op/s rd, 1 op/s wr
>
> root@fl31ca104ja0201:/#
> root@fl31ca104ja0201:/#
> root@fl31ca104ja0201:/# ceph tell mgr.fl31ca104ja0201.kkoono config set debug_mgr 20
> {
>     "success": ""
> }
> root@fl31ca104ja0201:/# ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0=
> ^CInterrupted
>
> Ctrl-C after 15 min. Once the command is run, the health status goes to WARN.
>
> root@fl31ca104ja0201:/# ceph -s
>   cluster:
>     id:     d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
>     health: HEALTH_WARN
>             6 slow ops, oldest one blocked for 1095 sec, mon.fl31ca104ja0203 has slow ops
>
>   services:
>     mon:           3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 30m)
>     mgr:           fl31ca104ja0201.kkoono(active, since 35m), standbys: fl31ca104ja0202, fl31ca104ja0203
>     mds:           1/1 daemons up, 2 standby
>     osd:           44 osds: 44 up (since 29m), 44 in (since 4w)
>     cephfs-mirror: 1 daemon active (1 hosts)
>     rgw:           3 daemons active (3 hosts, 1 zones)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   25 pools, 769 pgs
>     objects: 614.40k objects, 1.9 TiB
>     usage:   2.8 TiB used, 292 TiB / 295 TiB avail
>     pgs:     769 active+clean
>
>   io:
>     client:   67 KiB/s rd, 0 B/s wr, 68 op/s rd, 21 op/s wr
>
> -----Original Message-----
> From: Venky Shankar <vshankar@xxxxxxxxxx>
> Sent: Thursday, August 3, 2023 11:03 PM
> To: Adiga, Anantha <anantha.adiga@xxxxxxxxx>
> Cc: ceph-users@xxxxxxx
> Subject: Re: cephfs snapshot mirror peer_bootstrap import hung
>
> Hi Anantha,
>
> On Fri, Aug 4, 2023 at 2:27 AM Adiga, Anantha <anantha.adiga@xxxxxxxxx> wrote:
> >
> > Hi
> >
> > Could you please provide guidance on how to diagnose this issue:
> >
> > In this case, there are two Ceph clusters in different locations: cluster A with 4 nodes and cluster B with 3 nodes. Both are already running RGW multi-site; A is the master.
> >
> > CephFS snapshot mirroring is being configured on the clusters. Cluster A is the primary, cluster B is the peer. The bootstrap import step on the primary node hangs.
> >
> > On the target cluster:
> > ---------------------------
> > "version": "16.2.5",
> > "release": "pacific",
> > "release_type": "stable"
> >
> > root@cr21meg16ba0101:/# ceph fs snapshot mirror peer_bootstrap create cephfs client.mirror_remote flex2-site
> > {"token": "eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0="}
>
> Seems fine up until here.
>
> > root@cr21meg16ba0101:/var/run/ceph#
> >
> > On the source cluster:
> > ----------------------------
> > "version": "17.2.6",
> > "release": "quincy",
> > "release_type": "stable"
> >
> > root@fl31ca104ja0201:/# ceph -s
> >   cluster:
> >     id:     d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
> >     health: HEALTH_OK
> >
> >   services:
> >     mon:           3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 111m)
> >     mgr:           fl31ca104ja0201.nwpqlh(active, since 11h), standbys: fl31ca104ja0203, fl31ca104ja0202
> >     mds:           1/1 daemons up, 2 standby
> >     osd:           44 osds: 44 up (since 111m), 44 in (since 4w)
> >     cephfs-mirror: 1 daemon active (1 hosts)
> >     rgw:           3 daemons active (3 hosts, 1 zones)
> >
> >   data:
> >     volumes: 1/1 healthy
> >     pools:   25 pools, 769 pgs
> >     objects: 614.40k objects, 1.9 TiB
> >     usage:   2.8 TiB used, 292 TiB / 295 TiB avail
> >     pgs:     769 active+clean
> >
> > root@fl31ca104ja0302:/# ceph mgr module enable mirroring
> > module 'mirroring' is already enabled
> > root@fl31ca104ja0302:/# ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0=
>
> Going by your description, I'm guessing this is the command that hangs? If that's the case, set `debug_mgr=20`, repeat the token import step and share the ceph-mgr log. Also note that you can check the mirror daemon status as detailed in
>
> https://docs.ceph.com/en/latest/dev/cephfs-mirroring/#mirror-daemon-status
>
> > root@fl31ca104ja0302:/var/run/ceph# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-mirror.fl31ca104ja0302.sypagt.7.94083135960976.asok status
> > {
> >     "metadata": {
> >         "ceph_sha1": "d7ff0d10654d2280e08f1ab989c7cdf3064446a5",
> >         "ceph_version": "ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)",
> >         "entity_id": "cephfs-mirror.fl31ca104ja0302.sypagt",
> >         "hostname": "fl31ca104ja0302",
> >         "pid": "7",
> >         "root": "/"
> >     },
> >     "dentry_count": 0,
> >     "dentry_pinned_count": 0,
> >     "id": 5194553,
> >     "inst": {
> >         "name": {
> >             "type": "client",
> >             "num": 5194553
> >         },
> >         "addr": {
> >             "type": "v1",
> >             "addr": "10.45.129.5:0",
> >             "nonce": 2497002034
> >         }
> >     },
> >     "addr": {
> >         "type": "v1",
> >         "addr": "10.45.129.5:0",
> >         "nonce": 2497002034
> >     },
> >     "inst_str": "client.5194553 10.45.129.5:0/2497002034",
> >     "addr_str": "10.45.129.5:0/2497002034",
> >     "inode_count": 1,
> >     "mds_epoch": 118,
> >     "osd_epoch": 6266,
> >     "osd_epoch_barrier": 0,
> >     "blocklisted": false,
> >     "fs_name": "cephfs"
> > }
> >
> > root@fl31ca104ja0302:/home/general# docker logs ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e-cephfs-mirror-fl31ca104ja0302-sypagt --tail 10
> > debug 2023-08-03T05:24:27.413+0000 7f8eb6fc0280 0 ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable), process cephfs-mirror, pid 7
> > debug 2023-08-03T05:24:27.413+0000 7f8eb6fc0280 0 pidfile_write: ignore empty --pid-file
> > debug 2023-08-03T05:24:27.445+0000 7f8eb6fc0280 1 mgrc service_daemon_register cephfs-mirror.5184622 metadata {arch=x86_64,ceph_release=quincy,ceph_version=ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable),ceph_version_short=17.2.6,container_hostname=fl31ca104ja0302,container_image=quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e,cpu=Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz,distro=centos,distro_description=CentOS Stream 8,distro_version=8,hostname=fl31ca104ja0302,id=fl31ca104ja0302.sypagt,instance_id=5184622,kernel_description=#82-Ubuntu SMP Tue Jun 6 23:10:23 UTC 2023,kernel_version=5.15.0-75-generic,mem_swap_kb=8388604,mem_total_kb=527946928,os=Linux}
> > debug 2023-08-03T05:27:10.419+0000 7f8ea1b2c700 0 client.5194553 ms_handle_reset on v2:10.45.128.141:3300/0
> > debug 2023-08-03T05:50:10.917+0000 7f8ea1b2c700 0 client.5194553 ms_handle_reset on v2:10.45.128.139:3300/0
> >
> > Thank you,
> > Anantha
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> --
> Cheers,
> Venky

--
Cheers,
Venky
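
A side note on the question above about inspecting a bootstrap token: the tokens pasted in this thread appear to be base64-encoded JSON with the fields fsid, filesystem, user, site_name, key and mon_host. The following is a minimal Python sketch, assuming that layout holds, for decoding a token and listing the monitor addresses it points at, which may help when checking whether the primary cluster can reach those monitors. The helper names (decode_bootstrap_token, redact, mon_endpoints) are hypothetical and not part of Ceph, and the sample values are made up.

```python
import base64
import json
import re


def decode_bootstrap_token(token: str) -> dict:
    """Decode a base64-encoded JSON peer bootstrap token into a dict."""
    return json.loads(base64.b64decode(token).decode("utf-8"))


def redact(fields: dict) -> dict:
    """Return a copy with the client key masked, safe to paste into a mail."""
    safe = dict(fields)
    if "key" in safe:
        safe["key"] = "<redacted>"
    return safe


def mon_endpoints(mon_host: str) -> list:
    """Extract (protocol, ip, port) tuples from a mon_host string (IPv4 only)."""
    pattern = r"(v[12]):(\d+\.\d+\.\d+\.\d+):(\d+)"
    return [(proto, ip, int(port)) for proto, ip, port in re.findall(pattern, mon_host)]


# Made-up sample values (documentation address range), not from a real cluster.
# The field names are an assumption based on the tokens seen in this thread.
sample = {
    "fsid": "00000000-0000-0000-0000-000000000000",
    "filesystem": "cephfs",
    "user": "client.mirror_remote",
    "site_name": "example-site",
    "key": "not-a-real-key",
    "mon_host": "[v2:192.0.2.10:3300/0,v1:192.0.2.10:6789/0] "
                "[v2:192.0.2.11:3300/0,v1:192.0.2.11:6789/0]",
}
sample_token = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")

if __name__ == "__main__":
    fields = decode_bootstrap_token(sample_token)
    print(json.dumps(redact(fields), indent=2))
    for proto, ip, port in mon_endpoints(fields["mon_host"]):
        print(proto, ip, port)
```

To use it on a real token, pass the string returned by peer_bootstrap create to decode_bootstrap_token. The decoded token contains the client key, so it is worth redacting before sharing the output on a list.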