Re: cephfs snapshot mirror peer_bootstrap import hung

Hi Venky,

Here, should I send the mgr log?

root@fl31ca104ja0201:/etc/ceph# ceph -c remote_ceph.conf --id=mirror_remote  status --verbose
parsed_args: Namespace(admin_socket=None, block=False, cephconf='remote_ceph.conf', client_id='mirror_remote', client_name=None, cluster=None, cluster_timeout=None, completion=False, help=False, input_file=None, output_file=None, output_format=None, period=1, setgroup=None, setuser=None, status=False, verbose=True, version=False, watch=False, watch_channel=None, watch_debug=False, watch_error=False, watch_info=False, watch_sec=False, watch_warn=False), childargs: ['status']
^CCluster connection aborted

root@fl31ca104ja0201:/etc/ceph#  cat remote_ceph.client.mirror_remote.keyring
[client.mirror_remote]
        key = AQCfwMlkM90pLBAAwXtvpp8j04IvC8tqpAG9bA==
        caps mds = "allow rwps fsname=cephfs"
        caps mon = "allow r fsname=cephfs"
        caps osd = "allow rw tag cephfs data=cephfs"

root@fl31ca104ja0201:/etc/ceph# cat remote_ceph.conf
[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor

[client.rgw.cr21meg16ba0101.rgw0]
host = cr21meg16ba0101
keyring = /var/lib/ceph/radosgw/ceph-rgw.cr21meg16ba0101.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-cr21meg16ba0101.rgw0.log
rgw frontends = beast endpoint=172.18.55.71:8080
rgw thread pool size = 512

# Please do not change this file directly since it is managed by Ansible and will be overwritten
[global]
cluster network = 172.18.55.71/24
fsid = a6f52598-e5cd-4a08-8422-7b6fdb1d5dbe
mon host = [v2:172.18.55.71:3300,v1:172.18.55.71:6789],[v2:172.18.55.72:3300,v1:172.18.55.72:6789],[v2:172.18.55.73:3300,v1:172.18.55.73:6789]
mon initial members = cr21meg16ba0101,cr21meg16ba0102,cr21meg16ba0103
osd pool default crush rule = -1
public network = 172.18.55.0/24

[mon]
auth_allow_insecure_global_id_reclaim = False
auth_expose_insecure_global_id_reclaim = False

[osd]
osd memory target = 23630132019
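
For reference, a minimal reachability check against the secondary mons listed in remote_ceph.conf above could look like this (just a sketch, assuming nc is available; the addresses are taken from the "mon host" line):

# check msgr v2 (3300) and v1 (6789) reachability of the secondary mons
for m in 172.18.55.71 172.18.55.72 172.18.55.73; do
    nc -vz -w 5 "$m" 3300
    nc -vz -w 5 "$m" 6789
done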

-----Original Message-----
From: Venky Shankar <vshankar@xxxxxxxxxx> 
Sent: Monday, August 7, 2023 9:26 PM
To: Adiga, Anantha <anantha.adiga@xxxxxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re:  Re: cephfs snapshot mirror peer_bootstrap import hung

On Tue, Aug 8, 2023 at 9:16 AM Adiga, Anantha <anantha.adiga@xxxxxxxxx> wrote:
>
> Hi Venky,
>
> Is this correct?
> (copied ceph.conf from the secondary cluster to the /etc/ceph/crsite directory on the primary cluster, and copied ceph.mon.keyring from the secondary as ceph.client.crsite.mon.keyring in /etc/ceph on the primary)
> root@fl31ca104ja0201:/etc/ceph# ls
> ceph.client.admin.keyring  ceph.client.crsite.admin.keyring  ceph.client.mirror_remote.keying  crsite        fio-fs.test   fs-mnt   rbdmap
> ceph.client.crash.keyring  ceph.client.crsite.mon.keyring    ceph.conf                         fio-bsd.test  fio-nfs.test  nfs-mnt  remote_ceph.conf
> root@fl31ca104ja0201:/etc/ceph# ls crsite
> ceph.conf  ceph.mon.keyring
>
> root@fl31ca104ja0201:/etc/ceph/crsite# ceph -c ceph.conf 
> --id=crsite.mon --cluster=ceph --verbose
> parsed_args: Namespace(admin_socket=None, block=False, 
> cephconf='ceph.conf', client_id='crsite.mon', client_name=None, 
> cluster='ceph', cluster_timeout=None, completion=False, help=False, 
> input_file=None, output_file=None, output_format=None, period=1, 
> setgroup=None, setuser=None, status=False, verbose=True, 
> version=False, watch=False, watch_channel=None, watch_debug=False, 
> watch_error=False, watch_info=False, watch_sec=False, 
> watch_warn=False), childargs: [] ^CCluster connection aborted
>
> Not sure if the --id (CLIENT_ID) is correct; not able to connect.

use `remote_ceph.conf` and id as `mirror_remote` (since I guess these are the secondary cluster's conf and key, given the names).
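
For concreteness, something like this should work -- a sketch; adjust the conf/keyring paths to wherever you placed the secondary cluster's files:

ceph -c /etc/ceph/remote_ceph.conf \
     -k /etc/ceph/<keyring-for-client.mirror_remote> \
     --id mirror_remote status

If the secondary monitors are reachable from that host, this should return the secondary cluster's status within a few seconds.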

>
> Thank you,
> Anantha
>
> -----Original Message-----
> From: Venky Shankar <vshankar@xxxxxxxxxx>
> Sent: Monday, August 7, 2023 7:05 PM
> To: Adiga, Anantha <anantha.adiga@xxxxxxxxx>
> Cc: ceph-users@xxxxxxx
> Subject: Re:  Re: cephfs snapshot mirror peer_bootstrap 
> import hung
>
> Hi Anantha,
>
> On Tue, Aug 8, 2023 at 6:29 AM Adiga, Anantha <anantha.adiga@xxxxxxxxx> wrote:
> >
> > Hi Venky,
> >
> > The primary and secondary clusters both have the same cluster name "ceph" and both have a single filesystem by name "cephfs".
>
> That's not an issue.
>
> How do I check the connection from primary to secondary using the mon addr and key? What is the command line?
>
> A quick way to check this would be to place the secondary cluster ceph
> config file and the user key on one of the primary nodes (preferably,
> the ceph-mgr host, just for tests - so purge these when done) and then
> running
>
>         ceph -c /path/to/secondary/ceph.conf --id <> status
>
> If that runs all fine, then the mirror daemon is probably hitting some bug.
>
> > These two clusters are configured for rgw multisite and are functional.
> >
> > Thank you,
> > Anantha
> >
> > -----Original Message-----
> > From: Venky Shankar <vshankar@xxxxxxxxxx>
> > Sent: Monday, August 7, 2023 5:46 PM
> > To: Adiga, Anantha <anantha.adiga@xxxxxxxxx>
> > Cc: ceph-users@xxxxxxx
> > Subject: Re:  Re: cephfs snapshot mirror peer_bootstrap 
> > import hung
> >
> > Hi Anantha,
> >
> > On Mon, Aug 7, 2023 at 11:52 PM Adiga, Anantha <anantha.adiga@xxxxxxxxx> wrote:
> > >
> > > Hi Venky,
> > >
> > >
> > >
> > > I tried on another secondary Quincy cluster and it is the same problem. The peer_bootstrap import command hangs.
> >
> > A pacific cluster generated peer token should be importable in a quincy source cluster. Looking at the logs, I suspect that the perceived hang is the mirroring module blocked on connecting to the secondary cluster (to set mirror info xattr). Are you able to connect to the secondary cluster from the host running ceph-mgr on the primary cluster using its monitor address (and a key)?
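> >
> > A quick way to see which host that is (sketch):
> >
> > ceph mgr stat                      # shows the currently active mgr
> > ceph mgr dump | grep active_name   # same info, from the full mgr map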
> >
> > The primary and secondary clusters both have the same cluster name "ceph" and both have a single filesystem by name "cephfs".  How do I check that connection from primary to secondary using mon addr and key?
> > These two clusters are configured for rgw multisite and are functional.
> >
> > >
> > >
> > >
> > >
> > >
> > > root@fl31ca104ja0201:/# ceph fs  snapshot mirror peer_bootstrap 
> > > import cephfs 
> > > eyJmc2lkIjogIjJlYWMwZWEwLTYwNDgtNDQ0Zi04NGIyLThjZWVmZWQyN2E1YiIsIC
> > > Jm
> > > aW
> > > xlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3Rl
> > > Ii
> > > wg
> > > InNpdGVfbmFtZSI6ICJzaGdSLXNpdGUiLCAia2V5IjogIkFRQ0lGdEZrSStTTE5oQU
> > > FX
> > > bW
> > > V6MkRKcEg5ZUdyYnhBOWVmZG9BPT0iLCAibW9uX2hvc3QiOiAiW3YyOjEwLjIzOS4x
> > > NT
> > > Uu
> > > MTg6MzMwMC8wLHYxOjEwLjIzOS4xNTUuMTg6Njc4OS8wXSBbdjI6MTAuMjM5LjE1NS
> > > 4x
> > > OT
> > > ozMzAwLzAsdjE6MTAuMjM5LjE1NS4xOTo2Nzg5LzBdIFt2MjoxMC4yMzkuMTU1LjIw
> > > Oj Mz MDAvMCx2MToxMC4yMzkuMTU1LjIwOjY3ODkvMF0ifQ==
> > >
> > > ……
> > >
> > > …….
> > >
> > > ...command does not complete; it waits here.
> > >
> > > ^C  to exit.
> > >
> > > Thereafter some commands do not complete…
> > >
> > > root@fl31ca104ja0201:/# ceph -s
> > >
> > >   cluster:
> > >
> > >     id:     d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
> > >
> > >     health: HEALTH_OK
> > >
> > >
> > >
> > >   services:
> > >
> > >     mon:           3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 2d)
> > >
> > >     mgr:           fl31ca104ja0201.kkoono(active, since 3d), standbys: fl31ca104ja0202, fl31ca104ja0203
> > >
> > >     mds:           1/1 daemons up, 2 standby
> > >
> > >     osd:           44 osds: 44 up (since 2d), 44 in (since 5w)
> > >
> > >     cephfs-mirror: 1 daemon active (1 hosts)
> > >
> > >     rgw:           3 daemons active (3 hosts, 1 zones)
> > >
> > >
> > >
> > >   data:
> > >
> > >     volumes: 1/1 healthy
> > >
> > >     pools:   25 pools, 769 pgs
> > >
> > >     objects: 614.40k objects, 1.9 TiB
> > >
> > >     usage:   2.9 TiB used, 292 TiB / 295 TiB avail
> > >
> > >     pgs:     769 active+clean
> > >
> > >
> > >
> > >   io:
> > >
> > >     client:   32 KiB/s rd, 0 B/s wr, 33 op/s rd, 1 op/s wr
> > >
> > >
> > >
> > > root@fl31ca104ja0201:/#
> > >
> > > root@fl31ca104ja0201:/# ceph fs status cephfs
> > >
> > > This command also waits. ……
> > >
> > >
> > >
> > > I have attached the mgr log
> > >
> > > root@fl31ca104ja0201:/# ceph service status
> > >
> > > {
> > >
> > >     "cephfs-mirror": {
> > >
> > >         "5306346": {
> > >
> > >             "status_stamp": "2023-08-07T17:35:56.884907+0000",
> > >
> > >             "last_beacon": "2023-08-07T17:45:01.903540+0000",
> > >
> > >             "status": {
> > >
> > >                 "status_json": "{\"1\":{\"name\":\"cephfs\",\"directory_count\":0,\"peers\":{}}}"
> > >
> > >             }
> > >
> > >         }
> > >
> > >
> > >
> > > Quincy secondary cluster
> > >
> > >
> > >
> > > root@a001s008-zz14l47008:/# ceph mgr module enable mirroring
> > >
> > > root@a001s008-zz14l47008:/# ceph fs authorize cephfs 
> > > client.mirror_remote / rwps
> > >
> > > [client.mirror_remote]
> > >
> > >         key = AQCIFtFkI+SLNhAAWmez2DJpH9eGrbxA9efdoA==
> > >
> > > root@a001s008-zz14l47008:/# ceph auth get client.mirror_remote
> > >
> > > [client.mirror_remote]
> > >
> > >         key = AQCIFtFkI+SLNhAAWmez2DJpH9eGrbxA9efdoA==
> > >
> > >         caps mds = "allow rwps fsname=cephfs"
> > >
> > >         caps mon = "allow r fsname=cephfs"
> > >
> > >         caps osd = "allow rw tag cephfs data=cephfs"
> > >
> > > root@a001s008-zz14l47008:/#
> > >
> > > root@a001s008-zz14l47008:/# ceph fs snapshot mirror peer_bootstrap 
> > > create cephfs client.mirror_remote shgR-site
> > >
> > > {"token":
> > > "eyJmc2lkIjogIjJlYWMwZWEwLTYwNDgtNDQ0Zi04NGIyLThjZWVmZWQyN2E1YiIsI
> > > CJ
> > > ma
> > > Wxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3R
> > > lI
> > > iw
> > > gInNpdGVfbmFtZSI6ICJzaGdSLXNpdGUiLCAia2V5IjogIkFRQ0lGdEZrSStTTE5oQ
> > > UF
> > > Xb
> > > WV6MkRKcEg5ZUdyYnhBOWVmZG9BPT0iLCAibW9uX2hvc3QiOiAiW3YyOjEwLjIzOS4
> > > xN
> > > TU
> > > uMTg6MzMwMC8wLHYxOjEwLjIzOS4xNTUuMTg6Njc4OS8wXSBbdjI6MTAuMjM5LjE1N
> > > S4
> > > xO
> > > TozMzAwLzAsdjE6MTAuMjM5LjE1NS4xOTo2Nzg5LzBdIFt2MjoxMC4yMzkuMTU1LjI
> > > wO jM zMDAvMCx2MToxMC4yMzkuMTU1LjIwOjY3ODkvMF0ifQ=="}
> > >
> > > root@a001s008-zz14l47008:/#
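> > >
> > > (Side note: the bootstrap token is just base64-encoded JSON, so it can be
> > > inspected before importing -- a sketch, assuming base64 and python3 are
> > > available:
> > >
> > > echo '<token>' | base64 -d | python3 -m json.tool
> > >
> > > It decodes to the fsid, filesystem, user, site_name, key and mon_host of
> > > the cluster that generated it.)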
> > >
> > >
> > >
> > > Thank you,
> > >
> > > Anantha
> > >
> > >
> > >
> > > From: Adiga, Anantha
> > > Sent: Friday, August 4, 2023 11:55 AM
> > > To: Venky Shankar <vshankar@xxxxxxxxxx>; ceph-users@xxxxxxx
> > > Subject: RE:  Re: cephfs snapshot mirror 
> > > peer_bootstrap import hung
> > >
> > >
> > >
> > > Hi Venky,
> > >
> > >
> > >
> > > Thank you so much for the guidance. Attached is the mgr log.
> > >
> > >
> > >
> > > Note: the 4th node in the primary cluster has smaller-capacity drives; the other 3 nodes have larger-capacity drives.
> > >
> > > 32    ssd    6.98630   1.00000  7.0 TiB   44 GiB   44 GiB   183 KiB  148 MiB  6.9 TiB  0.62  0.64   40      up          osd.32
> > >
> > > -7          76.84927         -   77 TiB  652 GiB  648 GiB    20 MiB  3.0 GiB   76 TiB  0.83  0.86    -              host fl31ca104ja0203
> > >
> > >   1    ssd    6.98630   1.00000  7.0 TiB   73 GiB   73 GiB   8.0 MiB  333 MiB  6.9 TiB  1.02  1.06   54      up          osd.1
> > >
> > >   4    ssd    6.98630   1.00000  7.0 TiB   77 GiB   77 GiB   1.1 MiB  174 MiB  6.9 TiB  1.07  1.11   55      up          osd.4
> > >
> > >   7    ssd    6.98630   1.00000  7.0 TiB   47 GiB   47 GiB   140 KiB  288 MiB  6.9 TiB  0.66  0.68   51      up          osd.7
> > >
> > > 10    ssd    6.98630   1.00000  7.0 TiB   75 GiB   75 GiB   299 KiB  278 MiB  6.9 TiB  1.05  1.09   44      up          osd.10
> > >
> > > 13    ssd    6.98630   1.00000  7.0 TiB   94 GiB   94 GiB  1018 KiB  291 MiB  6.9 TiB  1.31  1.36   72      up          osd.13
> > >
> > > 16    ssd    6.98630   1.00000  7.0 TiB   31 GiB   31 GiB   163 KiB  267 MiB  7.0 TiB  0.43  0.45   49      up          osd.16
> > >
> > > 19    ssd    6.98630   1.00000  7.0 TiB   14 GiB   14 GiB   756 KiB  333 MiB  7.0 TiB  0.20  0.21   50      up          osd.19
> > >
> > > 22    ssd    6.98630   1.00000  7.0 TiB  105 GiB  104 GiB   1.3 MiB  313 MiB  6.9 TiB  1.46  1.51   48      up          osd.22
> > >
> > > 25    ssd    6.98630   1.00000  7.0 TiB   17 GiB   16 GiB   257 KiB  272 MiB  7.0 TiB  0.23  0.24   45      up          osd.25
> > >
> > > 28    ssd    6.98630   1.00000  7.0 TiB   72 GiB   72 GiB   6.1 MiB  180 MiB  6.9 TiB  1.01  1.05   43      up          osd.28
> > >
> > > 31    ssd    6.98630   1.00000  7.0 TiB   47 GiB   46 GiB   592 KiB  358 MiB  6.9 TiB  0.65  0.68   56      up          osd.31
> > >
> > > -9          64.04089         -   64 TiB  728 GiB  726 GiB    17 MiB  1.8 GiB   63 TiB  1.11  1.15    -              host fl31ca104ja0302
> > >
> > > 33    ssd    5.82190   1.00000  5.8 TiB   65 GiB   65 GiB   245 KiB  144 MiB  5.8 TiB  1.09  1.13   47      up          osd.33
> > >
> > > 34    ssd    5.82190   1.00000  5.8 TiB   14 GiB   14 GiB   815 KiB   83 MiB  5.8 TiB  0.24  0.25   55      up          osd.34
> > >
> > > 35    ssd    5.82190   1.00000  5.8 TiB   77 GiB   77 GiB   224 KiB  213 MiB  5.7 TiB  1.30  1.34   44      up          osd.35
> > >
> > > 36    ssd    5.82190   1.00000  5.8 TiB  117 GiB  117 GiB   8.5 MiB  284 MiB  5.7 TiB  1.96  2.03   52      up          osd.36
> > >
> > > 37    ssd    5.82190   1.00000  5.8 TiB   58 GiB   58 GiB   501 KiB  132 MiB  5.8 TiB  0.98  1.01   40      up          osd.37
> > >
> > > 38    ssd    5.82190   1.00000  5.8 TiB  123 GiB  123 GiB   691 KiB  266 MiB  5.7 TiB  2.07  2.14   73      up          osd.38
> > >
> > > 39    ssd    5.82190   1.00000  5.8 TiB   77 GiB   77 GiB   609 KiB  193 MiB  5.7 TiB  1.30  1.34   62      up          osd.39
> > >
> > > 40    ssd    5.82190   1.00000  5.8 TiB   77 GiB   77 GiB   262 KiB  148 MiB  5.7 TiB  1.29  1.34   55      up          osd.40
> > >
> > > 41    ssd    5.82190   1.00000  5.8 TiB   44 GiB   44 GiB   4.4 MiB  140 MiB  5.8 TiB  0.75  0.77   44      up          osd.41
> > >
> > > 42    ssd    5.82190   1.00000  5.8 TiB   45 GiB   45 GiB   886 KiB  135 MiB  5.8 TiB  0.75  0.78   47      up          osd.42
> > >
> > > 43    ssd    5.82190   1.00000  5.8 TiB   28 GiB   28 GiB   187 KiB  104 MiB  5.8 TiB  0.48  0.49   58      up          osd.43
> > >
> > >
> > >
> > > [Also: Yesterday I had two cfs-mirror daemons running, one on
> > > fl31ca104ja0201 and one on fl31ca104ja0302. The cfs-mirror on fl31ca104ja0201 was stopped.
> > > When the import token command was run on fl31ca104ja0302, the cfs-mirror
> > > log was active. Just in case it is useful I have attached that log
> > > (cfsmirror-container.log) as well.]
> > >
> > >
> > >
> > > How can I list the token on the target cluster after running the peer_bootstrap create command?
> > >
> > >
> > >
> > > Here is today’s status with your suggestion:
> > >
> > > There is only one cfs-mirror daemon running now. It is on fl31ca104ja0201 node.
> > >
> > >
> > >
> > > root@fl31ca104ja0201:/# ceph -s
> > >
> > >   cluster:
> > >
> > >     id:     d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
> > >
> > >     health: HEALTH_OK
> > >
> > >
> > >
> > >   services:
> > >
> > >     mon:           3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 7m)
> > >
> > >     mgr:           fl31ca104ja0201.kkoono(active, since 13m), standbys: fl31ca104ja0202, fl31ca104ja0203
> > >
> > >     mds:           1/1 daemons up, 2 standby
> > >
> > >     osd:           44 osds: 44 up (since 7m), 44 in (since 4w)
> > >
> > >     cephfs-mirror: 1 daemon active (1 hosts)
> > >
> > >     rgw:           3 daemons active (3 hosts, 1 zones)
> > >
> > >
> > >
> > >   data:
> > >
> > >     volumes: 1/1 healthy
> > >
> > >     pools:   25 pools, 769 pgs
> > >
> > >     objects: 614.40k objects, 1.9 TiB
> > >
> > >     usage:   2.8 TiB used, 292 TiB / 295 TiB avail
> > >
> > >     pgs:     769 active+clean
> > >
> > >
> > >
> > >   io:
> > >
> > >     client:   32 MiB/s rd, 0 B/s wr, 57 op/s rd, 1 op/s wr
> > >
> > >
> > >
> > > root@fl31ca104ja0201:/#
> > >
> > > root@fl31ca104ja0201:/#
> > >
> > > root@fl31ca104ja0201:/# ceph tell mgr.fl31ca104ja0201.kkoono 
> > > config set debug_mgr 20
> > >
> > > {
> > >
> > >     "success": ""
> > >
> > > }
> > >
> > > root@fl31ca104ja0201:/# ceph fs snapshot mirror peer_bootstrap 
> > > import cephfs 
> > > eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsIC
> > > Jm
> > > aW
> > > xlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3Rl
> > > Ii
> > > wg
> > > InNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQk
> > > FB
> > > d1
> > > h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTgu
> > > NT
> > > Uu
> > > NzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNz
> > > M6 Mz MwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0=
> > >
> > > ^CInterrupted
> > >
> > >
> > >
> > > Ctrl-C after 15 min. Once the command is run, the health status goes to WARN.
> > >
> > >
> > >
> > > root@fl31ca104ja0201:/# ceph -s
> > >
> > >   cluster:
> > >
> > >     id:     d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
> > >
> > >     health: HEALTH_WARN
> > >
> > >             6 slow ops, oldest one blocked for 1095 sec,
> > > mon.fl31ca104ja0203 has slow ops
> > >
> > >
> > >
> > >   services:
> > >
> > >     mon:           3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 30m)
> > >
> > >     mgr:           fl31ca104ja0201.kkoono(active, since 35m), standbys: fl31ca104ja0202, fl31ca104ja0203
> > >
> > >     mds:           1/1 daemons up, 2 standby
> > >
> > >     osd:           44 osds: 44 up (since 29m), 44 in (since 4w)
> > >
> > >     cephfs-mirror: 1 daemon active (1 hosts)
> > >
> > >     rgw:           3 daemons active (3 hosts, 1 zones)
> > >
> > >
> > >
> > >   data:
> > >
> > >     volumes: 1/1 healthy
> > >
> > >     pools:   25 pools, 769 pgs
> > >
> > >     objects: 614.40k objects, 1.9 TiB
> > >
> > >     usage:   2.8 TiB used, 292 TiB / 295 TiB avail
> > >
> > >     pgs:     769 active+clean
> > >
> > >
> > >
> > >   io:
> > >
> > >     client:   67 KiB/s rd, 0 B/s wr, 68 op/s rd, 21 op/s wr
> > >
> > >
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Venky Shankar <vshankar@xxxxxxxxxx>
> > > Sent: Thursday, August 3, 2023 11:03 PM
> > > To: Adiga, Anantha <anantha.adiga@xxxxxxxxx>
> > > Cc: ceph-users@xxxxxxx
> > > Subject:  Re: cephfs snapshot mirror peer_bootstrap 
> > > import hung
> > >
> > >
> > >
> > > Hi Anantha,
> > >
> > >
> > >
> > > On Fri, Aug 4, 2023 at 2:27 AM Adiga, Anantha <anantha.adiga@xxxxxxxxx> wrote:
> > >
> > > >
> > >
> > > > Hi
> > >
> > > >
> > >
> > > > Could you please  provide guidance on how to diagnose this issue:
> > >
> > > >
> > >
> > > > In this case, there are two Ceph clusters: cluster A with 4 nodes and cluster B with 3 nodes, in different locations. Both are already running RGW multi-site; A is the master.
> > >
> > > >
> > >
> > > > Cephfs snapshot mirroring is being configured on the clusters. Cluster A is the primary, cluster B is the peer. The bootstrap import step on the primary node hangs.
> > >
> > > >
> > >
> > > > On the target cluster :
> > >
> > > > ---------------------------
> > >
> > > > "version": "16.2.5",
> > >
> > > >     "release": "pacific",
> > >
> > > >     "release_type": "stable"
> > >
> > > >
> > >
> > > > root@cr21meg16ba0101:/# ceph fs snapshot mirror peer_bootstrap create cephfs client.mirror_remote flex2-site
> > >
> > > > {"token":
> > >
> > > > "eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSI
> > > > sI
> > > > CJ
> > > > ma
> > >
> > > > Wxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb
> > > > 3R
> > > > lI
> > > > iw
> > >
> > > > gInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHB
> > > > MQ
> > > > kF
> > > > Bd
> > >
> > > > 1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuM
> > > > Tg
> > > > uN
> > > > TU
> > >
> > > > uNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTU
> > > > uN
> > > > zM
> > > > 6M
> > >
> > > > zMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0="}
> > >
> > >
> > >
> > > Seems fine uptil here.
> > >
> > >
> > >
> > > > root@cr21meg16ba0101:/var/run/ceph#
> > >
> > > >
> > >
> > > > On the source cluster:
> > >
> > > > ----------------------------
> > >
> > > > "version": "17.2.6",
> > >
> > > >     "release": "quincy",
> > >
> > > >     "release_type": "stable"
> > >
> > > >
> > >
> > > > root@fl31ca104ja0201:/# ceph -s
> > >
> > > >   cluster:
> > >
> > > >     id:     d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
> > >
> > > >     health: HEALTH_OK
> > >
> > > >
> > >
> > > >   services:
> > >
> > > >     mon:           3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 111m)
> > >
> > > >     mgr:           fl31ca104ja0201.nwpqlh(active, since 11h), standbys: fl31ca104ja0203, fl31ca104ja0202
> > >
> > > >     mds:           1/1 daemons up, 2 standby
> > >
> > > >     osd:           44 osds: 44 up (since 111m), 44 in (since 4w)
> > >
> > > >     cephfs-mirror: 1 daemon active (1 hosts)
> > >
> > > >     rgw:           3 daemons active (3 hosts, 1 zones)
> > >
> > > >
> > >
> > > >   data:
> > >
> > > >     volumes: 1/1 healthy
> > >
> > > >     pools:   25 pools, 769 pgs
> > >
> > > >     objects: 614.40k objects, 1.9 TiB
> > >
> > > >     usage:   2.8 TiB used, 292 TiB / 295 TiB avail
> > >
> > > >     pgs:     769 active+clean
> > >
> > > >
> > >
> > > > root@fl31ca104ja0302:/# ceph mgr module enable mirroring
> > >
> > > > module 'mirroring' is already enabled
> > >
> > > > root@fl31ca104ja0302:/# ceph fs snapshot mirror peer_bootstrap import cephfs
> > >
> > > > eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIs
> > > > IC
> > > > Jm
> > > > aW
> > >
> > > > xlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3
> > > > Rl
> > > > Ii
> > > > wg
> > >
> > > > InNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBM
> > > > Qk
> > > > FB
> > > > d1
> > >
> > > > h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMT
> > > > gu
> > > > NT
> > > > Uu
> > >
> > > > NzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUu
> > > > Nz
> > > > M6
> > > > Mz
> > >
> > > > MwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0=
> > >
> > >
> > >
> > > Going by your description, I'm guessing this is the command that 
> > > hangs? If that's the case, set `debug_mgr=20`, repeat the token 
> > > import step and share the ceph-mgr log. Also note that you can 
> > > check the mirror daemon status as detailed in
> > >
> > >
> > >
> > >
> > > https://docs.ceph.com/en/latest/dev/cephfs-mirroring/#mirror-daemon-status
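> > >
> > > Something along these lines (just a sketch -- adjust the mgr id; with a
> > > containerized/cephadm deployment the log is in the mgr container or journal):
> > >
> > > ceph config set mgr debug_mgr 20
> > > # or only for the active mgr:
> > > # ceph tell mgr.<active-mgr-id> config set debug_mgr 20
> > > # then repeat the import and collect the log, e.g.:
> > > # cephadm logs --name mgr.<host>.<id>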
> > >
> > >
> > >
> > > >
> > >
> > > >
> > >
> > > > root@fl31ca104ja0302:/var/run/ceph# ceph --admin-daemon
> > >
> > > > /var/run/ceph/ceph-client.cephfs-mirror.fl31ca104ja0302.sypagt.7
> > > > .9
> > > > 40
> > > > 83135960976.asok status {
> > >
> > > >     "metadata": {
> > >
> > > >         "ceph_sha1": "d7ff0d10654d2280e08f1ab989c7cdf3064446a5",
> > >
> > > >         "ceph_version": "ceph version 17.2.6
> > > > (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)",
> > >
> > > >         "entity_id": "cephfs-mirror.fl31ca104ja0302.sypagt",
> > >
> > > >         "hostname": "fl31ca104ja0302",
> > >
> > > >         "pid": "7",
> > >
> > > >         "root": "/"
> > >
> > > >     },
> > >
> > > >     "dentry_count": 0,
> > >
> > > >     "dentry_pinned_count": 0,
> > >
> > > >     "id": 5194553,
> > >
> > > >     "inst": {
> > >
> > > >         "name": {
> > >
> > > >             "type": "client",
> > >
> > > >             "num": 5194553
> > >
> > > >         },
> > >
> > > >         "addr": {
> > >
> > > >             "type": "v1",
> > >
> > > >             "addr": "10.45.129.5:0",
> > >
> > > >             "nonce": 2497002034
> > >
> > > >         }
> > >
> > > >     },
> > >
> > > >     "addr": {
> > >
> > > >         "type": "v1",
> > >
> > > >         "addr": "10.45.129.5:0",
> > >
> > > >         "nonce": 2497002034
> > >
> > > >     },
> > >
> > > >     "inst_str": "client.5194553 10.45.129.5:0/2497002034",
> > >
> > > >     "addr_str": "10.45.129.5:0/2497002034",
> > >
> > > >     "inode_count": 1,
> > >
> > > >     "mds_epoch": 118,
> > >
> > > >     "osd_epoch": 6266,
> > >
> > > >     "osd_epoch_barrier": 0,
> > >
> > > >     "blocklisted": false,
> > >
> > > >     "fs_name": "cephfs"
> > >
> > > > }
> > >
> > > >
> > >
> > > > root@fl31ca104ja0302:/home/general# docker logs
> > >
> > > > ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e-cephfs-mirror-fl31ca10
> > > > 4j
> > > > a0
> > > > 30
> > >
> > > > 2-sypagt --tail  10 debug 2023-08-03T05:24:27.413+0000
> > > > 7f8eb6fc0280
> > > > 0
> > >
> > > > ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5)
> > > > quincy
> > >
> > > > (stable), process cephfs-mirror, pid 7 debug
> > >
> > > > 2023-08-03T05:24:27.413+0000 7f8eb6fc0280  0 pidfile_write: 
> > > > ignore
> > >
> > > > empty --pid-file debug 2023-08-03T05:24:27.445+0000 7f8eb6fc0280
> > > > 1
> > >
> > > > mgrc service_daemon_register cephfs-mirror.5184622 metadata
> > >
> > > > {arch=x86_64,ceph_release=quincy,ceph_version=ceph version 
> > > > 17.2.6
> > >
> > > > (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
> > >
> > > > (stable),ceph_version_short=17.2.6,container_hostname=fl31ca104j
> > > > a0
> > > > 30
> > > > 2,
> > >
> > > > container_image=quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2
> > > > d1
> > > > 8a
> > > > 9c
> > >
> > > > 64ca62a0b38ab362e614ad671efa4a0547e,cpu=Intel(R) Xeon(R) Gold 
> > > > 6252 CPU
> > >
> > > > @ 2.10GHz,distro=centos,distro_description=CentOS Stream
> > >
> > > > 8,distro_version=8,hostname=fl31ca104ja0302,id=fl31ca104ja0302.s
> > > > yp
> > > > ag
> > > > t,
> > >
> > > > instance_id=5184622,kernel_description=#82-Ubuntu SMP Tue Jun 6
> > >
> > > > 23:10:23 UTC
> > >
> > > > 2023,kernel_version=5.15.0-75-generic,mem_swap_kb=8388604,mem_to
> > > > ta
> > > > l_
> > > > kb
> > >
> > > > =527946928,os=Linux} debug 2023-08-03T05:27:10.419+0000
> > > > 7f8ea1b2c700
> > >
> > > > 0 client.5194553 ms_handle_reset on v2:10.45.128.141:3300/0 
> > > > debug
> > >
> > > > 2023-08-03T05:50:10.917+0000 7f8ea1b2c700  0 client.5194553
> > >
> > > > ms_handle_reset on v2:10.45.128.139:3300/0
> > >
> > > >
> > >
> > > > Thank you,
> > >
> > > > Anantha
> > >
> > > > _______________________________________________
> > >
> > > > ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe 
> > > > send an
> > >
> > > > email to ceph-users-leave@xxxxxxx
> > >
> > > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > >
> > > Cheers,
> > >
> > > Venky
> > >
> > > _______________________________________________
> > >
> > > ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send 
> > > an email to ceph-users-leave@xxxxxxx
> >
> >
> >
> > --
> > Cheers,
> > Venky
> >
>
>
> --
> Cheers,
> Venky
>


--
Cheers,
Venky

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



