Re: MDS stuck in "up:replay"

Hi again,

Another thing I found: out of pure desperation, I started MDS daemons on
all nodes. I had them configured in the past, so I was hoping they could
help bring in missing data even though they had been down for quite a
while now. I didn't see any changes in the logs, but the CPU on the hosts
that usually don't run MDS spiked so high that I had to kill the MDS
daemons again, because otherwise they kept killing OSD containers. So I
don't really have any new information, but maybe that could be a hint of
some kind?
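
In case anyone wants to retrace my steps: on a cephadm-managed cluster
like mine, listing and cleanly stopping the extra daemons should look
roughly like this (the daemon name below is just one of mine as an
example):

ceph orch ps --daemon-type mds                 # list all MDS daemons and where they run
ceph orch daemon stop mds.mds01.ceph06.wcfdom  # stop a single daemon by its full name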

Cheers,
Thomas

On 17.01.23 10:13, Thomas Widhalm wrote:
Hi,

Thanks again. :-)

OK, that seems like an error to me. I never configured an extra rank for
the MDS. Maybe that's where my knowledge failed me, but I guess the MDS
is waiting for something that was never there.

Yes, there are two filesystems. Due to "budget restrictions" (it's my
personal system at home), I configured a second CephFS with only one
replica for data that could be easily restored.

Here's what I got when turning up the debug level:

Jan 17 10:08:17 ceph05 ceph-mds[1209]: mds.0.cache upkeep thread waiting interval 1.000000000s
Jan 17 10:08:17 ceph05 ceph-mds[1209]: mds.beacon.mds01.ceph05.pqxmvt Sending beacon up:replay seq 11107
Jan 17 10:08:17 ceph05 ceph-mds[1209]: mds.beacon.mds01.ceph05.pqxmvt sender thread waiting interval 4s
Jan 17 10:08:17 ceph05 ceph-mds[1209]: mds.beacon.mds01.ceph05.pqxmvt received beacon reply up:replay seq 11107 rtt 0.00200002
Jan 17 10:08:17 ceph05 ceph-mds[1209]: mds.0.158167 get_task_status
Jan 17 10:08:17 ceph05 ceph-mds[1209]: mds.0.158167 schedule_update_timer_task
Jan 17 10:08:18 ceph05 ceph-mds[1209]: mds.0.cache Memory usage:  total 372640, rss 57628, heap 207124, baseline 182548, 0 / 3 inodes have caps, 0 caps, 0 caps per inode
Jan 17 10:08:18 ceph05 ceph-mds[1209]: mds.0.cache cache not ready for trimming
Jan 17 10:08:18 ceph05 ceph-mds[1209]: mds.0.cache upkeep thread waiting interval 1.000000000s
Jan 17 10:08:19 ceph05 ceph-mds[1209]: mds.0.cache Memory usage:  total 372640, rss 57628, heap 207124, baseline 182548, 0 / 3 inodes have caps, 0 caps, 0 caps per inode
Jan 17 10:08:19 ceph05 ceph-mds[1209]: mds.0.cache cache not ready for trimming
Jan 17 10:08:19 ceph05 ceph-mds[1209]: mds.0.cache upkeep thread waiting interval 1.000000000s
Jan 17 10:08:19 ceph05 ceph-mds[1209]: mds.0.158167 get_task_status
Jan 17 10:08:19 ceph05 ceph-mds[1209]: mds.0.158167 schedule_update_timer_task
Jan 17 10:08:20 ceph05 ceph-mds[1209]: mds.0.cache Memory usage:  total 372640, rss 57628, heap 207124, baseline 182548, 0 / 3 inodes have caps, 0 caps, 0 caps per inode
Jan 17 10:08:20 ceph05 ceph-mds[1209]: mds.0.cache cache not ready for trimming
Jan 17 10:08:20 ceph05 ceph-mds[1209]: mds.0.cache upkeep thread waiting interval 1.000000000s
Jan 17 10:08:21 ceph05 ceph-mds[1209]: mds.0.cache Memory usage:  total 372640, rss 57628, heap 207124, baseline 182548, 0 / 3 inodes have caps, 0 caps, 0 caps per inode
Jan 17 10:08:21 ceph05 ceph-mds[1209]: mds.0.cache cache not ready for trimming
Jan 17 10:08:21 ceph05 ceph-mds[1209]: mds.0.cache upkeep thread waiting interval 1.000000000s
Jan 17 10:08:21 ceph05 ceph-mds[1209]: mds.beacon.mds01.ceph05.pqxmvt Sending beacon up:replay seq 11108
Jan 17 10:08:21 ceph05 ceph-mds[1209]: mds.beacon.mds01.ceph05.pqxmvt sender thread waiting interval 4s
Jan 17 10:08:21 ceph05 ceph-mds[1209]: mds.beacon.mds01.ceph05.pqxmvt received beacon reply up:replay seq 11108 rtt 0.00200002
Jan 17 10:08:21 ceph05 ceph-mds[1209]: mds.0.158167 get_task_status
Jan 17 10:08:21 ceph05 ceph-mds[1209]: mds.0.158167 schedule_update_timer_task
Jan 17 10:08:22 ceph05 ceph-mds[1209]: mds.0.cache Memory usage:  total 372640, rss 57628, heap 207124, baseline 182548, 0 / 3 inodes have caps, 0 caps, 0 caps per inode
Jan 17 10:08:22 ceph05 ceph-mds[1209]: mds.0.cache cache not ready for trimming
Jan 17 10:08:22 ceph05 ceph-mds[1209]: mds.0.cache upkeep thread waiting interval 1.000000000s
Jan 17 10:08:23 ceph05 ceph-mds[1209]: mds.0.cache Memory usage:  total 372640, rss 57628, heap 207124, baseline 182548, 0 / 3 inodes have caps, 0 caps, 0 caps per inode
Jan 17 10:08:23 ceph05 ceph-mds[1209]: mds.0.cache cache not ready for trimming
Jan 17 10:08:23 ceph05 ceph-mds[1209]: mds.0.cache upkeep thread waiting interval 1.000000000s
Jan 17 10:08:23 ceph05 ceph-mds[1209]: mds.0.158167 get_task_status
Jan 17 10:08:23 ceph05 ceph-mds[1209]: mds.0.158167 schedule_update_timer_task
Jan 17 10:08:24 ceph05 ceph-mds[1209]: mds.0.cache Memory usage:  total 372640, rss 57628, heap 207124, baseline 182548, 0 / 3 inodes have caps, 0 caps, 0 caps per inode
Jan 17 10:08:24 ceph05 ceph-mds[1209]: mds.0.cache cache not ready for trimming
Jan 17 10:08:24 ceph05 ceph-mds[1209]: mds.0.cache upkeep thread waiting interval 1.000000000s
Jan 17 10:08:25 ceph05 ceph-mds[1209]: mds.0.cache Memory usage:  total 372640, rss 57628, heap 207124, baseline 182548, 0 / 3 inodes have caps, 0 caps, 0 caps per inode
Jan 17 10:08:25 ceph05 ceph-mds[1209]: mds.0.cache cache not ready for trimming
Jan 17 10:08:25 ceph05 ceph-mds[1209]: mds.0.cache upkeep thread waiting interval 1.000000000s
Jan 17 10:08:25 ceph05 ceph-mds[1209]: mds.beacon.mds01.ceph05.pqxmvt Sending beacon up:replay seq 11109
Jan 17 10:08:25 ceph05 ceph-mds[1209]: mds.beacon.mds01.ceph05.pqxmvt sender thread waiting interval 4s
Jan 17 10:08:25 ceph05 ceph-mds[1209]: mds.beacon.mds01.ceph05.pqxmvt received beacon reply up:replay seq 11109 rtt 0.00600006
Jan 17 10:08:25 ceph05 ceph-mds[1209]: mds.0.158167 get_task_status
Jan 17 10:08:25 ceph05 ceph-mds[1209]: mds.0.158167 schedule_update_timer_task
Jan 17 10:08:26 ceph05 ceph-mds[1209]: mds.0.cache Memory usage:  total 372640, rss 57344, heap 207124, baseline 182548, 0 / 3 inodes have caps, 0 caps, 0 caps per inode
Jan 17 10:08:26 ceph05 ceph-mds[1209]: mds.0.cache cache not ready for trimming
Jan 17 10:08:26 ceph05 ceph-mds[1209]: mds.0.cache releasing free memory
Jan 17 10:08:26 ceph05 ceph-mds[1209]: mds.0.cache upkeep thread waiting interval 1.000000000s
Jan 17 10:08:27 ceph05 ceph-mds[1209]: mds.0.cache Memory usage:  total 372640, rss 57272, heap 207124, baseline 182548, 0 / 3 inodes have caps, 0 caps, 0 caps per inode
Jan 17 10:08:27 ceph05 ceph-mds[1209]: mds.0.cache cache not ready for trimming
Jan 17 10:08:27 ceph05 ceph-mds[1209]: mds.0.cache upkeep thread waiting interval 1.000000000s
Jan 17 10:08:27 ceph05 ceph-mds[1209]: mds.0.158167 get_task_status
Jan 17 10:08:27 ceph05 ceph-mds[1209]: mds.0.158167 schedule_update_timer_task
Jan 17 10:08:28 ceph05 ceph-mds[1209]: mds.0.cache Memory usage:  total 372640, rss 57040, heap 207124, baseline 182548, 0 / 3 inodes have caps, 0 caps, 0 caps per inode
Jan 17 10:08:28 ceph05 ceph-mds[1209]: mds.0.cache cache not ready for trimming
Jan 17 10:08:28 ceph05 ceph-mds[1209]: mds.0.cache upkeep thread waiting interval 1.000000000s


The only thing that gives me hope here is that the line
"mds.beacon.mds01.ceph05.pqxmvt Sending beacon up:replay seq 11109" keeps
changing its sequence number.
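
If it would help, I could also try inspecting the journal. From what I
understand of the docs, something like this (run inside the cephadm
shell, against rank 0 of the "cephfs" filesystem) should only read the
journal and report damage without changing anything:

cephfs-journal-tool --rank=cephfs:0 journal inspect

I haven't dared to touch any of the destructive journal commands yet.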

Anything else I can provide?

Cheers,
Thomas

On 17.01.23 06:27, Kotresh Hiremath Ravishankar wrote:
Hi Thomas,

Sorry, I misread the MDS state as stuck in 'up:resolve'. The MDS is
actually stuck in 'up:replay', which means it is taking over a failed
rank. In this state, the MDS is recovering its journal and other
metadata.

I notice that there are two filesystems, 'cephfs' and 'cephfs_insecure',
and the active MDS for both filesystems is stuck in 'up:replay'. The MDS
logs shared so far don't provide enough information to infer anything.

Could you please enable the debug logs and pass on the MDS logs?
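
Something along these lines should do it (and can be reverted once the
logs are collected):

ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
# ...reproduce the issue and collect the logs, then revert:
ceph config rm mds debug_mds
ceph config rm mds debug_ms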

Thanks,
Kotresh H R

On Mon, Jan 16, 2023 at 2:38 PM Thomas Widhalm
<thomas.widhalm@xxxxxxxxxx> wrote:

    Hi Kotresh,

    Thanks for your reply!

    I only have one rank. Here's the output from all the MDS daemons I have:

    ###################

    [ceph: root@ceph06 /]# ceph tell mds.mds01.ceph05.pqxmvt status
    2023-01-16T08:55:26.055+0000 7f3412ffd700  0 client.61249926 ms_handle_reset on v2:192.168.23.65:6800/2680651694
    2023-01-16T08:55:26.084+0000 7f3412ffd700  0 client.61299199 ms_handle_reset on v2:192.168.23.65:6800/2680651694
    {
          "cluster_fsid": "ff6e50de-ed72-11ec-881c-dca6325c2cc4",
          "whoami": 0,
          "id": 60984167,
          "want_state": "up:replay",
          "state": "up:replay",
          "fs_name": "cephfs",
          "replay_status": {
              "journal_read_pos": 0,
              "journal_write_pos": 0,
              "journal_expire_pos": 0,
              "num_events": 0,
              "num_segments": 0
          },
          "rank_uptime": 150224.982558844,
          "mdsmap_epoch": 143757,
          "osdmap_epoch": 12395,
          "osdmap_epoch_barrier": 0,
          "uptime": 150225.39968057699
    }

    ########################

    [ceph: root@ceph06 /]# ceph tell mds.mds01.ceph04.cvdhsx status
    2023-01-16T08:59:05.434+0000 7fdb82ff5700  0 client.61299598 ms_handle_reset on v2:192.168.23.64:6800/3930607515
    2023-01-16T08:59:05.466+0000 7fdb82ff5700  0 client.61299604 ms_handle_reset on v2:192.168.23.64:6800/3930607515
    {
          "cluster_fsid": "ff6e50de-ed72-11ec-881c-dca6325c2cc4",
          "whoami": 0,
          "id": 60984134,
          "want_state": "up:replay",
          "state": "up:replay",
          "fs_name": "cephfs_insecure",
          "replay_status": {
              "journal_read_pos": 0,
              "journal_write_pos": 0,
              "journal_expire_pos": 0,
              "num_events": 0,
              "num_segments": 0
          },
          "rank_uptime": 150450.96934037199,
          "mdsmap_epoch": 143815,
          "osdmap_epoch": 12395,
          "osdmap_epoch_barrier": 0,
          "uptime": 150451.93533502301
    }

    ###########################

    [ceph: root@ceph06 /]# ceph tell mds.mds01.ceph06.wcfdom status
    2023-01-16T08:59:28.572+0000 7f16538c0b80 -1 client.61250376
    resolve_mds: no MDS daemons found by name `mds01.ceph06.wcfdom'
    2023-01-16T08:59:28.583+0000 7f16538c0b80 -1 client.61250376 FSMap:
    cephfs:1/1 cephfs_insecure:1/1

{cephfs:0=mds01.ceph05.pqxmvt=up:replay,cephfs_insecure:0=mds01.ceph04.cvdhsx=up:replay}
    2 up:standby
    Error ENOENT: problem getting command descriptions from
    mds.mds01.ceph06.wcfdom

    ############################

    [ceph: root@ceph06 /]# ceph tell mds.mds01.ceph07.omdisd status
    2023-01-16T09:00:02.802+0000 7fb7affff700  0 client.61250454 ms_handle_reset on v2:192.168.23.67:6800/942898192
    2023-01-16T09:00:02.831+0000 7fb7affff700  0 client.61299751 ms_handle_reset on v2:192.168.23.67:6800/942898192
    {
          "cluster_fsid": "ff6e50de-ed72-11ec-881c-dca6325c2cc4",
          "whoami": -1,
          "id": 60984161,
          "want_state": "up:standby",
          "state": "up:standby",
          "mdsmap_epoch": 97687,
          "osdmap_epoch": 0,
          "osdmap_epoch_barrier": 0,
          "uptime": 150508.29091721401
    }

    The error message from ceph06 is new to me. That didn't happen the
    previous times.

    [ceph: root@ceph06 /]# ceph fs dump
    e143850
    enable_multiple, ever_enabled_multiple: 1,1
    default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
    legacy client fscid: 2

    Filesystem 'cephfs' (2)
    fs_name cephfs
    epoch   143850
    flags   12 joinable allow_snaps allow_multimds_snaps
    created 2023-01-14T14:30:05.723421+0000
    modified        2023-01-16T09:00:53.663007+0000
    tableserver     0
    root    0
    session_timeout 60
    session_autoclose       300
    max_file_size   1099511627776
    required_client_features        {}
    last_failure    0
    last_failure_osd_epoch  12321
    compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
    max_mds 1
    in      0
    up      {0=60984167}
    failed
    damaged
    stopped
    data_pools      [4]
    metadata_pool   5
    inline_data     disabled
    balancer
    standby_count_wanted    1
    [mds.mds01.ceph05.pqxmvt{0:60984167} state up:replay seq 37637 addr [v2:192.168.23.65:6800/2680651694,v1:192.168.23.65:6801/2680651694] compat {c=[1],r=[1],i=[7ff]}]


    Filesystem 'cephfs_insecure' (3)
    fs_name cephfs_insecure
    epoch   143849
    flags   12 joinable allow_snaps allow_multimds_snaps
    created 2023-01-14T14:22:46.360062+0000
    modified        2023-01-16T09:00:52.632163+0000
    tableserver     0
    root    0
    session_timeout 60
    session_autoclose       300
    max_file_size   1099511627776
    required_client_features        {}
    last_failure    0
    last_failure_osd_epoch  12319
    compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
    max_mds 1
    in      0
    up      {0=60984134}
    failed
    damaged
    stopped
    data_pools      [7]
    metadata_pool   6
    inline_data     disabled
    balancer
    standby_count_wanted    1
    [mds.mds01.ceph04.cvdhsx{0:60984134} state up:replay seq 37639 addr [v2:192.168.23.64:6800/3930607515,v1:192.168.23.64:6801/3930607515] compat {c=[1],r=[1],i=[7ff]}]


    Standby daemons:

    [mds.mds01.ceph07.omdisd{-1:60984161} state up:standby seq 2 addr [v2:192.168.23.67:6800/942898192,v1:192.168.23.67:6800/942898192] compat {c=[1],r=[1],i=[7ff]}]
    [mds.mds01.ceph06.hsuhqd{-1:60984828} state up:standby seq 1 addr [v2:192.168.23.66:6800/4259514518,v1:192.168.23.66:6801/4259514518] compat {c=[1],r=[1],i=[7ff]}]
    dumped fsmap epoch 143850

    #############################

    [ceph: root@ceph06 /]# ceph fs status

    (doesn't come back)

    #############################

    All MDS daemons show log lines similar to these:

    Jan 16 10:05:00 ceph04 ceph-mds[1311]: mds.mds01.ceph04.cvdhsx Updating MDS map to version 143927 from mon.1
    Jan 16 10:05:05 ceph04 ceph-mds[1311]: mds.mds01.ceph04.cvdhsx Updating MDS map to version 143929 from mon.1
    Jan 16 10:05:09 ceph04 ceph-mds[1311]: mds.mds01.ceph04.cvdhsx Updating MDS map to version 143930 from mon.1
    Jan 16 10:05:13 ceph04 ceph-mds[1311]: mds.mds01.ceph04.cvdhsx Updating MDS map to version 143931 from mon.1
    Jan 16 10:05:20 ceph04 ceph-mds[1311]: mds.mds01.ceph04.cvdhsx Updating MDS map to version 143933 from mon.1
    Jan 16 10:05:24 ceph04 ceph-mds[1311]: mds.mds01.ceph04.cvdhsx Updating MDS map to version 143935 from mon.1
    Jan 16 10:05:29 ceph04 ceph-mds[1311]: mds.mds01.ceph04.cvdhsx Updating MDS map to version 143936 from mon.1
    Jan 16 10:05:33 ceph04 ceph-mds[1311]: mds.mds01.ceph04.cvdhsx Updating MDS map to version 143937 from mon.1
    Jan 16 10:05:40 ceph04 ceph-mds[1311]: mds.mds01.ceph04.cvdhsx Updating MDS map to version 143939 from mon.1
    Jan 16 10:05:44 ceph04 ceph-mds[1311]: mds.mds01.ceph04.cvdhsx Updating MDS map to version 143941 from mon.1
    Jan 16 10:05:49 ceph04 ceph-mds[1311]: mds.mds01.ceph04.cvdhsx Updating MDS map to version 143942 from mon.1

    Anything else I can provide?

    Cheers and thanks again!
    Thomas

    On 16.01.23 06:01, Kotresh Hiremath Ravishankar wrote:
     > Hi Thomas,
     >
     > As the documentation says, the MDS enters up:resolve from up:replay
     > if the Ceph file system has multiple ranks (including this one),
     > i.e. it's not a single active MDS cluster. The MDS is resolving any
     > uncommitted inter-MDS operations. All ranks in the file system must
     > be in this state or later for progress to be made, i.e. no rank can
     > be failed/damaged or up:replay.
     >
     > So please check whether the other active MDS has failed.
     >
     > Also please share the MDS logs and the output of 'ceph fs dump' and
     > 'ceph fs status'.
     >
     > Thanks,
     > Kotresh H R
     >
     > On Sat, Jan 14, 2023 at 9:07 PM Thomas Widhalm
     > <thomas.widhalm@xxxxxxxxxx> wrote:
     >
     >     Hi,
     >
     >     I'm really lost with my Ceph system. I built a small cluster
     >     for home usage which has two uses for me: I want to replace an
     >     old NAS and I want to learn about Ceph so that I have hands-on
     >     experience. We're using it in our company, but I need some
     >     real-life experience without risking any company or customer
     >     data. That's my preferred way of learning.
     >
     >     The cluster consists of 3 Raspberry Pis plus a few VMs running
     >     on Proxmox. I'm not using Proxmox' built-in Ceph because I want
     >     to focus on Ceph and not just use it as a preconfigured tool.
     >
     >     All hosts are running Fedora (x86_64 and arm64), and during an
     >     upgrade from F36 to F37 my cluster suddenly showed all PGs as
     >     unavailable. I worked nearly a week to get it back online and
     >     learned a lot about Ceph management and recovery. The cluster
     >     is back, but I still can't access my data. Maybe you can help
     >     me?
     >
     >     Here are my versions:
     >
     >     [ceph: root@ceph04 /]# ceph versions
     >     {
     >           "mon": {
     >               "ceph version 17.2.5
     >     (98318ae89f1a893a6ded3a640405cdbb33e08757)
     >     quincy (stable)": 3
     >           },
     >           "mgr": {
     >               "ceph version 17.2.5
     >     (98318ae89f1a893a6ded3a640405cdbb33e08757)
     >     quincy (stable)": 3
     >           },
     >           "osd": {
     >               "ceph version 17.2.5
     >     (98318ae89f1a893a6ded3a640405cdbb33e08757)
     >     quincy (stable)": 5
     >           },
     >           "mds": {
     >               "ceph version 17.2.5
     >     (98318ae89f1a893a6ded3a640405cdbb33e08757)
     >     quincy (stable)": 4
     >           },
     >           "overall": {
     >               "ceph version 17.2.5
     >     (98318ae89f1a893a6ded3a640405cdbb33e08757)
     >     quincy (stable)": 15
     >           }
     >     }
     >
     >
     >     Here's the status output of one MDS:
     >     [ceph: root@ceph04 /]# ceph tell mds.mds01.ceph05.pqxmvt status
     >     2023-01-14T15:30:28.607+0000 7fb9e17fa700  0 client.60986454 ms_handle_reset on v2:192.168.23.65:6800/2680651694
     >     2023-01-14T15:30:28.640+0000 7fb9e17fa700  0 client.60986460 ms_handle_reset on v2:192.168.23.65:6800/2680651694
     >     {
     >           "cluster_fsid": "ff6e50de-ed72-11ec-881c-dca6325c2cc4",
     >           "whoami": 0,
     >           "id": 60984167,
     >           "want_state": "up:replay",
     >           "state": "up:replay",
     >           "fs_name": "cephfs",
     >           "replay_status": {
     >               "journal_read_pos": 0,
     >               "journal_write_pos": 0,
     >               "journal_expire_pos": 0,
     >               "num_events": 0,
     >               "num_segments": 0
     >           },
     >           "rank_uptime": 1127.54018615,
     >           "mdsmap_epoch": 98056,
     >           "osdmap_epoch": 12362,
     >           "osdmap_epoch_barrier": 0,
     >           "uptime": 1127.957307273
     >     }
     >
     >     It's staying like that for days now. If there was a counter
     >     moving, I would just wait, but nothing changes and all the
     >     stats say the MDS isn't doing anything at all.
     >
     >     The symptom I have is that the dashboard and all the other
     >     tools I use say it's more or less OK (some old messages about
     >     failed daemons and scrubbing aside). But I can't mount
     >     anything. When I try to start a VM that's on RBD, I just get a
     >     timeout. And when I try to mount a CephFS, mount just hangs
     >     forever.
     >
     >     Whatever command I give the MDS or the journal, it just hangs.
     >     The only thing I could do was take all CephFS offline, kill the
     >     MDSs, and do a "ceph fs reset <fs name> --yes-i-really-mean-it".
     >     After that I rebooted all nodes, just to be sure, but I still
     >     have no access to the data.
     >
     >     Could you please help me? I'm kinda desperate. If you need any
     >     more information, just let me know.
     >
     >     Cheers,
     >     Thomas
     >


-- 
Thomas Widhalm
Lead Systems Engineer

NETWAYS Professional Services GmbH | Deutschherrnstr. 15-19 | D-90429 Nuernberg
Tel: +49 911 92885-0 | Fax: +49 911 92885-77
CEO: Julian Hein, Bernd Erk | AG Nuernberg HRB34510
https://www.netways.de | thomas.widhalm@xxxxxxxxxx

** stackconf 2023 - September - https://stackconf.eu **
** OSMC 2023 - November - https://osmc.de **
** New at NWS: Managed Database - https://nws.netways.de/managed-database **
** NETWAYS Web Services - https://nws.netways.de **
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



