Hi,

I have a rather small CephFS cluster with 3 machines right now, all of
them sharing the MDS, MON, MGR and OSD roles. I had to move all machines
to a new physical location and, unfortunately, I had to move all of them
at the same time. They are already powered on again, but Ceph is not
accessible: all PGs are stuck in the peering state and the OSDs keep
going down and coming back up. Here is some info about my cluster:

-------------------------------------------
# ceph -s
  cluster:
    id:     e348b63c-d239-4a15-a2ce-32f29a00431c
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            2 osds down
            1 host (2 osds) down
            Reduced data availability: 324 pgs inactive, 324 pgs peering
            7 daemons have recently crashed
            10 slow ops, oldest one blocked for 206 sec, mon.a2-df has slow ops

  services:
    mon: 3 daemons, quorum a2-df,a3-df,a1-df (age 47m)
    mgr: a2-df(active, since 82m), standbys: a3-df, a1-df
    mds: cephfs:1/1 {0=a2-df=up:replay} 2 up:standby
    osd: 6 osds: 4 up (since 5s), 6 in (since 47m)
    rgw: 1 daemon active (a2-df)

  data:
    pools:   7 pools, 324 pgs
    objects: 850.25k objects, 744 GiB
    usage:   2.3 TiB used, 14 TiB / 16 TiB avail
    pgs:     100.000% pgs not active
             324 peering
-------------------------------------------

-------------------------------------------
# ceph osd df tree
 ID  CLASS     WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
 -1            16.37366         -   16 TiB  2.3 TiB  2.3 TiB  1.1 GiB  8.1 GiB   14 TiB  13.83  1.00    -          root default
-10            16.37366         -   16 TiB  2.3 TiB  2.3 TiB  1.1 GiB  8.1 GiB   14 TiB  13.83  1.00    -          datacenter df
 -3             5.45799         -  5.5 TiB  773 GiB  770 GiB  382 MiB  2.7 GiB  4.7 TiB  13.83  1.00    -          host a1-df
  3  hdd-slow   3.63899   1.00000  3.6 TiB  1.1 GiB   90 MiB      0 B    1 GiB  3.6 TiB   0.03  0.00    0    down      osd.3
  0  hdd        1.81898   1.00000  1.8 TiB  772 GiB  770 GiB  382 MiB  1.7 GiB  1.1 TiB  41.43  3.00    0    down      osd.0
 -5             5.45799         -  5.5 TiB  773 GiB  770 GiB  370 MiB  2.7 GiB  4.7 TiB  13.83  1.00    -          host a2-df
  4  hdd-slow   3.63899   1.00000  3.6 TiB  1.1 GiB   90 MiB      0 B    1 GiB  3.6 TiB   0.03  0.00  100      up      osd.4
  1  hdd        1.81898   1.00000  1.8 TiB  772 GiB  770 GiB  370 MiB  1.7 GiB  1.1 TiB  41.42  3.00  224      up      osd.1
 -7             5.45767         -  5.5 TiB  773 GiB  770 GiB  387 MiB  2.7 GiB  4.7 TiB  13.83  1.00    -          host a3-df
  5  hdd-slow   3.63869   1.00000  3.6 TiB  1.1 GiB   90 MiB      0 B    1 GiB  3.6 TiB   0.03  0.00  100      up      osd.5
  2  hdd        1.81898   1.00000  1.8 TiB  772 GiB  770 GiB  387 MiB  1.7 GiB  1.1 TiB  41.43  3.00  224      up      osd.2
                             TOTAL   16 TiB  2.3 TiB  2.3 TiB  1.1 GiB  8.1 GiB   14 TiB  13.83
MIN/MAX VAR: 0.00/3.00  STDDEV: 21.82
-------------------------------------------

At this exact moment both OSDs from server a1-df are down, but that keeps
changing: sometimes only one OSD is down, but most of the time it is two,
and exactly which ones are down keeps changing as well.

What should I do to get my cluster back up? Just wait?

Regards,

Rodrigo Severo
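
P.S. In case more detail helps, these are the commands I plan to run next
to gather diagnostics. Just a sketch: osd.0 is only an example of one of
the flapping OSDs, and the ceph-osd@ systemd unit name is from my
package-based install, so it may differ on other deployments.

-------------------------------------------
# Show which OSDs are marked down right now and the detailed
# health warnings behind HEALTH_WARN
ceph health detail

# Follow the log of one of the flapping OSDs to look for
# heartbeat or network errors around the moment it goes down
journalctl -u ceph-osd@0 -f

# Inspect the "7 daemons have recently crashed" reports
ceph crash ls
ceph crash info <crash-id>
-------------------------------------------

I can post any of that output here if it is useful.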