Thanks for sharing, hope I never need the info, but glad to know it’s here and doable!

On Tue, Jun 28, 2022 at 10:36 AM Florian Jonas <florian.jonas@xxxxxxx> wrote:
> Dear all,
>
> just when we received Eugen's message, we managed (with additional help via Zoom from other experts) to recover our filesystem. Thank you again for your help. I briefly document our solution here. The monitors were corrupted by the repeated destruction and recreation, which destroyed the monitors' store.db. The OSDs were intact. We followed the solution here to recover the monitors from the store.db collected from the OSDs:
>
> https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#mon-store-recovery-using-osds
>
> However, we had made one mistake during one of the steps. For anyone reading this: make sure that the OSD services are not running before running the procedure. We then stopped all Ceph services and replaced the corrupted store.db on each node:
>
> mv $extractedstoredb/store.db /var/lib/ceph/mon/mon.foo/store.db
> chown -R ceph:ceph /var/lib/ceph/mon/mon.foo/store.db
>
> We then started the monitors one by one and then started the OSD services again. At this stage we got the pools back. We then roughly followed the guide here:
>
> https://docs.ceph.com/en/quincy/cephfs/recover-fs-after-mon-store-loss/
>
> to restore the filesystem, while making sure that NO MDS is running. However, I think the exact commands depend on the Ceph version, so I would double-check with an expert for the last step, since as far as I understood it can lead to erasure of files if the --recover flag is not properly implemented.
>
> Best regards,
>
> Florian
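For anyone who finds this thread later, the OSD-based monitor store rebuild behind the first link boils down to roughly the following. This is a simplified, single-host sketch rather than the authoritative procedure: the keyring path is a placeholder, newer releases may also want --no-mon-config on the ceph-objectstore-tool call, and flags vary between versions, so follow the linked documentation for the release you actually run.

    # All local OSD daemons must be stopped first (the mistake mentioned above).
    systemctl stop ceph-osd.target

    ms=/tmp/mon-store
    mkdir -p "$ms"

    # Let every local OSD contribute its cluster map history to $ms.
    for osd in /var/lib/ceph/osd/ceph-*; do
        ceph-objectstore-tool --data-path "$osd" \
            --op update-mon-db --mon-store-path "$ms"
    done

    # Rebuild a monitor store from the collected maps; this needs a keyring
    # containing at least the mon. and client.admin keys.
    ceph-monstore-tool "$ms" rebuild -- --keyring /path/to/admin.keyring

On a multi-node cluster the documentation has you carry the growing $ms directory from host to host (e.g. with rsync) so that every OSD is included before the final rebuild; the rebuilt store is what then gets moved into /var/lib/ceph/mon/mon.<id>/store.db and chowned to ceph:ceph as shown above.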
> On 28/06/2022 15:12, Eugen Block wrote:
> > I agree, having one MON out of quorum should not result in hanging ceph commands, maybe a little delay until all clients have noticed it. So the first question is, what happened there? Did you notice anything else that could disturb the cluster? Do you have the logs from the remaining two MONs, and do they reveal anything? But this is just relevant for the analysis, and maybe to prevent something similar from happening in the future. Have you tried restarting the MGR after the OSDs came back up? If not, I would restart it (do you have a second MGR to be able to fail over?) and then also restart a single OSD to see if anything changes in the cluster status. You're right about the MDS, of course. First you need the CephFS pools to be available again before the MDS can start its work.
> >
> > Zitat von Florian Jonas <florian.jonas@xxxxxxx>:
> >
> >> Hi,
> >>
> >> thanks a lot for getting back to me. I will try to clarify what happened and reconstruct the timeline. For context, our computing cluster is part of a bigger network infrastructure that is managed by someone else, and for the particular node running the MON and MDS we had not assigned a static IP address, due to an oversight on our part. The cluster is run semi-professionally by me and a colleague; it started as a small test but quickly grew in scale, so we are still somewhat beginners. The machine got stuck due to some unrelated issue and we had to reboot, and after the reboot only this one address changed (the last three digits).
> >>
> >> After the reboot, the ceph status command was no longer working, which caused a bit of a panic. In principle, it should still have worked, since the other two machines should still have had quorum. We quickly realized the IP address change, destroyed the monitor in question and re-created it after we had changed the mon IP in the Ceph config. However, I think this was a mistake, since in general the system was not in a good state (I assume due to the crashed MDS). In the rush to get things back online (second mistake), the other two monitors were also destroyed and re-created, even though their IP addresses did not change. At this point the ceph status command was still not available and just hanging.
> >>
> >> We proceeded following the procedure outlined here:
> >>
> >> https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#mon-store-recovery-using-osds
> >>
> >> in order to restore the monitors using the OSDs on each node. After following this procedure we managed to get all three monitors back online, and they now all show a quorum. This is the current situation. I think this whole mess is a mix of unlucky circumstances and panicked incompetence on our part ...
> >>
> >> By restarting the MDS, do you mean restarting the MDS service on the node in question? All three of them currently show up as "inactive", I think because no filesystem is recognized and they see no reason to become active. Regarding your question why the backup MDS did not start, I do not know. It is indeed strange!
> >>
> >> Best regards,
> >>
> >> Florian Jonas
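To make the filesystem step more concrete: the quincy guide linked in Florian's summary at the top recreates the file system map on top of the still-existing metadata and data pools. Very roughly, and only on releases whose "ceph fs new" actually supports the --recover flag (Mimic does not, which is exactly the version caveat raised in the summary; getting this wrong can destroy metadata), the sequence looks like the sketch below. The pool and filesystem names are examples only.

    # Make sure no MDS is running anywhere (run on every MDS host).
    systemctl stop ceph-mds.target

    # Recreate the filesystem entry on top of the existing pools.
    # --force allows reusing pools that already contain data, and --recover
    # marks the new filesystem as not joinable so that no MDS starts up
    # and overwrites the existing metadata.
    ceph fs new cephfs cephfs_metadata cephfs_data --force --recover

    # Only once the maps look right, allow MDS daemons to join again.
    ceph fs set cephfs joinable true

The crucial points are the ones already made above: no MDS may be running while the map is recreated, and the flag semantics are version dependent, so the documentation matching your installed release (or an expert) is the authority here, not this sketch.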
> >> On 28/06/2022 13:29, Eugen Block wrote:
> >>> Hi,
> >>>
> >>> just to clarify, only one of the MONs had a different IP address (how and why, DHCP?), but you got it up again (since your cluster shows quorum). So the subnet didn't change, only the one address? Did you already try to restart the MDS? And what about the standby MDS, it could have taken over, couldn't it? The "0 in" OSDs could be a MGR issue, I'm not sure how that worked in Mimic. But they appear to be working, so it's not really clear yet what the actual problem is, but data loss is unlikely since the OSDs have not been wiped and they also load their PGs, it appears:
> >>>
> >>>> 2022-06-24 09:16:44.527 7fdc165d5c00 0 osd.6 13035 load_pgs
> >>>> 2022-06-24 09:16:50.375 7fdc165d5c00 0 osd.6 13035 load_pgs opened 67 pgs
> >>>
> >>> Zitat von Florian Jonas <florian.jonas@xxxxxxx>:
> >>>
> >>>> Dear experts,
> >>>>
> >>>> we have a small computing cluster with 21 OSDs, 3 monitors and 3 MDS running on Ceph version 13.2.10 on Ubuntu 18.04. A few days ago we had an unexpected reboot of all machines, as well as a change of the IP address of one machine, which was hosting an MDS as well as a monitor. I am not exactly sure what played out during that night, but we lost quorum of all three monitors and no filesystem was visible anymore, so we are starting to get quite worried about data loss. We tried destroying and recreating the monitor whose IP address changed, but it did not help (which, however, might have been a mistake).
> >>>>
> >>>> Long story short, we tried to recover by adapting the changed IP address in the config and restoring the monitors using the information from the OSDs, following the procedure outlined here:
> >>>>
> >>>> https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#mon-store-recovery-using-osds
> >>>>
> >>>> We are now in a situation where ceph status shows the following:
> >>>>
> >>>>   cluster:
> >>>>     id:     61fd9a61-89d6-4383-a2e6-ec4f4a13830f
> >>>>     health: HEALTH_WARN
> >>>>             43 slow ops, oldest one blocked for 57132 sec, daemons [mon.dip01,mon.pc078,mon.pc147] have slow ops.
> >>>>
> >>>>   services:
> >>>>     mon: 3 daemons, quorum pc147,pc078,dip01
> >>>>     mgr: dip01(active)
> >>>>     osd: 22 osds: 0 up, 0 in
> >>>>
> >>>>   data:
> >>>>     pools:   0 pools, 0 pgs
> >>>>     objects: 0 objects, 0 B
> >>>>     usage:   0 B used, 0 B / 0 B avail
> >>>>     pgs:
> >>>>
> >>>> The monitors show a quorum (I think that's a good start), but we do not see any of the pools that were previously there, and no filesystem is visible either. Running the command "ceph fs status" shows all MDS are in standby and no filesystem is active.
> >>>>
> >>>> I looked into the HEALTH_WARN by checking journalctl -xe on the monitor machines and found errors of the type:
> >>>>
> >>>> Jun 24 09:10:30 dip01 ceph-mon[69148]: 2022-06-24 09:10:30.978 7f0173e02700 -1 mon.dip01@2(peon) e15 get_health_metrics reporting 4 slow ops, oldest is osd_boot(osd.12 booted 0 features 4611087854031667195 v13031)
> >>>>
> >>>> To check what is going on with the osd_boot error, I checked the logs on the OSD machines and found warnings such as:
> >>>>
> >>>> 2022-06-24 09:16:42.383 7fdc165d5c00 0 <cls> /build/ceph-13.2.10/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
> >>>> 2022-06-24 09:16:42.383 7fdc165d5c00 0 _get_class not permitted to load kvs
> >>>> 2022-06-24 09:16:42.383 7fdc165d5c00 0 <cls> /build/ceph-13.2.10/src/cls/hello/cls_hello.cc:296: loading cls_hello
> >>>> 2022-06-24 09:16:42.383 7fdc165d5c00 0 _get_class not permitted to load lua
> >>>> 2022-06-24 09:16:42.387 7fdc165d5c00 0 _get_class not permitted to load sdk
> >>>> 2022-06-24 09:16:42.387 7fdc165d5c00 1 osd.6 13035 warning: got an error loading one or more classes: (1) Operation not permitted
> >>>> 2022-06-24 09:16:42.387 7fdc165d5c00 0 osd.6 13035 crush map has features 288514051259236352, adjusting msgr requires for clients
> >>>> 2022-06-24 09:16:42.387 7fdc165d5c00 0 osd.6 13035 crush map has features 288514051259236352 was 8705, adjusting msgr requires for mons
> >>>> 2022-06-24 09:16:42.387 7fdc165d5c00 0 osd.6 13035 crush map has features 1009089991638532096, adjusting msgr requires for osds
> >>>> 2022-06-24 09:16:42.387 7fdc165d5c00 1 osd.6 13035 check_osdmap_features require_osd_release 0 ->
> >>>> 2022-06-24 09:16:44.527 7fdc165d5c00 0 osd.6 13035 load_pgs
> >>>> 2022-06-24 09:16:50.375 7fdc165d5c00 0 osd.6 13035 load_pgs opened 67 pgs
> >>>> 2022-06-24 09:16:50.375 7fdc165d5c00 0 osd.6 13035 using weightedpriority op queue with priority op cut off at 64.
> >>>> 2022-06-24 09:16:50.375 7fdc165d5c00 -1 osd.6 13035 log_to_monitors {default=true}
> >>>> 2022-06-24 09:16:50.383 7fdc165d5c00 0 osd.6 13035 done with init, starting boot process
> >>>> 2022-06-24 09:16:50.383 7fdc165d5c00 1 osd.6 13035 start_boot
> >>>> 2022-06-24 09:16:50.495 7fdbec933700 1 osd.6 pg_epoch: 13035 pg[5.1( v 2785'2 (0'0,2785'2] local-lis/les=12997/12999 n=1 ec=2782/2782 lis/c 12997/12997 les/c/f 12999/12999/0 12997/12997/12954) [6,17,14] r=0 lpr=13021 crt=2785'2 lcod 0'0 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary
> >>>>
> >>>> The 21 OSDs themselves show as "exists,new" in ceph osd status, even though they remained untouched during the whole incident (which I hope means they still contain all our data somewhere).
> >>>>
> >>>> We only started operating our distributed filesystem about one year ago, and I must admit that with this problem we are a bit out of our depth, so we would very much appreciate any leads/help we can get on getting our filesystem up and running again. Alternatively, if all else fails, we would also appreciate any information about the possibility of recovering the data from the 21 OSDs, which amounts to over 60 TB.
> >>>>
> >>>> Attached you find our ceph.conf file, as well as the logs from one example monitor and one OSD node. If you need any other information, let us know.
> >>>>
> >>>> Thank you in advance for your help, I know your time is valuable!
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Florian Jonas
> >>>>
> >>>> p.s. to the moderators: This message is a resubmit with smaller log files. I was not aware of the 1MB limit. The previously bounced message can be ignored!
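Eugen's earlier suggestion, failing over and restarting the manager and then restarting a single OSD to see whether the cluster status reacts, translates on a package-based installation like this one (Mimic on Ubuntu 18.04 with systemd units) to roughly the commands below. The daemon names are taken from the status output above and are examples only.

    # Fail the active MGR so a standby (if any) takes over, then restart it.
    ceph mgr fail dip01
    systemctl restart ceph-mgr@dip01.service   # run on the MGR host

    # Restart a single OSD and watch whether it comes back up/in.
    systemctl restart ceph-osd@6.service       # run on that OSD's host
    ceph -s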
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx