You can run a 13.2.1 MDS on another machine. Kill all client sessions and wait until the purge queue is empty; then it is safe to run the 13.2.2 MDS. To check the purge queue, run:

cephfs-journal-tool --rank=cephfs_name:rank --journal=purge_queue header get

The purge queue is empty when write_pos == expire_pos.
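For example, an untested sketch along these lines could poll the header until the queue drains. The file system name and rank ("cephfs_name:0") are placeholders, and the field extraction assumes the JSON-style output of "header get" -- check the field names against your version's output:

```
#!/bin/sh
# Untested sketch: wait until the purge queue is fully drained,
# i.e. write_pos == expire_pos in the purge_queue header.
# "cephfs_name:0" is a placeholder -- substitute your file system name and rank.
while true; do
    header=$(cephfs-journal-tool --rank=cephfs_name:0 --journal=purge_queue header get)
    write_pos=$(echo "$header" | grep '"write_pos"' | grep -oE '[0-9]+' | head -n1)
    expire_pos=$(echo "$header" | grep '"expire_pos"' | grep -oE '[0-9]+' | head -n1)
    echo "write_pos=$write_pos expire_pos=$expire_pos"
    [ -n "$write_pos" ] && [ "$write_pos" = "$expire_pos" ] && break
    sleep 10
done
```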
On Wed, Nov 21, 2018 at 8:49 AM Chris Martin wrote:
>
> I am also having this problem. Zheng (or anyone else), any idea how to
> perform this downgrade on a node that is also a monitor and an OSD
> node?
>
> dpkg complains of a dependency conflict when I try to install
> ceph-mds_13.2.1-1xenial_amd64.deb:
>
> ```
> dpkg: dependency problems prevent configuration of ceph-mds:
>  ceph-mds depends on ceph-base (= 13.2.1-1xenial); however:
>   Version of ceph-base on system is 13.2.2-1xenial.
> ```
>
> I don't think I want to downgrade ceph-base to 13.2.1.
>
> Thank you,
>
> Chris Martin
>
> > Sorry, this was caused by a wrong backport. Downgrading the MDS to
> > 13.2.1 and marking the MDS repaired can resolve this.
> >
> > Yan, Zheng
> >
> > On Sat, Oct 6, 2018 at 8:26 AM Sergey Malinin wrote:
> > >
> > > Update:
> > > I discovered http://tracker.ceph.com/issues/24236 and https://github.com/ceph/ceph/pull/22146
> > > Make sure that it is not relevant in your case before proceeding to operations that modify on-disk data.
> > >
> > > On 6.10.2018, at 03:17, Sergey Malinin wrote:
> > >
> > > I ended up rescanning the entire fs using the alternate metadata pool approach described in http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/
> > > The process has not completed yet because during the recovery our cluster encountered another problem with OSDs, which I got fixed yesterday (thanks to Igor Fedotov @ SUSE).
> > > The first stage (scan_extents) completed in 84 hours (120M objects in the data pool on 8 HDD OSDs across 4 hosts). The second (scan_inodes) was interrupted by the OSD failure, so I have no timing stats, but it seems to be running 2-3 times faster than the extents scan.
> > > As to the root cause: in my case I recall that during the upgrade I had forgotten to restart 3 OSDs, one of which was holding metadata pool contents, before restarting the MDS daemons. That seems to have had an impact on the MDS journal corruption, because when I restarted those OSDs the MDS was able to start up, but it soon failed, throwing lots of 'loaded dup inode' errors.
> > >
> > > On 6.10.2018, at 00:41, Alfredo Daniel Rezinovsky wrote:
> > >
> > > Same problem...
> > >
> > > # cephfs-journal-tool --journal=purge_queue journal inspect
> > > 2018-10-05 18:37:10.704 7f01f60a9bc0 -1 Missing object 500.0000016c
> > > Overall journal integrity: DAMAGED
> > > Objects missing:
> > >   0x16c
> > > Corrupt regions:
> > >   0x5b000000-ffffffffffffffff
> > >
> > > Just after upgrading to 13.2.2.
> > >
> > > Did you fix it?
> > >
> > > On 26/09/18 13:05, Sergey Malinin wrote:
> > >
> > > Hello,
> > > I followed the standard upgrade procedure to upgrade from 13.2.1 to 13.2.2.
> > > After the upgrade the MDS cluster is down; mds rank 0 and the purge_queue journal are damaged. Resetting the purge_queue does not seem to work, as the journal still appears to be damaged.
> > > Can anybody help?
> > >
> > > mds log:
> > >
> > > -789> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.mds2 Updating MDS map to version 586 from mon.2
> > > -788> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 handle_mds_map i am now mds.0.583
> > > -787> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 handle_mds_map state change up:rejoin --> up:active
> > > -786> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 recovery_done -- successful recovery!
> > > [...]
> > >  -38> 2018-09-26 18:42:32.707 7f70f28a7700 -1 mds.0.purge_queue _consume: Decode error at read_pos=0x322ec6636
> > >  -37> 2018-09-26 18:42:32.707 7f70f28a7700  5 mds.beacon.mds2 set_want_state: up:active -> down:damaged
> > >  -36> 2018-09-26 18:42:32.707 7f70f28a7700  5 mds.beacon.mds2 _send down:damaged seq 137
> > >  -35> 2018-09-26 18:42:32.707 7f70f28a7700 10 monclient: _send_mon_message to mon.ceph3 at mon:6789/0
> > >  -34> 2018-09-26 18:42:32.707 7f70f28a7700  1 -- mds:6800/e4cc09cf --> mon:6789/0 -- mdsbeacon(14c72/mds2 down:damaged seq 137 v24a) v7 -- 0x563b321ad480 con 0
> > > [...]
> > >   -3> 2018-09-26 18:42:32.743 7f70f98b5700  5 -- mds:6800/3838577103 >> mon:6789/0 conn(0x563b3213e000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8 cs=1 l=1). rx mon.2 seq 29 0x563b321ab880 mdsbeacon(85106/mds2 down:damaged seq 311 v587) v7
> > >   -2> 2018-09-26 18:42:32.743 7f70f98b5700  1 -- mds:6800/3838577103 <== mon.2 mon:6789/0 29 ==== mdsbeacon(85106/mds2 down:damaged seq 311 v587) v7 ==== 129+0+0 (3296573291 0 0) 0x563b321ab880 con 0x563b3213e000
> > >   -1> 2018-09-26 18:42:32.743 7f70f98b5700  5 mds.beacon.mds2 handle_mds_beacon down:damaged seq 311 rtt 0.038261
> > >    0> 2018-09-26 18:42:32.743 7f70f28a7700  1 mds.mds2 respawn!
> > >
> > > # cephfs-journal-tool --journal=purge_queue journal inspect
> > > Overall journal integrity: DAMAGED
> > > Corrupt regions:
> > >   0x322ec65d9-ffffffffffffffff
> > >
> > > # cephfs-journal-tool --journal=purge_queue journal reset
> > > old journal was 13470819801~8463
> > > new journal start will be 13472104448 (1276184 bytes past old end)
> > > writing journal head
> > > done
> > >
> > > # cephfs-journal-tool --journal=purge_queue journal inspect
> > > 2018-09-26 19:00:52.848 7f3f9fa50bc0 -1 Missing object 500.00000c8c
> > > Overall journal integrity: DAMAGED
> > > Objects missing:
> > >   0xc8c
> > > Corrupt regions:
> > >   0x323000000-ffffffffffffffff
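For reference, regarding the "marking mds repaired" step quoted above: once a 13.2.1 MDS is in place, the step would look roughly like the following sketch. The file system name "cephfs" and rank 0 are placeholders for your own values:

```
# Sketch only: assumes the damaged rank is 0 and the file system is named "cephfs".
# Clear the "damaged" flag so the rank can be taken over by an available MDS:
ceph mds repaired cephfs:0
# Then confirm that an MDS picks up the rank and goes active:
ceph status
```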