On Mon, Dec 11, 2017 at 10:13 PM, Tobias Prousa <tobias.prousa@xxxxxxxxx> wrote:
> Hi there,
>
> I'm running a CEPH cluster for some libvirt VMs and a CephFS providing /home
> to ~20 desktop machines. There are 4 hosts running 4 MONs, 4 MGRs, 3 MDSs (1
> active, 2 standby) and 28 OSDs in total. This cluster has been up and running
> since the days of Bobtail (yes, including CephFS).
>
> Now with the update from 12.2.1 to 12.2.2 last Friday afternoon I restarted
> MONs, MGRs, OSDs as usual. RBD is running just fine. But after trying to
> restart the MDSs they tried replaying the journal, then fell back to standby,
> and the FS was in state "damaged". I finally got them back working after I did
> a good portion of what's described here:
>
> http://docs.ceph.com/docs/master/cephfs/disaster-recovery/

What commands did you run? You need to run the following commands:

cephfs-journal-tool event recover_dentries summary
cephfs-journal-tool journal reset
cephfs-table-tool all reset session

>
> Now when all clients are shut down I can start the MDS; it will replay and
> become active. I can then mount CephFS on a client and access my files and
> folders. But the more clients I bring up, the MDS will first report damaged
> metadata (probably due to some damaged paths, I could live with that) and
> then the MDS will fail with an assert:
>
> /build/ceph-12.2.2/src/mds/MDCache.cc: 258: FAILED
> assert(inode_map.count(in->vino()) == 0)
>
> I tried doing an online CephFS scrub like
>
> ceph daemon mds.a scrub_path / recursive repair
>
> This will run for a couple of hours, always finding exactly 10001 damages of
> type "backtrace" and reporting it would be fixing loads of erroneously
> free-marked inodes until the MDS crashes. When I rerun that scrub after having
> killed all clients and restarted the MDSs, things repeat: it finds exactly
> those 10001 damages and begins fixing exactly the same free-marked
> inodes over again.

Find the maximum inode number among these free-marked inodes, then use
cephfs-table-tool to remove the inode numbers that are smaller than that
maximum. You can remove a few more just in case. Before doing this, you
should stop the MDS and run "cephfs-table-tool all reset session".

If everything goes right, the MDS will no longer trigger the assertion.

>
> Btw. CephFS has about 3 million objects in the metadata pool. The data pool is
> about 30 million objects with ~2.5TB * 3 replicas.
>
> What I tried next is keeping the MDS down and doing
>
> cephfs-data-scan scan_extents <data pool>
> cephfs-data-scan scan_inodes <data pool>
> cephfs-data-scan scan_links
>
> As this is described to take "a very long time" this is what I initially
> skipped from the disaster-recovery tips. Right now I'm still on the first step
> with 6 workers on a single host busy doing cephfs-data-scan scan_extents.
> ceph -s shows me client io of 20kB/s (!!!). If that's the real scan speed this
> is going to take ages.
> Any way to tell how long this is going to take? Could I speed things up by
> running more workers on multiple hosts simultaneously?
> Should I abort it, as I actually don't have the problem of lost files? Maybe
> running cephfs-data-scan scan_links would better suit my issue, or do
> scan_extents/scan_inodes HAVE to be run and finished first?
>
> I have to get this cluster up and running again as soon as possible. Any
> help highly appreciated. If there is anything I can help with, e.g. further
> information, feel free to ask. I'll try to hang around on #ceph (nick
> topro/topro_/topro__). FYI, I'm in Central Europe TimeZone (UTC+1).
>
> Thank you so much!
>
> Best regards,
> Tobi
>
> --
> -----------------------------------------------------------
> Dipl.-Inf. (FH) Tobias Prousa
> Leiter Entwicklung Datenlogger
>
> CAETEC GmbH
> Industriestr. 1
> D-82140 Olching
> www.caetec.de
>
> Gesellschaft mit beschränkter Haftung
> Sitz der Gesellschaft: Olching
> Handelsregister: Amtsgericht München, HRB 183929
> Geschäftsführung: Stephan Bacher, Andreas Wocke
>
> Tel.: +49 (0)8142 / 50 13 60
> Fax.: +49 (0)8142 / 50 13 69
>
> eMail: tobias.prousa@xxxxxxxxx
> Web: http://www.caetec.de
> ------------------------------------------------------------
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
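
The "find the max inode number" step suggested in the reply above could be sketched roughly as follows. This is only an illustration under assumptions: it presumes you have already filtered the scrub/MDS log down to the lines about free-marked inodes, and that each such line contains the inode number in hex (0x...). The sample lines and the helper name max_inode are hypothetical; the real log format may differ.

```python
import re

# Matches hex tokens such as 0x10000000abc.
# Assumption: inode numbers appear in this form in the filtered log lines.
HEX_INO = re.compile(r"0x[0-9a-fA-F]+")


def max_inode(lines, margin=0):
    """Return the largest hex inode number found in `lines`, plus `margin`.

    `margin` implements the "remove a little more just in case" advice.
    Returns None when no inode number is found at all.
    """
    inos = [int(tok, 16) for line in lines for tok in HEX_INO.findall(line)]
    if not inos:
        return None
    return max(inos) + margin


if __name__ == "__main__":
    # Hypothetical sample lines; real MDS log lines will look different.
    sample = [
        "inode 0x10000000abc was marked free",
        "inode 0x10000001f00 was marked free",
    ]
    top = max_inode(sample, margin=1000)
    print(hex(top), top)
```

The resulting number is what you would then hand to cephfs-table-tool with the MDS stopped. Current Ceph disaster-recovery documentation describes "cephfs-table-tool all take_inos <max_ino>" for marking inode numbers up to a given value as in use; whether that subcommand is available in 12.2.2 and is what the reply has in mind is an assumption here.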