On Mon, Dec 11, 2017 at 10:13 PM, Tobias Prousa <tobias.prousa@xxxxxxxxx> wrote:
> Hi there,
>
> I'm running a Ceph cluster for some libvirt VMs and a CephFS providing
> /home to ~20 desktop machines. There are 4 hosts running 4 MONs, 4 MGRs,
> 3 MDSs (1 active, 2 standby) and 28 OSDs in total. This cluster has been
> up and running since the days of Bobtail (yes, including CephFS).
>
> Now, with the update from 12.2.1 to 12.2.2 last Friday afternoon, I
> restarted MONs, MGRs and OSDs as usual. RBD is running just fine. But
> after trying to restart the MDSs, they tried replaying the journal, then
> fell back to standby, and the FS was in state "damaged". I finally got
> them working again after doing a good portion of what is described here:
>
> http://docs.ceph.com/docs/master/cephfs/disaster-recovery/
>
> Now, when all clients are shut down, I can start an MDS; it will replay
> and become active. I can then mount CephFS on a client and access my
> files and folders. But the more clients I bring up, the MDS first reports
> damaged metadata (probably due to some damaged paths, which I could live
> with) and then fails with an assert:
>
> /build/ceph-12.2.2/src/mds/MDCache.cc: 258: FAILED
> assert(inode_map.count(in->vino()) == 0)
>
> I tried doing an online CephFS scrub like
>
> ceph daemon mds.a scrub_path / recursive repair
>
> This will run for a couple of hours, always finding exactly 10001 damages
> of type "backtrace" and reporting that it is fixing loads of erroneously
> free-marked inodes, until the MDS crashes. When I rerun that scrub after
> having killed all clients and restarted the MDSs, things repeat: it finds
> exactly those 10001 damages and starts fixing exactly the same
> free-marked inodes all over again.
>
> Btw., CephFS has about 3 million objects in the metadata pool. The data
> pool is about 30 million objects with ~2.5 TB * 3 replicas.
>
> What I tried next is keeping the MDSs down and doing
>
> cephfs-data-scan scan_extents <data pool>
> cephfs-data-scan scan_inodes <data pool>
> cephfs-data-scan scan_links
>
> As this is described to take "a very long time", it is what I initially
> skipped from the disaster-recovery tips. Right now I'm still on the first
> step, with 6 workers on a single host busy doing cephfs-data-scan
> scan_extents. ceph -s shows me client io of 20 kB/s (!!!). If that's the
> real scan speed, this is going to take ages.
>
> Any way to tell how long this is going to take? Could I speed things up
> by running more workers on multiple hosts simultaneously? Should I abort
> it, as I don't actually have the problem of lost files? Maybe running
> cephfs-data-scan scan_links would better suit my issue, or do
> scan_extents/scan_inodes HAVE to be run and finished first?
>

You can interrupt scan_extents safely.
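If you do end up needing the full scan, both scan_extents and scan_inodes can
be parallelised, and as far as I know the workers do not have to share a host;
any machine that can reach the cluster can run one. A rough sketch with four
workers (worker count and <data pool> are placeholders; keep --worker_m equal
to the total number of workers and give each worker a distinct --worker_n):

  # run each of these on any host that has access to the cluster
  cephfs-data-scan scan_extents --worker_n 0 --worker_m 4 <data pool>
  cephfs-data-scan scan_extents --worker_n 1 --worker_m 4 <data pool>
  cephfs-data-scan scan_extents --worker_n 2 --worker_m 4 <data pool>
  cephfs-data-scan scan_extents --worker_n 3 --worker_m 4 <data pool>

All workers of one phase have to finish before the next phase starts;
scan_links, if I remember correctly, takes no worker arguments and runs
single-threaded.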
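On the scrub side, before re-running the repair it might be worth dumping what
the MDS has actually recorded in its damage table; something along these lines
should work on luminous (mds.a taken from your scrub command above):

  # list recorded metadata damage as JSON
  ceph tell mds.a damage ls

That the count stops at almost exactly 10000 every time makes me suspect you
are hitting the damage table's size limit (mds_damage_table_max_entries,
10000 by default, if memory serves), in which case the real number of damaged
backtraces could be higher.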
> I have to get this cluster up and running again as soon as possible. Any
> help is highly appreciated. If there is anything I can help with, e.g.
> further information, feel free to ask. I'll try to hang around on #ceph
> (nick topro/topro_/topro__). FYI, I'm in the Central European Time zone
> (UTC+1).
>
> Thank you so much!
>
> Best regards,
> Tobi
>
> --
> -----------------------------------------------------------
> Dipl.-Inf. (FH) Tobias Prousa
> Head of Data Logger Development
>
> CAETEC GmbH
> Industriestr. 1
> D-82140 Olching
> www.caetec.de
>
> Limited liability company (GmbH)
> Registered office: Olching
> Commercial register: Amtsgericht München, HRB 183929
> Managing directors: Stephan Bacher, Andreas Wocke
>
> Tel.: +49 (0)8142 / 50 13 60
> Fax: +49 (0)8142 / 50 13 69
>
> E-mail: tobias.prousa@xxxxxxxxx
> Web: http://www.caetec.de
> ------------------------------------------------------------

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com