Upgrade from 12.2.1 to 12.2.2 broke my CephFS

Hi there,

I'm running a Ceph cluster for some libvirt VMs and a CephFS providing /home to ~20 desktop machines. There are 4 hosts running 4 MONs, 4 MGRs, 3 MDSs (1 active, 2 standby) and 28 OSDs in total. This cluster has been up and running since the days of Bobtail (yes, including CephFS).

Now, with the update from 12.2.1 to 12.2.2 last Friday afternoon, I restarted MONs, MGRs and OSDs as usual. RBD is running just fine. But when I tried to restart the MDSs, they attempted to replay the journal, then fell back to standby, and the FS ended up in state "damaged". I finally got them working again after doing a good portion of what's described here:

http://docs.ceph.com/docs/master/cephfs/disaster-recovery/
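
For the record, the portion I ran was roughly the journal/session recovery sequence from that page (this is from memory, so take the exact invocations with a grain of salt):

cephfs-journal-tool journal export backup.bin        # back up the journal first
cephfs-journal-tool event recover_dentries summary   # salvage recoverable dentries
cephfs-journal-tool journal reset                    # truncate the damaged journal
cephfs-table-tool all reset session                  # drop stale client sessions
ceph mds repaired 0                                  # mark rank 0 as repaired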

Now, when all clients are shut down, I can start an MDS; it will replay and become active. I can then mount CephFS on a client and access my files and folders. But as I bring up more clients, the MDS first reports damaged metadata (probably due to some damaged paths; I could live with that) and then fails with this assert:

/build/ceph-12.2.2/src/mds/MDCache.cc: 258: FAILED assert(inode_map.count(in->vino()) == 0)

I tried doing an online CephFS scrub, like this:

ceph daemon mds.a scrub_path / recursive repair

This will run for a couple of hours, always finding exactly 10001 damages of type "backtrace" and reporting that it is fixing loads of erroneously free-marked inodes, until the MDS crashes. When I rerun the scrub after killing all clients and restarting the MDSs, things repeat: it finds exactly those 10001 damages and begins fixing exactly the same free-marked inodes all over again.
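
In case it helps, I can list those damage entries on the active MDS via the admin socket (same mds.a as above):

ceph daemon mds.a damage ls    # lists the current damage table entries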

Btw., CephFS has about 3 million objects in the metadata pool. The data pool is about 30 million objects with ~2.5TB * 3 replicas.
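
For reference, those figures are straight from the pool stats:

ceph df detail    # per-pool object counts and usage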

What I tried next was keeping the MDSs down and running:

cephfs-data-scan scan_extents <data pool>
cephfs-data-scan scan_inodes <data pool>
cephfs-data-scan scan_links

As this is described to take "a very long time", it's the part of the disaster-recovery tips I had initially skipped. Right now I'm still on the first step, with 6 workers on a single host busy doing cephfs-data-scan scan_extents (sharded invocation below). ceph -s shows me client io of 20kB/s (!!!). If that's the real scan speed, this is going to take ages.

Is there any way to tell how long this is going to take? Could I speed things up by running more workers on multiple hosts simultaneously? Or should I abort it, since I don't actually have the problem of lost files? Maybe running cephfs-data-scan scan_links alone would better suit my issue, or do scan_extents/scan_inodes HAVE to be run and finished first?
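
This is the sharded form I'm running, per the disaster-recovery doc; my assumption is that nothing ties the workers to one host, since each worker_n just takes a disjoint shard of the pool:

# worker 0 of 6; I start one of these per worker, with worker_n = 0..5
cephfs-data-scan scan_extents --worker_n 0 --worker_m 6 <data pool>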

I have to get this cluster up and running again as soon as possible, so any help is highly appreciated. If there is anything I can do to help, e.g. provide further information, feel free to ask. I'll try to hang around on #ceph (nick topro/topro_/topro__). FYI, I'm in the Central European Time zone (UTC+1).

Thank you so much!

Best regards,
Tobi

-- 
-----------------------------------------------------------
Dipl.-Inf. (FH) Tobias Prousa
Head of Data Logger Development

CAETEC GmbH
Industriestr. 1
D-82140 Olching
www.caetec.de

Limited liability company (GmbH)
Registered office: Olching
Commercial register: Amtsgericht München, HRB 183929
Managing directors: Stephan Bacher, Andreas Wocke

Tel.: +49 (0)8142 / 50 13 60
Fax.: +49 (0)8142 / 50 13 69

eMail: tobias.prousa@xxxxxxxxx
Web:   http://www.caetec.de
------------------------------------------------------------
