Data loss on appends, prod outage

As of this morning, when two CephFS clients append to the same file in
quick succession, one append sometimes overwrites the other. This
happens on some clients but not others; we're still trying to track
down the pattern, if there is one. We've failed all production
filesystems to prevent further data loss. We added 3 new OSD servers
last week, and they finished backfilling a few days ago. Servers run
Ubuntu 18.04; clients are mostly 18.04 and 20.04, with HWE kernels
(5.4 and 5.11 respectively). Ceph was upgraded from Nautilus to Octopus
months ago. There were no relevant errors or even warnings in
"ceph health" before we stopped the filesystems. The current output,
now that the filesystems are failed, is:

HEALTH_ERR mons are allowing insecure global_id reclaim; 20 OSD(s)
experiencing BlueFS spillover; 6 filesystems are degraded; 6
filesystems are offline

ceph versions
{
    "mon": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 3
    },
    "mgr": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 3
    },
    "osd": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 200
    },
    "mds": {
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 48
    },
    "rgw": {
        "ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)": 1
    },
    "overall": {
        "ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)": 1,
        "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 254
    }
}
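
For anyone who wants to check a cluster for version skew quickly, here
is a rough sketch that takes the JSON printed by "ceph versions" and
lists daemons not running the most common version. The function names
(version_outliers, outliers_from_json) are made up for illustration,
and the sketch assumes the JSON has the same shape as the output above,
with an "overall" section.

```python
import json


def version_outliers(versions):
    """Return {daemon_type: {version: count}} for daemons that are not
    on the most common version listed under "overall"."""
    overall = versions.get("overall", {})
    if not overall:
        return {}
    # Treat the version with the highest daemon count as the majority.
    majority = max(overall, key=overall.get)
    outliers = {}
    for daemon_type, counts in versions.items():
        if daemon_type == "overall":
            continue
        for version, count in counts.items():
            if version != majority:
                outliers.setdefault(daemon_type, {})[version] = count
    return outliers


def outliers_from_json(text):
    """Convenience wrapper: parse the JSON text, then find outliers."""
    return version_outliers(json.loads(text))
```

With the versions shown in this message, the only outlier would be the
single rgw daemon still on 15.2.13.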

I looked for bugs on the tracker but didn't see anything that seemed
like our issue. Any advice would be appreciated.
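
In case it helps anyone reproduce or rule out the same symptom, below
is a minimal sketch of an append check. It is not the exact workload
that hit the problem; it assumes each client opens a shared file with
O_APPEND and writes short tagged lines, and the function names and the
"client_id seq" line format are invented for this example. Run
append_records on two or more clients at the same time against one file
on the CephFS mount, then run missing_records on any client once they
have finished.

```python
import os


def append_records(path, client_id, count):
    """Append `count` tagged lines to `path`, one write() per line.

    The file is opened with O_APPEND, so each write should land at the
    current end of file even when other clients are appending too.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        for seq in range(count):
            os.write(fd, f"{client_id} {seq}\n".encode())
    finally:
        os.close(fd)


def missing_records(path, client_ids, count):
    """Return the sorted (client_id, seq) pairs that were expected in
    `path` but are absent, i.e. appends that were lost or overwritten."""
    expected = {(c, s) for c in client_ids for s in range(count)}
    with open(path, "rb") as f:
        for raw in f:
            parts = raw.decode(errors="replace").split()
            # Skip lines that do not match "client_id seq" (for example
            # a line damaged by an overlapping write).
            if len(parts) == 2 and parts[1].isdigit():
                expected.discard((parts[0], int(parts[1])))
    return sorted(expected)
```

An empty list from missing_records means every append survived; any
entries point at the client and sequence number whose line went
missing. The client_id must not contain whitespace, since the check
splits each line on it.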