On Wed, Mar 25, 2020 at 5:19 AM Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:
>
> Dear All,
>
> We are "in a bit of a pickle"...
>
> No reply to my message (23/03/2020), subject "OSD: FAILED
> ceph_assert(clone_size.count(clone))"
>
> So I'm presuming it's not possible to recover the crashed OSD

From your later email it sounds like this is part of a chain of events that
included telling the OSDs to deliberately work around some known-old-or-wrong
state, so while I wouldn't say it's impossible to fix, it's probably not
simple and may require buying some consulting from one of the teams that do
that. I certainly don't think I've seen these errors anywhere before.

> This is bad news, as one pg may be lost (we are using EC 8+2; pg dump
> shows [NONE,NONE,NONE,388,125,25,427,226,77,154])
>
> Without this pg we have 1.8PB of broken cephfs.
>
> I could rebuild the cluster from scratch, but this means no user backups
> for a couple of weeks.
>
> The cluster has 10 nodes, uses an EC 8+2 pool for cephfs data
> (replicated NVMe metadata pool) and is running Nautilus 14.2.8
>
> Clearly, it would be nicer if we could fix the OSD, but if this isn't
> possible, can someone confirm that the right procedure to recover from a
> corrupt pg is:
>
> 1) Stop all client access
> 2) Find all files that store data on the bad pg, with:
>    # cephfs-data-scan pg_files /backup 5.750 2> /dev/null > /root/bad_files
> 3) Delete all of these bad files - presumably using truncate? Or is "rm"
>    fine?
> 4) Destroy the bad pg:
>    # ceph osd force-create-pg 5.750
> 5) Copy the missing files back with rsync or similar...

This sounds about right. Keep in mind that the PG will just be a random
fraction of the default-4MB objects in CephFS, so you will hit a huge
proportion of large files. As I assume this is a data pool, any large file
will just show up with a 4MB hole where the missing object is; that may be
preferable to reverting the whole file, or may let you copy in only the
missing data if they are logs, or whatever. And you can use the CephFS
metadata to determine whether the file in backup is up to date or not.
-Greg

> A better "recipe" or other advice gratefully received,
>
> best regards,
> Jake
>
> ****
>
> Note: I am working from home until further notice.
> For help, contact unixadmin@xxxxxxxxxxxxxxxxx
> --
> Dr Jake Grimmett
> Head Of Scientific Computing
> MRC Laboratory of Molecular Biology
> Francis Crick Avenue,
> Cambridge CB2 0QH, UK.
> Phone 01223 267019
> Mobile 0776 9886539

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
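
For reference, here is a rough sketch of steps 1-5 above gathered into one
script. It reuses Jake's example PG (5.750) and his /backup CephFS path; the
re-copy source ($SRC), the rsync flags, and the assumption that pg_files
prints absolute paths under /backup are placeholders to adapt, not a tested
procedure.

#!/bin/bash
# Sketch of the corrupt-PG recovery workflow discussed above -- adapt and
# test before running against a production filesystem.
set -euo pipefail

PG=5.750                  # the PG reported as [NONE,NONE,NONE,...]
SRC=/primary/data         # hypothetical: wherever the original copies live

# 1) Stop all client access first (site-specific: unmount clients, evict
#    sessions, etc. -- not shown here).

# 2) List every file that stores at least one object in the bad PG.
cephfs-data-scan pg_files /backup "$PG" 2>/dev/null > /root/bad_files

# 3) Remove the affected files. Plain rm is fine since step 5 re-copies
#    them; truncating instead would keep the intact parts of a large file
#    and leave a 4MB hole where the lost object was, per Greg's note.
while IFS= read -r f; do
    rm -f -- "$f"
done < /root/bad_files

# 4) Recreate the lost PG as an empty one (newer releases may also require
#    an explicit confirmation flag for this command).
ceph osd force-create-pg "$PG"

# 5) Re-copy the affected files from the original source. Strip the
#    /backup prefix so the file list is relative to both trees.
sed 's|^/backup/||' /root/bad_files > /root/bad_files.rel
rsync -aH --files-from=/root/bad_files.rel "$SRC"/ /backup/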