Thanks, Sam.
Since I had already prepared a script for this, I decided to go ahead with the checks. (Patience isn't one of my extended attributes.)
I've got a script that searches all of the erasure-coded space and runs through your checklist below. So far I have run it on only one PG, 70.459, the one we've been discussing. It found only one file out of place, the one we had already found and discussed, and that file has been removed.
The PG is still marked as inconsistent. I've scrubbed it a couple of times now, and this is what I've seen:
2016-03-17 09:29:53.202818 7f2e816f8700 0 log_channel(cluster) log [INF] : 70.459 deep-scrub starts
2016-03-17 09:36:38.436821 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459s0 deep-scrub stat mismatch, got 22319/22321 objects, 0/0 clones, 22319/22321 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 68440088914/68445454633 bytes,0/0 hit_set_archive bytes.
2016-03-17 09:36:38.436844 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459 deep-scrub 1 errors
2016-03-17 09:44:23.592302 7f2e816f8700 0 log_channel(cluster) log [INF] : 70.459 deep-scrub starts
2016-03-17 09:47:01.237846 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459s0 deep-scrub stat mismatch, got 22319/22321 objects, 0/0 clones, 22319/22321 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 68440088914/68445454633 bytes,0/0 hit_set_archive bytes.
2016-03-17 09:47:01.237880 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459 deep-scrub 1 errors
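The two identical stat-mismatch lines above are easier to read once the number pairs are separated. The sketch below pulls every "a/b label" pair out of such a line and reports the labels where the numbers differ. It assumes (this is our reading of the Ceph message, not something stated in this thread) that the first number of each pair is what the deep-scrub counted and the second is the value recorded in the PG's stats. `parse_stat_mismatch` and `differences` are hypothetical helper names.

```python
import re
import sys

# Matches "<first>/<second> <label>" pairs such as "22319/22321 objects"
# or "0/0 hit_set_archive bytes".
PAIR_RE = re.compile(r"(\d+)/(\d+) ([a-z_]+(?: bytes)?)")


def parse_stat_mismatch(line: str) -> dict[str, tuple[int, int]]:
    """Return {label: (first, second)} for every pair after 'got '.

    ASSUMPTION: first = value counted by deep-scrub,
    second = value stored in the PG's stats.
    """
    _, sep, tail = line.partition("stat mismatch, got ")
    if not sep:
        raise ValueError("not a deep-scrub stat mismatch line")
    return {label: (int(a), int(b)) for a, b, label in PAIR_RE.findall(tail)}


def differences(stats: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Return second - first for every label whose two values disagree."""
    return {k: second - first for k, (first, second) in stats.items() if first != second}


if __name__ == "__main__":
    # Read log lines from stdin and print the mismatched counters.
    for log_line in sys.stdin:
        if "stat mismatch" in log_line:
            print(differences(parse_stat_mismatch(log_line)))
```

For the 70.459 lines this reports a difference of 2 objects, 2 dirty objects and 5365719 bytes (about 5.4 MB), the same on both scrubs.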
Should the scrub be sufficient to clear the inconsistent flag? I took the OSD offline during the repairs. I've looked at the files on all of the OSDs in the placement group and I'm not finding any more problem files. The vast majority of files do not have the user.cephos.lfn3 attribute: of the 22321 objects I've seen, only about 230 have it. The other files do have other attributes, just not user.cephos.lfn3.
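A count like the one above (total files versus files carrying user.cephos.lfn3) can be reproduced per PG directory with a short walk over the tree. This is a minimal sketch, assuming a Linux host (Python's `os.listxattr` is Linux-only) and that you point it at one PG's directory on a FileStore OSD; it counts every file it finds, so any non-object housekeeping files in that directory would also be included. `count_lfn_files` is a hypothetical name.

```python
import os
import sys

LFN_ATTR = "user.cephos.lfn3"  # attribute name as given in this thread


def count_lfn_files(pg_root: str) -> tuple[int, int]:
    """Return (total files, files carrying LFN_ATTR) under pg_root."""
    total = with_attr = 0
    for dirpath, _dirs, files in os.walk(pg_root):
        for name in files:
            total += 1
            try:
                attrs = os.listxattr(os.path.join(dirpath, name))
            except OSError:  # e.g. filesystem without xattr support
                attrs = []
            if LFN_ATTR in attrs:
                with_attr += 1
    return total, with_attr


if __name__ == "__main__" and len(sys.argv) > 1:
    total, with_attr = count_lfn_files(sys.argv[1])
    print(f"{with_attr} of {total} files have {LFN_ATTR}")
```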
Regards,
Jeff
On Wed, Mar 16, 2016 at 3:53 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
Ok, like I said, most files with _long at the end are *not orphaned*.
The generation number also is *not* an indication of whether the file
is orphaned -- some of the orphaned files will have ffffffffffffffff
as the generation number and others won't. For each long filename
object in a pg you would have to:
1) Pull the long name out of the attr
2) Parse the hash out of the long name
3) Turn that into a directory path
4) Determine whether the file is at the right place in the path
5) If not, remove it (or echo it to be checked)
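Sam's five steps can be sketched roughly as follows. This is an illustrative reading of the checklist, not the ceph-objectstore-tool change he mentions, and it rests on several assumptions that should be verified against the FileStore source before relying on it: that the long object name stored in user.cephos.lfn3 has eight underscore-separated fields (name_key_snap_HASH_namespace_pool_generation_shard, with literal underscores in the name escaped), that the whole name fits in a single xattr, and that objects live in nested DIR_<hex digit> directories named after the hash's digits in reverse order. All function names are hypothetical, and step 5 only reports (echoes) suspects rather than removing them.

```python
import os
import re
import sys

LFN_ATTR = "user.cephos.lfn3"  # attribute name as given in this thread
HASH_RE = re.compile(r"[0-9A-F]{8}")


def hash_from_long_name(long_name: str) -> str:
    """Step 2: pull the 8-hex-digit hash out of a long object name.

    ASSUMPTION: name_key_snap_HASH_namespace_pool_generation_shard,
    with any literal '_' in name/key/namespace escaped.
    """
    fields = long_name.split("_")
    if len(fields) != 8 or not HASH_RE.fullmatch(fields[3]):
        raise ValueError(f"unrecognised long object name: {long_name!r}")
    return fields[3]


def expected_dir(pg_root: str, hash_hex: str) -> str:
    """Step 3: follow DIR_<digit> dirs (last hex digit first) while they exist."""
    path = pg_root
    for nibble in reversed(hash_hex):
        candidate = os.path.join(path, "DIR_" + nibble)
        if not os.path.isdir(candidate):
            break
        path = candidate
    return path


def is_misplaced(pg_root: str, file_dir: str, hash_hex: str) -> bool:
    """Step 4: out of place if not in the deepest matching directory."""
    want = os.path.realpath(expected_dir(pg_root, hash_hex))
    return os.path.realpath(file_dir) != want


def report_misplaced(pg_root: str) -> list[str]:
    """Steps 1-5 for one PG directory; returns suspects, never deletes."""
    suspects = []
    for dirpath, _dirs, files in os.walk(pg_root):
        for name in files:
            if not name.endswith("_long"):
                continue
            path = os.path.join(dirpath, name)
            try:
                # Step 1: read the full object name (Linux-only call).
                long_name = os.getxattr(path, LFN_ATTR).decode()
                hash_hex = hash_from_long_name(long_name)
            except (OSError, ValueError):
                suspects.append(path)  # missing/odd attribute: check by hand
                continue
            if is_misplaced(pg_root, dirpath, hash_hex):
                suspects.append(path)  # Step 5: echo it to be checked
    return suspects


if __name__ == "__main__" and len(sys.argv) > 1:
    for suspect in report_misplaced(sys.argv[1]):
        print(suspect)
```

Reporting instead of deleting keeps a human in the loop; as Sam notes, most _long files are not orphaned, so a false positive here should be examined rather than removed.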
You probably want to wait for someone to get around to writing a
branch for ceph-objectstore-tool. Should happen in the next week or
two.
-Sam
Jeffrey McDonald, PhD
Assistant Director for HPC Operations
Minnesota Supercomputing Institute
University of Minnesota Twin Cities
599 Walter Library
117 Pleasant St SE
Minneapolis, MN 55455
email: jeffrey.mcdonald@xxxxxxxxxxx
phone: +1 612 625-6905
fax: +1 612 624-8861