Re: inconsistent PG -> unfound objects on an erasure coded system

Thanks, Sam.

Since I had already prepared a script for this, I decided to go ahead with the checks... (patience isn't one of my extended attributes...)

I've got a script that searches the full erasure-coded space and runs through your checklist below. So far I have run it on only one PG, 70.459, the one we've been discussing. I found only one file out of place, the one we had already found and discussed, and it has been removed.

The PG is still marked as inconsistent. I've deep-scrubbed it a couple of times now, and this is what I see:

2016-03-17 09:29:53.202818 7f2e816f8700  0 log_channel(cluster) log [INF] : 70.459 deep-scrub starts
2016-03-17 09:36:38.436821 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459s0 deep-scrub stat mismatch, got 22319/22321 objects, 0/0 clones, 22319/22321 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 68440088914/68445454633 bytes,0/0 hit_set_archive bytes.
2016-03-17 09:36:38.436844 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459 deep-scrub 1 errors
2016-03-17 09:44:23.592302 7f2e816f8700  0 log_channel(cluster) log [INF] : 70.459 deep-scrub starts
2016-03-17 09:47:01.237846 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459s0 deep-scrub stat mismatch, got 22319/22321 objects, 0/0 clones, 22319/22321 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 68440088914/68445454633 bytes,0/0 hit_set_archive bytes.
2016-03-17 09:47:01.237880 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459 deep-scrub 1 errors
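
To put numbers on the mismatch, here is a minimal Python sketch that pulls the count pairs out of a "stat mismatch" line and reports the fields where the two sides differ. The function name `stat_mismatches` is made up for illustration. Reading the first number as what the scrub counted and the second as what the PG stats record is an assumption, not something the log itself states.

```python
import re

# One of the ERR lines from the log above, copied verbatim.
LINE = ("2016-03-17 09:36:38.436821 7f2e816f8700 -1 log_channel(cluster) log [ERR] : "
        "70.459s0 deep-scrub stat mismatch, got 22319/22321 objects, 0/0 clones, "
        "22319/22321 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, "
        "68440088914/68445454633 bytes,0/0 hit_set_archive bytes.")

def stat_mismatches(line):
    """Return {field: (left, right, right - left)} for fields whose counts differ.

    ASSUMPTION: 'left' is the value the scrub counted and 'right' is the
    value recorded in the PG stats; the log line does not label them.
    """
    _, _, tail = line.partition("stat mismatch, got ")
    result = {}
    for part in tail.split(","):
        m = re.match(r"\s*(\d+)/(\d+)\s+(.+?)\.?\s*$", part)
        if m:
            left, right = int(m.group(1)), int(m.group(2))
            if left != right:
                result[m.group(3)] = (left, right, right - left)
    return result

diffs = stat_mismatches(LINE)
for field, (left, right, delta) in diffs.items():
    print(f"{field}: {left} vs {right} (difference {delta})")
```

For the line above, the two sides differ by 2 objects, 2 dirty objects, and 5365719 bytes; every other field matches.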


Should the scrub be sufficient to remove the inconsistent flag? I took the OSD offline during the repairs. I've looked at the files on all of the OSDs in the placement group and I'm not finding any more problem files. The vast majority of files do not have the user.cephos.lfn3 attribute: of the 22321 objects I have seen, only about 230 have it. The other files do have attributes, just not user.cephos.lfn3.

Regards, 
Jeff


On Wed, Mar 16, 2016 at 3:53 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
Ok, like I said, most files with _long at the end are *not orphaned*.
The generation number also is *not* an indication of whether the file
is orphaned -- some of the orphaned files will have ffffffffffffffff
as the generation number and others won't.  For each long filename
object in a pg you would have to:
1) Pull the long name out of the attr
2) Parse the hash out of the long name
3) Turn that into a directory path
4) Determine whether the file is at the right place in the path
5) If not, remove it (or echo it to be checked)
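
The checklist above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the ceph-objectstore-tool logic. It assumes the long name lives in the user.cephos.lfn3 attribute mentioned elsewhere in this thread, and that the directory path is a chain of DIR_<hex digit> directories built from the hash digits in reverse order. The format of the long name is not spelled out here, so step 2 is left to a caller-supplied `parse_hash` function, which is hypothetical. All function names are invented for the example.

```python
import os

LFN_ATTR = "user.cephos.lfn3"  # attribute name as mentioned in this thread

def read_long_name(path):
    """Step 1: return the long name stored in the xattr, or None if absent."""
    try:
        return os.getxattr(path, LFN_ATTR).decode("utf-8", "replace")  # Linux only
    except OSError:
        return None

def expected_dir_chain(hash_hex):
    """Step 3: turn a hex hash into a chain of directory names.

    ASSUMPTION: directories are named DIR_<digit> and nest by the hash
    digits in reverse order (last digit first).
    """
    return ["DIR_" + digit for digit in reversed(hash_hex.upper())]

def is_plausibly_placed(dir_parts, hash_hex):
    """Step 4: True if the file's directory is a prefix of the expected chain.

    This is a necessary check only; it does not verify that the file sits
    at the deepest directory that exists for its hash.
    """
    chain = expected_dir_chain(hash_hex)
    return dir_parts == chain[:len(dir_parts)]

def check_file(pg_root, path, parse_hash):
    """Run steps 1-4 on one file; parse_hash (step 2) is supplied by the caller."""
    long_name = read_long_name(path)
    if long_name is None:
        return None  # no long-name attribute on this file
    rel = os.path.relpath(os.path.dirname(path), pg_root)
    parts = [] if rel == "." else rel.split(os.sep)
    return is_plausibly_placed(parts, parse_hash(long_name))
```

For step 5 it is safer to print any path for which `check_file` returns False and review it by hand than to remove it automatically.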

You probably want to wait for someone to get around to writing a
branch for ceph-objectstore-tool.  Should happen in the next week or
two.
-Sam


--
Jeffrey McDonald, PhD
Assistant Director for HPC Operations
Minnesota Supercomputing Institute
University of Minnesota Twin Cities
599 Walter Library           email: jeffrey.mcdonald@xxxxxxxxxxx
117 Pleasant St SE           phone: +1 612 625-6905
Minneapolis, MN 55455        fax:   +1 612 624-8861
