Re: CephFS object mapping.

On Wed, May 22, 2019 at 12:22 AM Burkhard Linke <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Hi,

On 5/21/19 9:46 PM, Robert LeBlanc wrote:
> I'm at a new job working with Ceph again and am excited to be back in the
> community!
>
> I can't find any documentation to support this, so please help me
> understand if I got this right.
>
> I've got a Jewel cluster with CephFS and we have an inconsistent PG.
> All copies of the object are zero size, but the digest says that it
> should be a non-zero size, so it seems that my two options are, delete
> the file that the object is part of, or rewrite the object with RADOS
> to update the digest. So, this leads to my question: how do I tell
> which file the object belongs to?
>
> From what I found, the object is prefixed with the hex value of the
> inode and suffixed by the stripe number:
> 1000d2ba15c.00000005
> <inode hex>.<hex stripe number>
>
> I then ran `find . -xdev -inum 1099732590940` and found a file on the
> CephFS file system. I just want to make sure that I found the right
> file before I start trying recovery options.
>

The first stripe XYZ.00000000 has some metadata stored as xattr (rados
xattr, not cephfs xattr). One of the entries has the key 'parent':

When you say 'some', is there a fixed offset at which the file data starts? Or is the first stripe just metadata?
 
# ls Ubuntu16.04-WS2016-17.ova
Ubuntu16.04-WS2016-17.ova

# ls -i Ubuntu16.04-WS2016-17.ova
1099751898435 Ubuntu16.04-WS2016-17.ova

# rados -p cephfs_test_data stat 1000e523d43.00000000
cephfs_test_data/1000e523d43.00000000 mtime 2016-10-13 16:20:10.000000, size 4194304

# rados -p cephfs_test_data listxattr 1000e523d43.00000000
layout
parent

# rados -p cephfs_test_data getxattr 1000e523d43.00000000 parent | strings
Ubuntu16.04-WS2016-17.ova5:
adm2
volumes


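The complete path of the file is
/volumes/adm/Ubuntu16.04-WS2016-17.ova (the trailing "5:" and "2" in the
strings output appear to be bytes of the binary encoding, not part of the
names). For a complete check you can store the content of the parent key
and use ceph-dencoder to print its content: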
# rados -p cephfs_test_data getxattr 1000e523d43.00000000 parent > parent.bin

# ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json
{
     "ino": 1099751898435,
     "ancestors": [
         {
             "dirino": 1099527190071,
             "dname": "Ubuntu16.04-WS2016-17.ova",
             "version": 14901
         },
         {
             "dirino": 1099521974514,
             "dname": "adm",
             "version": 61190706
         },
         {
             "dirino": 1,
             "dname": "volumes",
             "version": 48394885
         }
     ],
     "pool": 7,
     "old_pools": []
}
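
The decoded backtrace lists the ancestors from the file itself up to the
root (dirino 1), so the path can be rebuilt by joining the dname entries in
reverse order. A minimal Python sketch of that step, assuming the JSON layout
shown above; the function name backtrace_to_path and the SAMPLE constant are
only illustrative and not part of any Ceph tool:

```python
import json

# Trimmed copy of the ceph-dencoder output shown above.
SAMPLE = """
{
  "ino": 1099751898435,
  "ancestors": [
    {"dirino": 1099527190071, "dname": "Ubuntu16.04-WS2016-17.ova", "version": 14901},
    {"dirino": 1099521974514, "dname": "adm", "version": 61190706},
    {"dirino": 1, "dname": "volumes", "version": 48394885}
  ],
  "pool": 7,
  "old_pools": []
}
"""


def backtrace_to_path(backtrace):
    """Join the dname entries from the root down to the file.

    Assumes "ancestors" is ordered nearest-first, as in the dump above.
    """
    names = [entry["dname"] for entry in backtrace["ancestors"]]
    return "/" + "/".join(reversed(names))


if __name__ == "__main__":
    print(backtrace_to_path(json.loads(SAMPLE)))
```

In practice the JSON would come from the ceph-dencoder command above (for
example redirected to a file and loaded with json.load) rather than from an
embedded string.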


One important thing to note: ls -i prints the inode number in decimal,
while CephFS uses hexadecimal in the RADOS object names. Hence the
different values in the commands above.
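
The conversion between the two notations can be sketched in a few lines of
Python. This assumes the <inode hex>.<hex stripe number> naming described
earlier in the thread, with the suffix zero-padded to eight hex digits as in
the object names shown; the helper names are made up for illustration:

```python
def object_name_to_inode(obj_name):
    """Split a name like '1000d2ba15c.00000005' into (inode, index), both as ints."""
    ino_hex, index_hex = obj_name.split(".")
    return int(ino_hex, 16), int(index_hex, 16)


def inode_to_object_name(ino, index=0):
    """Build the RADOS object name for a decimal inode number (as printed by ls -i)."""
    return f"{ino:x}.{index:08x}"


if __name__ == "__main__":
    # Object from the inconsistent PG -> inode number to pass to find -inum
    print(object_name_to_inode("1000d2ba15c.00000005"))
    # Inode from ls -i -> first object of the file, which carries the 'parent' xattr
    print(inode_to_object_name(1099751898435))
```

With the values from this thread, the first call yields inode 1099732590940
and index 5, and the second yields 1000e523d43.00000000.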

Thank you for this; it is much faster than running find for the inode. That took many hours: I let it run overnight and it found the file at some point, but searching the whole filesystem took about 21 hours.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1