Re: CephFS file to rados object mapping

See below

On 10/21/15 2:44 PM, Gregory Farnum wrote:
On Wed, Oct 14, 2015 at 7:20 PM, Francois Lafont <flafdivers@xxxxxxx> wrote:
Hi,

On 14/10/2015 06:45, Gregory Farnum wrote:

Ok, however during my tests I had been careful to replace the correct
file with a bad file of *exactly* the same size (the file contained
just a short string, and I replaced it with a string of exactly the
same length). I had been careful to undo the mtime update too (I
restored the file's mtime after the change). Despite this, the "repair"
command worked well. I tested twice: 1. with the change on the primary
OSD and 2. on the secondary OSD. I was surprised, because I thought
test 1 (on the primary OSD) would fail.
Hm. I'm a little confused by that, actually. Exactly what was the path
to the files you changed, and do you have before-and-after comparisons
on the content and metadata?
I didn't remember exactly what I had done, so I retried today. Here is
my process. I have a healthy cluster with 3 nodes (Ubuntu Trusty)
running Ceph Hammer (version 0.94.3). CephFS is mounted on /mnt on one
of the nodes.

~# cat /mnt/file.txt # yes it's a little file. ;)
123456

~# ls -i /mnt/file.txt
1099511627776 /mnt/file.txt

~# printf "%x\n" 1099511627776
10000000000

~# rados -p data ls - | grep 10000000000
10000000000.00000000
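CephFS names a file's data objects after the file's inode number in hex, followed by the index of the object within the file as an 8-digit hex suffix, which is why inode 1099511627776 shows up as 10000000000.00000000. A minimal Python sketch of that mapping, assuming the default file layout (4 MiB objects, stripe count 1); `object_name` and `OBJECT_SIZE` are hypothetical names, and a file with a custom layout would need its real striping parameters:

```python
# Hypothetical helper: map (inode, byte offset) to a CephFS data object name.
# Assumes the default layout: 4 MiB objects and stripe_count = 1, so each
# object simply holds the next 4 MiB of the file.
OBJECT_SIZE = 4 * 1024 * 1024


def object_name(inode: int, offset: int = 0) -> str:
    """Return '<inode in hex>.<object index as 8 hex digits>'."""
    index = offset // OBJECT_SIZE
    return f"{inode:x}.{index:08x}"


if __name__ == "__main__":
    ino = 1099511627776  # from `ls -i /mnt/file.txt`
    print(object_name(ino))                   # 10000000000.00000000
    print(object_name(ino, 5 * 1024 * 1024))  # 10000000000.00000001
```

A 7-byte file like this one only ever touches object index 0.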

This gives the name of the object that "file.txt" maps to.

~# ceph osd map data 10000000000.00000000
osdmap e76 pool 'data' (3) object '10000000000.00000000' -> pg 3.f0b56f30 (3.30) -> up ([1,2], p1) acting ([1,2], p1)
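In this output, 3.f0b56f30 is the pool id plus the raw 32-bit hash of the object name, and 3.30 is the placement group that the hash lands in. Ceph folds the hash into a PG with a "stable mod" against the pool's pg_num. A minimal Python sketch of that step; the pool's pg_num is not shown in this thread, so the value 64 used below is an assumption, the name hash itself (rjenkins by default) is taken from the output rather than recomputed, and `pg_for_hash` is a hypothetical helper name:

```python
def stable_mod(x: int, b: int, bmask: int) -> int:
    """Stable modulo as used for PG placement: reduce hash x into [0, b)."""
    if (x & bmask) < b:
        return x & bmask
    # Hash falls in a PG that doesn't exist yet: use its "parent" PG.
    return x & (bmask >> 1)


def pg_for_hash(pool_id: int, raw_hash: int, pg_num: int) -> str:
    # bmask is the smallest 2^n - 1 that covers pg_num - 1.
    bmask = (1 << (pg_num - 1).bit_length()) - 1
    return f"{pool_id}.{stable_mod(raw_hash, pg_num, bmask):x}"


if __name__ == "__main__":
    # Assumed pg_num = 64 for pool 'data' (not stated in the thread).
    print(pg_for_hash(3, 0xF0B56F30, 64))  # 3.30
```

With a power-of-two pg_num this is just the low bits of the hash; the second branch is there so that raising pg_num splits existing PGs rather than reshuffling every object.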

So my object is stored on the primary OSD, osd.1, and on the secondary
OSD, osd.2. I open a terminal on the node that hosts the primary OSD,
osd.1, and then:

~# cat /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3
123456

~# ll /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3
-rw-r--r-- 1 root root 7 Oct 15 03:46 /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3
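On this FileStore OSD the object lives in the PG's directory under current/, in a file named after the object, its snapshot ("head" for the live version), the raw hash in uppercase hex, and the pool id. A minimal Python sketch that rebuilds the path above from the values found so far; `filestore_path` is a hypothetical helper and covers only simple names like this one, whereas FileStore also escapes special characters, handles very long names, and may nest objects in DIR_* subdirectories once a PG directory grows large:

```python
import os


def filestore_path(osd_id: int, pool_id: int, pg_hex: str, obj_name: str,
                   raw_hash: int, root: str = "/var/lib/ceph/osd") -> str:
    """Hypothetical: <root>/ceph-<osd>/current/<pgid>_head/<name>__head_<HASH>__<pool>"""
    pg_dir = f"{pool_id}.{pg_hex}_head"
    # Head (non-snapshot) object; hash as 8 uppercase hex digits.
    fname = f"{obj_name}__head_{raw_hash:08X}__{pool_id}"
    return os.path.join(root, f"ceph-{osd_id}", "current", pg_dir, fname)


if __name__ == "__main__":
    print(filestore_path(1, 3, "30", "10000000000.00000000", 0xF0B56F30))
```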

Now I change the content with the following script, "change_content.sh",
which preserves the file's mtime:

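-----------------------------
#!/bin/sh
# Usage: change_content.sh FILE NEW_CONTENT
# Overwrite FILE with NEW_CONTENT (plus a trailing newline) while
# keeping its original mtime.
set -eu

f="$1"
f_tmp="${f}.tmp"
content="$2"
cp --preserve=all "$f" "$f_tmp"   # copy that keeps the original timestamps
printf '%s\n' "$content" >"$f"    # like echo, without backslash interpretation
touch -r "$f_tmp" "$f"            # restore the mtime from the saved copy
rm "$f_tmp"
-----------------------------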

So let's go: I replace the content with new content of exactly the same
size ("ABCDEF" here; with the trailing newline, both versions are 7 bytes):

~# ./change_content.sh /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3 ABCDEF

~# cat /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3
ABCDEF

~# ll /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3
-rw-r--r-- 1 root root 7 Oct 15 03:46 /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3

Now the secondary OSD holds the good version of the object and the
primary holds a bad version. I launch a "ceph pg repair":

~# ceph pg repair 3.30
instructing pg 3.30 on osd.1 to repair

# Still on the primary OSD's node: the file below has been repaired correctly.
~# cat /var/lib/ceph/osd/ceph-1/current/3.30_head/10000000000.00000000__head_F0B56F30__3
123456

As you can see, the repair command worked.
Maybe my little test is too trivial?
Hmm, maybe David has some idea.

As of the Hammer release, a replicated object that is written sequentially maintains a CRC of the entire object. This CRC costs no extra I/O and is saved with the other object information, such as size and mtime. So in your test, the bad replica is identified by comparing the CRC of the data read off disk with the value stored in the object info.
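The mechanism described above can be modeled in a few lines. A minimal Python sketch under a simplified model: a running CRC-32C is extended on every sequential append and dropped on any other write, and scrub flags each replica whose size or recomputed CRC disagrees with the object info. `ObjectInfo`, `record_write` and `find_bad_replicas` are hypothetical names (the `data_digest` field only echoes Ceph's object info), and the CRC follows the standard CRC-32C convention, so its values won't match the digests a real cluster stores:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli). Passing a previous result as `crc`
    continues the checksum: crc32c(b, crc32c(a)) == crc32c(a + b)."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


@dataclass
class ObjectInfo:
    # Hypothetical stand-in for per-object metadata kept next to the data.
    size: int = 0
    data_digest: Optional[int] = 0  # CRC of b"" is 0; None means "unknown"

    def record_write(self, offset: int, data: bytes) -> None:
        if self.data_digest is not None and offset == self.size:
            # Sequential append: extend the CRC from the bytes being written,
            # with no need to read the object back.
            self.data_digest = crc32c(data, self.data_digest)
        else:
            # Any other write would require rereading the object: drop it.
            self.data_digest = None
        self.size = max(self.size, offset + len(data))


def find_bad_replicas(oi: ObjectInfo, replicas: Dict[int, bytes]) -> List[int]:
    """Return the OSD ids whose on-disk bytes disagree with the object info."""
    bad = []
    for osd, data in replicas.items():
        if len(data) != oi.size:
            bad.append(osd)
        elif oi.data_digest is not None and crc32c(data) != oi.data_digest:
            bad.append(osd)
    return bad


if __name__ == "__main__":
    oi = ObjectInfo()
    oi.record_write(0, b"123456\n")  # the client writes file.txt
    # osd.1 (primary) was tampered with: same size, same mtime.
    print(find_bad_replicas(oi, {1: b"ABCDEF\n", 2: b"123456\n"}))  # [1]
```

Replaying the test from this thread: the tampered replica on osd.1 has the right size and mtime, so only the digest comparison catches it. Once a non-sequential write clears the digest, a same-size corruption can no longer be pinpointed this way.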

David

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
