Re: Consistency problems when taking RBD snapshot

On 09/22/2016 06:36 PM, Ilya Dryomov wrote:
> On Thu, Sep 15, 2016 at 3:18 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> On Thu, Sep 15, 2016 at 2:43 PM, Nikolay Borisov <kernel@xxxxxxxx> wrote:
>>>
>>> [snipped]
>>>
>>> cat /sys/bus/rbd/devices/47/client_id
>>> client157729
>>> cat /sys/bus/rbd/devices/1/client_id
>>> client157729
>>>
>>> Client client157729 is alxc13, based on correlation by the ip address
>>> shown by the rados -p ... command. So it's the only client where the rbd
>>> images are mapped.
>>
>> Well, the watches are there, but cookie numbers indicate that they may
>> have been re-established, so that's inconclusive.
>>
>> My suggestion would be to repeat the test and do repeated freezes to
>> see if snapshot continues to follow HEAD.
>>
>> Further, to rule out a missed snap context update, repeat the test, but
>> stick
>>
>> # echo 1 >/sys/bus/rbd/devices/<ID_OF_THE_ORIG_DEVICE>/refresh
>>
>> after "rbd snap create" (for the today's test, ID_OF_THE_ORIG_DEVICE
>> would be 47).
> 
> Hi Nikolay,
> 
> Any news on this?

Hello,

I was on holiday, hence the radio silence. Here is the latest set of
tests that were run:

Results:

c11579 (100GB - used: 83GB):
root@alxc13:~# rbd showmapped |grep c11579
47  rbd  c11579 -                        /dev/rbd47
root@alxc13:~# fsfreeze -f /var/lxc/c11579
root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 686.382 s, 156 MB/s
f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
root@alxc13:~# rbd snap create rbd/c11579@snap_test
root@alxc13:~# rbd map c11579@snap_test
/dev/rbd1
root@alxc13:~# echo 1 >/sys/bus/rbd/devices/47/refresh
root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 915.225 s, 117 MB/s
f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 863.464 s, 143 MB/s
f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
root@alxc13:~# file -s /dev/rbd1
/dev/rbd1: Linux rev 1.0 ext4 filesystem data (extents) (large files) (huge files)
root@alxc13:~# fsfreeze -u /var/lxc/c11579
root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 730.243 s, 147 MB/s
65294ce9eae5694a56054ec4af011264  /dev/fd/63
root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 649.373 s, 165 MB/s
f2edb5abb100de30c1301b0856e595aa  /dev/fd/63

30min later:
root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 648.328 s, 166 MB/s
f2edb5abb100de30c1301b0856e595aa  /dev/fd/63



c12607 (30GB - used: 4GB):
root@alxc13:~# rbd showmapped |grep c12607
39  rbd  c12607 -                        /dev/rbd39
root@alxc13:~# fsfreeze -f /var/lxc/c12607
root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
3840+0 records in
3840+0 records out
32212254720 bytes (32 GB) copied, 228.2 s, 141 MB/s
e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
root@alxc13:~# rbd snap create rbd/c12607@snap_test
root@alxc13:~# rbd map c12607@snap_test
/dev/rbd21
root@alxc13:~# rbd snap protect rbd/c12607@snap_test
root@alxc13:~# echo 1 >/sys/bus/rbd/devices/39/refresh
root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
3840+0 records in
3840+0 records out
32212254720 bytes (32 GB) copied, 217.138 s, 148 MB/s
e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M)
3840+0 records in
3840+0 records out
32212254720 bytes (32 GB) copied, 212.254 s, 152 MB/s
e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
root@alxc13:~# file -s /dev/rbd21
/dev/rbd21: Linux rev 1.0 ext4 filesystem data (extents) (large files) (huge files)
root@alxc13:~# fsfreeze -u /var/lxc/c12607
root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
3840+0 records in
3840+0 records out
32212254720 bytes (32 GB) copied, 322.964 s, 99.7 MB/s
71c5efc24162452473cda50155cd4399  /dev/fd/63
root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M)
3840+0 records in
3840+0 records out
32212254720 bytes (32 GB) copied, 326.273 s, 98.7 MB/s
e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
root@alxc13:~# file -s /dev/rbd21
/dev/rbd21: Linux rev 1.0 ext4 filesystem data (extents) (large files) (huge files)
root@alxc13:~#

30min later:
root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M)
3840+0 records in
3840+0 records out
32212254720 bytes (32 GB) copied, 359.917 s, 89.5 MB/s
e6ce3ea688a778b9c732041164b4638c  /dev/fd/63

Everything seems consistent this time, but when an rsync was initiated
from the snapshot it failed again. Unfortunately, I consider these
results unreliable, because they contradict the earlier ones I showed
you, where the checksums differed.
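
For anyone wanting to repeat the runs above, the sequence (freeze, snap
create, map, refresh, compare checksums, thaw) could be wrapped roughly
as follows. This is only a sketch, not the script that produced the
output above: the function names and the DIRECT variable are made up
for illustration, and it assumes that "rbd map" prints the new device
path on stdout, as it does in the logs.

```shell
#!/usr/bin/env bash
# Sketch only: dev_md5, same_md5, check_snapshot and DIRECT are
# illustrative names, not part of any Ceph tooling.

# Print the MD5 of a whole block device (or file). Reads with O_DIRECT,
# as in the logs, unless DIRECT is set to the empty string.
dev_md5() {
    local direct=${DIRECT-1}
    [ -r "$1" ] || return 1
    dd if="$1" bs=8M ${direct:+iflag=direct} 2>/dev/null | md5sum | cut -d' ' -f1
}

# Succeed only if both paths are readable and have the same checksum.
same_md5() {
    local a b
    a=$(dev_md5 "$1") && b=$(dev_md5 "$2") && [ -n "$a" ] && [ "$a" = "$b" ]
}

# check_snapshot <mountpoint> <image> <head_dev_id> <snap_name>
# Freeze the filesystem, snapshot the image, map the snapshot, refresh
# the HEAD device, then compare HEAD against the snapshot while frozen.
check_snapshot() {
    local mnt=$1 image=$2 id=$3 snap=$4 snapdev rc
    fsfreeze -f "$mnt" || return 1
    rbd snap create "rbd/$image@$snap" &&
        snapdev=$(rbd map "$image@$snap") &&
        echo 1 > "/sys/bus/rbd/devices/$id/refresh" &&
        same_md5 "/dev/rbd$id" "$snapdev"
    rc=$?
    fsfreeze -u "$mnt"   # always thaw, even if an earlier step failed
    return $rc
}
```

Something like "check_snapshot /var/lxc/c11579 c11579 47 snap_test"
would then correspond to the first run. Note that it only covers the
comparison made while the filesystem is frozen; re-reading the snapshot
after the thaw, as done above, would be a separate dev_md5 call.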


> 
> Thanks,
> 
>                 Ilya
> 