On Mon, Sep 26, 2016 at 11:13 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Mon, Sep 26, 2016 at 8:39 AM, Nikolay Borisov <kernel@xxxxxxxx> wrote:
>>
>>
>> On 09/22/2016 06:36 PM, Ilya Dryomov wrote:
>>> On Thu, Sep 15, 2016 at 3:18 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>> On Thu, Sep 15, 2016 at 2:43 PM, Nikolay Borisov <kernel@xxxxxxxx> wrote:
>>>>>
>>>>> [snipped]
>>>>>
>>>>> cat /sys/bus/rbd/devices/47/client_id
>>>>> client157729
>>>>> cat /sys/bus/rbd/devices/1/client_id
>>>>> client157729
>>>>>
>>>>> Client client157729 is alxc13, based on correlating the IP address
>>>>> shown by the rados -p ... command. So it's the only client where the rbd
>>>>> images are mapped.
>>>>
>>>> Well, the watches are there, but the cookie numbers indicate that they
>>>> may have been re-established, so that's inconclusive.
>>>>
>>>> My suggestion would be to repeat the test and do repeated freezes to
>>>> see if the snapshot continues to follow HEAD.
>>>>
>>>> Further, to rule out a missed snap context update, repeat the test, but
>>>> stick
>>>>
>>>> # echo 1 >/sys/bus/rbd/devices/<ID_OF_THE_ORIG_DEVICE>/refresh
>>>>
>>>> after "rbd snap create" (for today's test, ID_OF_THE_ORIG_DEVICE
>>>> would be 47).
>>>
>>> Hi Nikolay,
>>>
>>> Any news on this?
>>
>> Hello,
>>
>> I was on holiday, hence the radio silence.
>> Here is the latest set of tests that were run:
>>
>> Results:
>>
>> c11579 (100GB - used: 83GB):
>> root@alxc13:~# rbd showmapped |grep c11579
>> 47 rbd c11579 - /dev/rbd47
>> root@alxc13:~# fsfreeze -f /var/lxc/c11579
>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 686.382 s, 156 MB/s
>> f2edb5abb100de30c1301b0856e595aa /dev/fd/63
>> root@alxc13:~# rbd snap create rbd/c11579@snap_test
>> root@alxc13:~# rbd map c11579@snap_test
>> /dev/rbd1
>> root@alxc13:~# echo 1 >/sys/bus/rbd/devices/47/refresh
>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 915.225 s, 117 MB/s
>> f2edb5abb100de30c1301b0856e595aa /dev/fd/63
>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 863.464 s, 143 MB/s
>> f2edb5abb100de30c1301b0856e595aa /dev/fd/63
>> root@alxc13:~# file -s /dev/rbd1
>> /dev/rbd1: Linux rev 1.0 ext4 filesystem data (extents) (large files)
>> (huge files)
>> root@alxc13:~# fsfreeze -u /var/lxc/c11579
>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 730.243 s, 147 MB/s
>> 65294ce9eae5694a56054ec4af011264 /dev/fd/63
>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 649.373 s, 165 MB/s
>> f2edb5abb100de30c1301b0856e595aa /dev/fd/63
>>
>> 30min later:
>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 648.328 s, 166 MB/s
>> f2edb5abb100de30c1301b0856e595aa /dev/fd/63
>>
>>
>>
>> c12607 (30GB - used: 4GB):
>> root@alxc13:~# rbd showmapped |grep c12607
>> 39 rbd c12607 - /dev/rbd39
>> root@alxc13:~# fsfreeze -f /var/lxc/c12607
>> root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
>> 3840+0 records in
>> 3840+0 records out
>> 32212254720 bytes (32 GB) copied, 228.2 s, 141 MB/s
>> e6ce3ea688a778b9c732041164b4638c /dev/fd/63
>> root@alxc13:~# rbd snap create rbd/c12607@snap_test
>> root@alxc13:~# rbd map c12607@snap_test
>> /dev/rbd21
>> root@alxc13:~# rbd snap protect rbd/c12607@snap_test
>> root@alxc13:~# echo 1 >/sys/bus/rbd/devices/39/refresh
>> root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
>> 3840+0 records in
>> 3840+0 records out
>> 32212254720 bytes (32 GB) copied, 217.138 s, 148 MB/s
>> e6ce3ea688a778b9c732041164b4638c /dev/fd/63
>> root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M)
>> 3840+0 records in
>> 3840+0 records out
>> 32212254720 bytes (32 GB) copied, 212.254 s, 152 MB/s
>> e6ce3ea688a778b9c732041164b4638c /dev/fd/63
>> root@alxc13:~# file -s /dev/rbd21
>> /dev/rbd21: Linux rev 1.0 ext4 filesystem data (extents) (large files)
>> (huge files)
>> root@alxc13:~# fsfreeze -u /var/lxc/c12607
>> root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
>> 3840+0 records in
>> 3840+0 records out
>> 32212254720 bytes (32 GB) copied, 322.964 s, 99.7 MB/s
>> 71c5efc24162452473cda50155cd4399 /dev/fd/63
>> root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M)
>> 3840+0 records in
>> 3840+0 records out
>> 32212254720 bytes (32 GB) copied, 326.273 s, 98.7 MB/s
>> e6ce3ea688a778b9c732041164b4638c /dev/fd/63
>> root@alxc13:~# file -s /dev/rbd21
>> /dev/rbd21: Linux rev 1.0 ext4 filesystem data (extents) (large files)
>> (huge files)
>> root@alxc13:~#
>>
>> 30min later:
>> root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M)
>> 3840+0 records in
>> 3840+0 records out
>> 32212254720 bytes (32 GB) copied, 359.917 s, 89.5 MB/s
>> e6ce3ea688a778b9c732041164b4638c /dev/fd/63
>>
>> Everything seems consistent, but when an rsync was initiated from the
>> snapshot it again failed.
>> Unfortunately I deem those results rather unstable, because they now
>> contradict the ones which I showed you earlier with the differing
>> checksums.
>
> Could you try running a few of these instead of just one? I'm
> interested in the following data points:
>
> - whether echo 1 >refresh consistently makes a difference (even if it's
>   just md5sum and not a successful rsync) - if it does, that's a krbd
>   issue which will need to be looked at regardless, so try with and w/o
>   manual refresh
>
> - how does dd iflag=direct of a frozen HEAD (/dev/rbd47) into a file on
>   another FS + losetup -r + mount + rsync behave - add a cp before "rbd
                                                                     ^^ dd, of course

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
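[Editor's note] The dd + losetup + mount + rsync cross-check suggested above
can be sketched as a small shell script. This is only an illustration, not a
command sequence from the thread: the device, mountpoint, and scratch paths
(/dev/rbd47, /var/lxc/c11579, /mnt/scratch, /mnt/head, /mnt/restore, the
/dev/loop0 guess) are placeholders based on the transcripts, and the script
only prints each command unless RUN=1 is set, since actually executing them
needs root and a live Ceph cluster.

```shell
#!/bin/sh
# Dry-run sketch of the suggested cross-check: dd the frozen HEAD into a
# file on another filesystem, attach the copy read-only with losetup,
# mount it, and rsync from the mount instead of the snapshot device.
# DEV/MNT/OUT and the loop/mount paths below are placeholders.
DEV=${DEV:-/dev/rbd47}
MNT=${MNT:-/var/lxc/c11579}
OUT=${OUT:-/mnt/scratch/head.img}

# Print each command instead of executing it unless RUN=1,
# so the sketch is safe to run as-is.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

run fsfreeze -f "$MNT"                        # quiesce the filesystem
run dd if="$DEV" of="$OUT" iflag=direct bs=8M # copy the frozen HEAD
run fsfreeze -u "$MNT"                        # unfreeze once the copy is done
run losetup -r -f --show "$OUT"               # attach copy read-only; prints /dev/loopN
run mount -o ro /dev/loop0 /mnt/head          # mount it (substitute the loopN printed above)
run rsync -a /mnt/head/ /mnt/restore/         # rsync from the copy, not the snapshot
```

If the rsync from the loop-mounted copy succeeds where the rsync from the
mapped snapshot fails, that would point at the snapshot path rather than at
the frozen filesystem itself.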