On 09/15/2016 03:15 PM, Ilya Dryomov wrote:
> On Thu, Sep 15, 2016 at 12:54 PM, Nikolay Borisov <kernel@xxxxxxxx> wrote:
>>
>> On 09/15/2016 01:24 PM, Ilya Dryomov wrote:
>>> On Thu, Sep 15, 2016 at 10:22 AM, Nikolay Borisov
>>> <n.borisov@xxxxxxxxxxxxxx> wrote:
>>>>
>>>> On 09/15/2016 09:22 AM, Nikolay Borisov wrote:
>>>>>
>>>>> On 09/14/2016 05:53 PM, Ilya Dryomov wrote:
>>>>>> On Wed, Sep 14, 2016 at 3:30 PM, Nikolay Borisov <kernel@xxxxxxxx> wrote:
>>>>>>>
>>>>>>> On 09/14/2016 02:55 PM, Ilya Dryomov wrote:
>>>>>>>> On Wed, Sep 14, 2016 at 9:01 AM, Nikolay Borisov <kernel@xxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> On 09/14/2016 09:55 AM, Adrian Saul wrote:
>>>>>>>>>>
>>>>>>>>>> I found I could ignore the XFS issues and just mount it with the
>>>>>>>>>> appropriate options (below from my backup scripts):
>>>>>>>>>>
>>>>>>>>>> #
>>>>>>>>>> # Mount with nouuid (conflicting XFS) and norecovery (ro snapshot)
>>>>>>>>>> #
>>>>>>>>>> if ! mount -o ro,nouuid,norecovery $SNAPDEV /backup${FS}; then
>>>>>>>>>>         echo "FAILED: Unable to mount snapshot $DATESTAMP of $FS - cleaning up"
>>>>>>>>>>         rbd unmap $SNAPDEV
>>>>>>>>>>         rbd snap rm ${RBDPATH}@${DATESTAMP}
>>>>>>>>>>         exit 3;
>>>>>>>>>> fi
>>>>>>>>>> echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"
>>>>>>>>>>
>>>>>>>>>> Unless you use clones, there's no way to do it without norecovery.
>>>>>>>>>
>>>>>>>>> But shouldn't freezing the fs and doing a snapshot constitute a "clean
>>>>>>>>> unmount", hence no need to recover on the next mount (of the snapshot) -
>>>>>>>>> Ilya?
>>>>>>>>
>>>>>>>> I *thought* it should (well, except for orphan inodes), but now I'm not
>>>>>>>> sure.  Have you tried reproducing with loop devices yet?
>>>>>>>
>>>>>>> Here is what the checksum tests showed:
>>>>>>>
>>>>>>> fsfreeze -f /mountpoint
>>>>>>> md5sum /dev/rbd0
>>>>>>> f33c926373ad604da674bcbfbe6460c5  /dev/rbd0
>>>>>>> rbd snap create xx@xxx && rbd snap protect xx@xxx
>>>>>>> rbd map xx@xxx
>>>>>>> md5sum /dev/rbd1
>>>>>>> 6f702740281874632c73aeb2c0fcf34a  /dev/rbd1
>>>>>>>
>>>>>>> where rbd1 is a snapshot of the rbd0 device. So the checksums are indeed
>>>>>>> different, which is worrying.
>>>>>>
>>>>>> Sorry, for the filesystem device you should do
>>>>>>
>>>>>>     md5sum <(dd if=/dev/rbd0 iflag=direct bs=8M)
>>>>>>
>>>>>> to get what's actually on disk, so that it's apples to apples.
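For anyone wanting to try the "reproduce with loop devices" suggestion above without touching a production rbd image, a minimal sketch could look like the lines below. Every path, the 512M size and the choice of ext4 are placeholders, and a plain copy of the backing file taken while the fs is frozen stands in for the snapshot:

    # throwaway ext4 image on a loop device; every path and size here is a placeholder
    dd if=/dev/zero of=/tmp/backing.img bs=1M count=512
    LOOPDEV=$(losetup --find --show /tmp/backing.img)    # e.g. /dev/loop0
    mkfs.ext4 -q "$LOOPDEV"
    mkdir -p /mnt/looptest
    mount "$LOOPDEV" /mnt/looptest

    # dirty the filesystem, then freeze it
    dd if=/dev/urandom of=/mnt/looptest/junk bs=1M count=32
    fsfreeze -f /mnt/looptest

    # on-disk state while frozen, read with O_DIRECT as suggested above
    md5sum <(dd if="$LOOPDEV" iflag=direct bs=8M)

    # a plain copy of the frozen backing file stands in for the rbd snapshot
    cp /tmp/backing.img /tmp/snap.img
    md5sum /tmp/snap.img    # the hash should match the one above
    file /tmp/snap.img      # does it claim "needs journal recovery"?

    fsfreeze -u /mnt/looptest
    umount /mnt/looptest && losetup -d "$LOOPDEV"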
>>>>>
>>>>> root@alxc13:~# rbd showmapped | egrep "device|c11579"
>>>>> id  pool  image   snap  device
>>>>> 47  rbd   c11579  -     /dev/rbd47
>>>>> root@alxc13:~# fsfreeze -f /var/lxc/c11579
>>>>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>>>>> 12800+0 records in
>>>>> 12800+0 records out
>>>>> 107374182400 bytes (107 GB) copied, 617.815 s, 174 MB/s
>>>>> 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63   <--- checksum after freeze
>>>>> root@alxc13:~# rbd snap create rbd/c11579@snap_test
>>>>> root@alxc13:~# rbd map c11579@snap_test
>>>>> /dev/rbd1
>>>>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>>>>> 12800+0 records in
>>>>> 12800+0 records out
>>>>> 107374182400 bytes (107 GB) copied, 610.043 s, 176 MB/s
>>>>> 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63   <--- checksum of the snapshot
>>>>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>>>>> 12800+0 records in
>>>>> 12800+0 records out
>>>>> 107374182400 bytes (107 GB) copied, 592.164 s, 181 MB/s
>>>>> 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63   <--- checksum of the original device, unchanged - GOOD
>>>>> root@alxc13:~# file -s /dev/rbd1
>>>>> /dev/rbd1: Linux rev 1.0 ext4 filesystem data (extents) (large files) (huge files)
>>>>> root@alxc13:~# fsfreeze -u /var/lxc/c11579
>>>>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>>>>> 12800+0 records in
>>>>> 12800+0 records out
>>>>> 107374182400 bytes (107 GB) copied, 647.01 s, 166 MB/s
>>>>> 92b7182591d7d7380435cfdea79a8897  /dev/fd/63   <--- after unfreeze the checksum is different - OK
>>>>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>>>>> 12800+0 records in
>>>>> 12800+0 records out
>>>>> 107374182400 bytes (107 GB) copied, 590.556 s, 182 MB/s
>>>>> bc3b68f0276c608d9435223f89589962  /dev/fd/63   <--- why the heck is the checksum of the snapshot different after unfreeze? BAD?
>>>>> root@alxc13:~# file -s /dev/rbd1
>>>>> /dev/rbd1: Linux rev 1.0 ext4 filesystem data (needs journal recovery) (extents) (large files) (huge files)
>>>>> root@alxc13:~#
>>>>
>>>> And something even more peculiar - taking an md5sum some hours after the
>>>> above test produced this:
>>>>
>>>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>>>> 12800+0 records in
>>>> 12800+0 records out
>>>> 107374182400 bytes (107 GB) copied, 636.836 s, 169 MB/s
>>>> e68e41616489d41544cd873c73defb08  /dev/fd/63
>>>>
>>>> Meaning the read-only snapshot has somehow "mutated", even though it wasn't
>>>> recreated - it's the same old snapshot. Is this normal?
>>>
>>> Hrm, I wonder if it missed a snapshot context update.  Please pastebin
>>> the entire dmesg for that boot.
>>
>> The machine has been up more than 2 and the dmesg buffer has been
>> overwritten several times in that time. The node is also rather busy, so
>> there's plenty of irrelevant stuff in dmesg. I grepped for rbd1/rbd0 and
>> found no lines containing them, so it's unlikely you would get anything
>> useful out of it.
>
> Kernel messages are logged, you can get to them with journalctl -k or
> syslog.  Grep for libceph?
>
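In practice, pulling the kind of messages Ilya is after would look roughly like the lines below; the grep pattern and the /var/log/kern.log path are assumptions about what is relevant and where this particular distro keeps its kernel log:

    # kernel messages for the current boot, filtered to the ceph/rbd client side
    journalctl -k -b 0 --no-pager | grep -E 'libceph|rbd'

    # or, on a box with classic syslog only (log path varies by distro)
    grep -E 'libceph|rbd' /var/log/kern.log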
>>
>>>
>>> Have those devices been remapped or alxc13 rebooted since then?  If
>>> not, what's the output of
>>>
>>>     $ rados -p rbd listwatchers $(rbd info c11579 | grep block_name_prefix
>>>     | awk '{ print $2 }' | sed 's/rbd_data/rbd_header/')
>>
>> watcher=xx.xxx.xxx.xx:0/3416829538 client.157729 cookie=673
>> watcher=xx.xxx.xxx.xx:0/3416829538 client.157729 cookie=676
>
> What's the output of
>
>     $ cat /sys/bus/rbd/devices/47/client_id
>     $ cat /sys/bus/rbd/devices/1/client_id

cat /sys/bus/rbd/devices/47/client_id
client157729
cat /sys/bus/rbd/devices/1/client_id
client157729

Client client157729 is alxc13, based on correlating the IP address shown by
the rados -p ... command above. So it is the only client that has these rbd
images mapped.

>>
>>>
>>> and can you check whether that snapshot is continuing to mutate as the
>>> image is mutated - freeze /var/lxc/c11579 again and check rbd47 and
>>> rbd1?
>>
>> That would take a bit more time, since it involves downtime for production
>> workloads.
>>
>> Btw, are you on IRC, in ceph/ceph-devel?
>
> dis on #ceph-devel, but I'd rather do this via email.
>
> Thanks,
>
>                 Ilya
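For reference, the recheck Ilya is asking for (freeze the image again, compare rbd47 and rbd1, then see whether rbd1 drifts on its own) could look roughly like this; the device names and mount point follow the output above, and the one-hour wait is an arbitrary choice:

    #!/bin/bash
    # re-freeze the image and watch whether the mapped snapshot keeps changing;
    # device names and mount point are taken from the thread above
    MNT=/var/lxc/c11579
    ORIGDEV=/dev/rbd47      # the live c11579 image
    SNAPDEV=/dev/rbd1       # the mapped c11579@snap_test

    csum() { md5sum <(dd if="$1" iflag=direct bs=8M 2>/dev/null) | awk '{ print $1 }'; }

    fsfreeze -f "$MNT"
    echo "frozen image:   $(csum $ORIGDEV)"   # current on-disk state of the live image
    echo "snapshot (now): $(csum $SNAPDEV)"   # record the snapshot's checksum at a quiesced point
    fsfreeze -u "$MNT"

    sleep 3600                                # arbitrary wait while the workload keeps writing
    echo "snapshot (+1h): $(csum $SNAPDEV)"   # a correct read-only snapshot must not have moved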