Re: Consistency problems when taking RBD snapshot

On Tue, Sep 13, 2016 at 12:08 PM, Nikolay Borisov <kernel@xxxxxxxx> wrote:
> Hello list,
>
>
> I have the following cluster:
>
> ceph status
>     cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0
>      health HEALTH_OK
>      monmap e2: 5 mons at {alxc10=xxxxx:6789/0,alxc11=xxxxx:6789/0,alxc5=xxxxx:6789/0,alxc6=xxxx:6789/0,alxc7=xxxxx:6789/0}
>             election epoch 196, quorum 0,1,2,3,4 alxc10,alxc5,alxc6,alxc7,alxc11
>      mdsmap e797: 1/1/1 up {0=alxc11.xxxx=up:active}, 2 up:standby
>      osdmap e11243: 50 osds: 50 up, 50 in
>       pgmap v3563774: 8192 pgs, 3 pools, 1954 GB data, 972 kobjects
>             4323 GB used, 85071 GB / 89424 GB avail
>                 8192 active+clean
>   client io 168 MB/s rd, 11629 kB/s wr, 3447 op/s
>
> It's running ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) and kernel 4.4.14
>
> I have multiple rbd devices which are used as the root filesystems for
> lxc-based containers and carry ext4. At some point I want to create an
> rbd snapshot; the sequence of operations I do is this:
>
> 1. freezefs -f /path/to/where/ext4-ontop-of-rbd-is-mounted

fsfreeze?

>
> 2. rbd snap create "${CEPH_POOL_NAME}/${name-of-blockdev}@${name-of-snapshot}"
>
> 3. freezefs -u /path/to/where/ext4-ontop-of-rbd-is-mounted
>
> <= At this point normal container operation continues =>
>
> 4. Mount the newly created snapshot to a 2nd location as read-only and rsync the files from it to a remote server.
>
> However, as I start rsyncing stuff to the remote server, certain files
> in the snapshot are reported as corrupted.

Can you share some dmesg snippets?  Is there a pattern - the same
file/set of files, etc?
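
To make sure I follow, the sequence would look roughly like this with
fsfreeze (just a sketch -- the pool, image, snapshot and mount point
names below are placeholders, not taken from your setup):

  fsfreeze -f /mnt/container-root                  # 1. quiesce ext4
  rbd snap create "${POOL}/${IMAGE}@${SNAP}"       # 2. snapshot while frozen
  fsfreeze -u /mnt/container-root                  # 3. resume normal operation

  # 4. map the snapshot read-only, mount it elsewhere and rsync from it
  rbd map --read-only "${POOL}/${IMAGE}@${SNAP}"   # maps e.g. /dev/rbdN
  mount -o ro /dev/rbdN /mnt/snapshot-check
  rsync -a /mnt/snapshot-check/ backup-host:/backups/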

>
> freezefs implies filesystem syncing, but I also tested manually doing
> sync/syncfs on the fs which is being snapshotted, both before and after
> the freezefs, and the corruption is still present. So it's unlikely
> there are dirty buffers in the page cache. I'm using the kernel rbd
> driver for the clients. The current theory is that there are some
> caches, other than the linux page cache, which are not being flushed.
> Reading the docs implies that only librbd uses separate caching, but
> I'm not using librbd.

What happens if you run fsck -n on the snapshot (ro mapping)?

What happens if you run clone from the snapshot and run fsck (rw
mapping)?

What happens if you mount the clone without running fsck and run rsync?

Can you try taking more than one snapshot and then compare them?
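
Something along these lines, just as a sketch -- the pool, image, clone,
device and mount point names are placeholders, and the clone test
assumes a format 2 image so it can be cloned:

  # map the snapshot itself read-only and fsck it (no changes written)
  rbd map --read-only "${POOL}/${IMAGE}@${SNAP}"
  fsck -n /dev/rbdX            # substitute the device rbd map reported

  # clone the snapshot and fsck the clone over a rw mapping
  rbd snap protect "${POOL}/${IMAGE}@${SNAP}"
  rbd clone "${POOL}/${IMAGE}@${SNAP}" "${POOL}/${IMAGE}-clone"
  rbd map "${POOL}/${IMAGE}-clone"
  fsck /dev/rbdY

  # separately, mount a clone that has not been fsck'ed and retry the
  # rsync from it
  mount /dev/rbdY /mnt/clone-check
  rsync -a /mnt/clone-check/ backup-host:/backups/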

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


