On Tue, Sep 13, 2016 at 1:59 PM, Nikolay Borisov <kernel@xxxxxxxx> wrote:
>
>
> On 09/13/2016 01:33 PM, Ilya Dryomov wrote:
>> On Tue, Sep 13, 2016 at 12:08 PM, Nikolay Borisov <kernel@xxxxxxxx> wrote:
>>> Hello list,
>>>
>>>
>>> I have the following cluster:
>>>
>>> ceph status
>>>     cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0
>>>      health HEALTH_OK
>>>      monmap e2: 5 mons at {alxc10=xxxxx:6789/0,alxc11=xxxxx:6789/0,alxc5=xxxxx:6789/0,alxc6=xxxx:6789/0,alxc7=xxxxx:6789/0}
>>>             election epoch 196, quorum 0,1,2,3,4 alxc10,alxc5,alxc6,alxc7,alxc11
>>>      mdsmap e797: 1/1/1 up {0=alxc11.xxxx=up:active}, 2 up:standby
>>>      osdmap e11243: 50 osds: 50 up, 50 in
>>>       pgmap v3563774: 8192 pgs, 3 pools, 1954 GB data, 972 kobjects
>>>             4323 GB used, 85071 GB / 89424 GB avail
>>>                 8192 active+clean
>>>   client io 168 MB/s rd, 11629 kB/s wr, 3447 op/s
>>>
>>> It's running ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) and kernel 4.4.14.
>>>
>>> I have multiple rbd devices which are used as the root filesystems for
>>> lxc-based containers, with ext4 on top. At some point I want to create
>>> an rbd snapshot; the sequence of operations I use is:
>>>
>>> 1. freezefs -f /path/to/where/ext4-ontop-of-rbd-is-mounted
>>
>> fsfreeze?
>
> Yes, indeed, my bad.
>
>>
>>>
>>> 2. rbd snap create "${CEPH_POOL_NAME}/${name-of-blockdev}@${name-of-snapshot}"
>>>
>>> 3. freezefs -u /path/to/where/ext4-ontop-of-rbd-is-mounted
>>>
>>> <= At this point normal container operation continues =>
>>>
>>> 4. Mount the newly created snapshot at a 2nd location as read-only and
>>> rsync the files from it to a remote server.
>>>
>>> However, as I start rsyncing to the remote server, certain files in the
>>> snapshot are reported as corrupted.
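Just so we're on the same page, here is the whole cycle condensed into
commands, as a sketch (assuming pool rbd, image foo, the live filesystem
mounted at /mnt and the backup mount at /backup; adjust names to taste):

# fsfreeze -f /mnt
# rbd snap create rbd/foo@snap
# fsfreeze -u /mnt
# rbd map rbd/foo@snap
/dev/rbd0
# mount -o ro /dev/rbd0 /backup
# rsync -a /backup/ remote:/backup/
# umount /backup
# rbd unmap /dev/rbd0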
>> Can you share some dmesg snippets?  Is there a pattern - the same
>> file/set of files, etc?
>
> [1718059.910038] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
> [1718060.044540] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #52269: comm rsync: deleted inode referenced: 46393
> [1718060.044978] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718060.045246] rbd: rbd143: write 1000 at 0 result -30
> [1718060.045249] blk_update_request: I/O error, dev rbd143, sector 0
> [1718060.045487] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
> [1718071.404057] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #385038: comm rsync: deleted inode referenced: 46581
> [1718071.404466] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718071.404739] rbd: rbd143: write 1000 at 0 result -30
> [1718071.404742] blk_update_request: I/O error, dev rbd143, sector 0
> [1718071.404999] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
> [1718071.419172] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #769039: comm rsync: deleted inode referenced: 410848
> [1718071.419575] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718071.419844] rbd: rbd143: write 1000 at 0 result -30
> [1718071.419847] blk_update_request: I/O error, dev rbd143, sector 0
> [1718071.420081] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
> [1718071.420758] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #769039: comm rsync: deleted inode referenced: 410848
> [1718071.421196] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718071.421441] rbd: rbd143: write 1000 at 0 result -30
> [1718071.421443] blk_update_request: I/O error, dev rbd143, sector 0
> [1718071.421671] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
> [1718071.543020] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #52269: comm rsync: deleted inode referenced: 46393
> [1718071.543422] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718071.543680] rbd: rbd143: write 1000 at 0 result -30
> [1718071.543682] blk_update_request: I/O error, dev rbd143, sector 0
> [1718071.543945] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
> [1718083.388635] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #385038: comm rsync: deleted inode referenced: 46581
> [1718083.389060] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718083.389324] rbd: rbd143: write 1000 at 0 result -30
> [1718083.389327] blk_update_request: I/O error, dev rbd143, sector 0
> [1718083.389561] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
> [1718083.403910] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #769039: comm rsync: deleted inode referenced: 410848
> [1718083.404319] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718083.404581] rbd: rbd143: write 1000 at 0 result -30
> [1718083.404583] blk_update_request: I/O error, dev rbd143, sector 0
> [1718083.404816] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
> [1718083.405484] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #769039: comm rsync: deleted inode referenced: 410848
> [1718083.405893] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718083.406140] rbd: rbd143: write 1000 at 0 result -30
> [1718083.406142] blk_update_request: I/O error, dev rbd143, sector 0
> [1718083.406373] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
> [1718083.534736] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #52269: comm rsync: deleted inode referenced: 46393
> [1718083.535184] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718083.535449] rbd: rbd143: write 1000 at 0 result -30
> [1718083.535452] blk_update_request: I/O error, dev rbd143, sector 0
> [1718083.535684] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
> [1718615.793617] rbd: image c12867: WARNING: kernel layering is EXPERIMENTAL!
> [1718615.806239] rbd: rbd143: added with size 0xc80000000
> [1718615.860688] EXT4-fs (rbd143): write access unavailable, skipping orphan cleanup
> [1718615.861105] EXT4-fs (rbd143): mounted filesystem without journal. Opts: noload
> [1718617.810076] rbd: rbd144: added with size 0xa00000000
> [1718617.862650] EXT4-fs (rbd144): write access unavailable, skipping orphan cleanup
> [1718617.863044] EXT4-fs (rbd144): mounted filesystem without journal. Opts: noload
>
>
> Some of the files which exhibit this:
> rsync: readlink_stat("/var/snapshots/c11579-backup-1473764092/var/cpanel/configs.cache/_etc_sysconfig_named___default") failed: Structure needs cleaning (117)
> IO error encountered -- skipping file deletion
> rsync: readlink_stat("/var/snapshots/c11579-backup-1473764092/var/run/queueprocd.pid") failed: Structure needs cleaning (117)
> rsync: readlink_stat("/var/snapshots/c11579-backup-1473764092/var/run/cphulkd_processor.pid") failed: Structure needs cleaning (117)
> rsync: readlink_stat("/var/snapshots/c11579-backup-1473764092/var/run/cpdavd.pid") failed: Structure needs cleaning (117)
>
> The files are different every time.
>
>
>>
>>>
>>> freezefs implies filesystem syncing. I also tested with manually doing
>>> sync/syncfs on the filesystem being snapshotted, before and after the
>>> freezefs, and the corruption is still present. So it's unlikely there
>>> are dirty buffers in the page cache. I'm using the kernel rbd driver
>>> for the clients. The current theory is that there are caches, other
>>> than the Linux page cache, which are not being flushed. Reading the
>>> docs implies that only librbd uses separate caching, but I'm not using
>>> librbd.
>>
>> What happens if you run fsck -n on the snapshot (ro mapping)?
>
> fsck -n -f run on the RO snapshot:
>
> http://paste.ubuntu.com/23173304/
>
>>
>> What happens if you create a clone from the snapshot and run fsck (rw
>> mapping)?
>
> fsck -f -n run on the RW clone of the aforementioned snapshot (it has
> considerably fewer errors):
> http://paste.ubuntu.com/23173306/
>
>>
>> What happens if you mount the clone without running fsck and run rsync?
>
> My colleagues told me that running rsync from the clone without running
> fsck first doesn't cause rsync to error out. This means that somehow the
> initial snapshot seems broken, but a clone of it isn't.

That's very odd. Hmm, it could be about whether ext4 is able to do
journal replay on mount: when you mount a snapshot, you get a read-only
block device; when you mount a clone image, you get a read-write block
device. Let's try this again, supposing the image is foo and the
snapshot is snap:

# fsfreeze -f /mnt
# rbd snap create foo@snap
# rbd map foo@snap
/dev/rbd0
# file -s /dev/rbd0
# fsck.ext4 -n /dev/rbd0
# mount /dev/rbd0 /foo
# umount /foo
<full dmesg>
# file -s /dev/rbd0
# fsck.ext4 -n /dev/rbd0

# rbd clone foo@snap bar
# rbd map bar
/dev/rbd1
# file -s /dev/rbd1
# fsck.ext4 -n /dev/rbd1
# mount /dev/rbd1 /bar
# umount /bar
<full dmesg>
# file -s /dev/rbd1
# fsck.ext4 -n /dev/rbd1

Could you please provide the output for the above?
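One more thing worth trying in the meantime, since the dmesg above shows
ext4 failing superblock writes to the read-only device: mount the
snapshot with journal loading disabled (your later mounts in the dmesg
already show "Opts: noload"), along these lines:

# mount -o ro,noload /dev/rbd0 /foo

noload makes ext4 skip loading (and thus replaying) the journal, so
nothing tries to write to the read-only mapping. Note that if the
journal does contain committed transactions, skipping replay can itself
yield an inconsistent view, so treat this as a diagnostic aid rather
than a fix.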
>
>
>>
>> Can you try taking more than one snapshot and then compare them?
>
> What do you mean? Checksumming the content or something else?

Yeah, I was thinking md5sum on the /dev/rbd<x> devices for starters, but
let's figure out the snap vs clone question first.

Thanks,

                Ilya
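P.S. For the checksum comparison, something along these lines should do,
as a sketch (assuming the image is foo and the snapshots are snap1 and
snap2):

# rbd map foo@snap1
/dev/rbd0
# rbd map foo@snap2
/dev/rbd1
# md5sum /dev/rbd0 /dev/rbd1
# rbd unmap /dev/rbd0
# rbd unmap /dev/rbd1

Identical sums would mean the two snapshots captured the same on-disk
state; differing sums would point at something changing underneath.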