On Mon, 5 Oct 2015, Adam Tygart wrote:
> Okay, this has happened several more times. Always seems to be a small
> file that should be read-only (perhaps simultaneously) on many
> different clients. It is just through the cephfs interface that the
> files are corrupted, the objects in the cachepool and erasure coded
> pool are still correct. I am beginning to doubt these files are
> getting a truncation request.

This is still consistent with the #12551 bug. The object data is correct,
but the cephfs truncation metadata on the object is wrong, causing it to
be implicitly zeroed out on read. It's easily triggered by writers who
use O_TRUNC on open...

> Twice now have been different Perl files, once was someone's .bashrc,
> once was an input file for another application, timestamps on the
> files indicate that the files haven't been modified in weeks.
>
> Any other possibilities? Or any way to figure out what happened?

You can confirm by extracting the '_' xattr on the object (append any @1
etc fragments) and feeding it to ceph-dencoder with

 ceph-dencoder type object_info_t import <path_to_extracted_xattr> decode dump_json

and confirming that truncate_seq is 0, and verifying that the truncate_seq
on the read request is non-zero. You'd need to turn up the osd logs with
debug ms = 1 and look for the osd_op that looks like "read 0~$length
[$truncate_seq@$truncate_size]" (with real values in there).

...but it really sounds like you're hitting the bug. Unfortunately
the fix is not backported to hammer just yet. You can follow

 http://tracker.ceph.com/issues/13034

sage

>
> --
> Adam
>
> On Sun, Sep 27, 2015 at 10:44 PM, Adam Tygart <mozes@xxxxxxx> wrote:
> > I've done some digging into cp and mv's semantics (from coreutils). If
> > the inode already exists, the file will get truncated, then data will get
> > copied in. This is definitely within the scope of the bug above.
> >
> > --
> > Adam
> >
> > On Fri, Sep 25, 2015 at 8:08 PM, Adam Tygart <mozes@xxxxxxx> wrote:
> >> It may have been. Although the timestamp on the file was almost a
> >> month ago. The typical workflow for this particular file is to copy an
> >> updated version overtop of it.
> >>
> >> i.e. 'cp qss kstat'
> >>
> >> I'm not sure if cp semantics would keep the same inode and simply
> >> truncate/overwrite the contents, or if it would do an unlink and then
> >> create a new file.
> >> --
> >> Adam
> >>
> >> On Fri, Sep 25, 2015 at 8:00 PM, Ivo Jimenez <ivo@xxxxxxxxxxx> wrote:
> >>> Looks like you might be experiencing this bug:
> >>>
> >>> http://tracker.ceph.com/issues/12551
> >>>
> >>> Fix has been merged to master and I believe it'll be part of infernalis. The
> >>> original reproducer involved truncating/overwriting files. In your example,
> >>> do you know if 'kstat' has been truncated/overwritten prior to generating
> >>> the md5sums?
> >>>
> >>> On Fri, Sep 25, 2015 at 2:11 PM Adam Tygart <mozes@xxxxxxx> wrote:
> >>>>
> >>>> Hello all,
> >>>>
> >>>> I've run into some sort of bug with CephFS. Client reads of a
> >>>> particular file return nothing but 40KB of Null bytes. Doing a rados
> >>>> level get of the inode returns the whole file, correctly.
> >>>>
> >>>> Tested via Linux 4.1, 4.2 kernel clients, and the 0.94.3 fuse client.
> >>>>
> >>>> Attached is a dynamic printk debug of the ceph module from the linux
> >>>> 4.2 client while cat'ing the file.
> >>>>
> >>>> My current thought is that there has to be a cache of the object
> >>>> *somewhere* that a 'rados get' bypasses.
> >>>>
> >>>> Even on hosts that have *never* read the file before, it is returning
> >>>> Null bytes from the kernel and fuse mounts.
> >>>>
> >>>> Background:
> >>>>
> >>>> 24x CentOS 7.1 hosts serving up RBD and CephFS with Ceph 0.94.3.
> >>>> CephFS is a EC k=8, m=4 pool with a size 3 writeback cache in front of it.
> >>>>
> >>>> # rados -p cachepool get 10004096b95.00000000 /tmp/kstat-cache
> >>>> # rados -p ec84pool get 10004096b95.00000000 /tmp/kstat-ec
> >>>> # md5sum /tmp/kstat*
> >>>> ddfbe886420a2cb860b46dc70f4f9a0d /tmp/kstat-cache
> >>>> ddfbe886420a2cb860b46dc70f4f9a0d /tmp/kstat-ec
> >>>> # file /tmp/kstat*
> >>>> /tmp/kstat-cache: Perl script, ASCII text executable
> >>>> /tmp/kstat-ec: Perl script, ASCII text executable
> >>>>
> >>>> # md5sum ~daveturner/bin/kstat
> >>>> 1914e941c2ad5245a23e3e1d27cf8fde /homes/daveturner/bin/kstat
> >>>> # file ~daveturner/bin/kstat
> >>>> /homes/daveturner/bin/kstat: data
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Any more information you need?
> >>>>
> >>>> --
> >>>> Adam
> >>>> _______________________________________________
> >>>> ceph-users mailing list
> >>>> ceph-users@xxxxxxxxxxxxxx
> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
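
The confirmation check described in the reply at the top of this thread (the object's truncate_seq is 0 while the read request carries a non-zero truncate_seq) could be scripted roughly as below. This is only a sketch under assumptions: it assumes the dump_json output exposes a top-level "truncate_seq" field, and that the op text in the OSD log follows the "read 0~$length [$truncate_seq@$truncate_size]" template quoted in the reply. The function names and the sample values are hypothetical, not taken from a real cluster.

```python
import json
import re

# Assumed shape of the op text, following the template quoted in the thread:
#   read <offset>~<length> [<truncate_seq>@<truncate_size>]
READ_OP = re.compile(r"read (\d+)~(\d+) \[(\d+)@(\d+)\]")


def object_truncate_seq(dencoder_json):
    """Return truncate_seq from the text printed by `... decode dump_json`.

    Assumes the JSON has a top-level "truncate_seq" key.
    """
    return int(json.loads(dencoder_json)["truncate_seq"])


def request_truncate_seq(log_line):
    """Return the truncate_seq carried by a read op in an OSD log line,
    or None if the line contains no read op in the assumed format."""
    match = READ_OP.search(log_line)
    return int(match.group(3)) if match else None


def matches_bug_signature(dencoder_json, log_line):
    """True when the object says truncate_seq == 0 but the read request
    carries a non-zero truncate_seq, the mismatch described in the thread."""
    requested = request_truncate_seq(log_line)
    if requested is None:
        return False
    return object_truncate_seq(dencoder_json) == 0 and requested != 0


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    sample_json = '{"truncate_seq": 0, "truncate_size": 0}'
    sample_line = "osd_op(... [read 0~40960 [2@0]] ...)"
    print(matches_bug_signature(sample_json, sample_line))
```

To use it, save the ceph-dencoder output to a file, grep the matching osd_op line out of the OSD log (with debug ms = 1), and pass both strings to matches_bug_signature. A True result is consistent with the #12551 behaviour but is not proof on its own.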