Does this patch fix files that have been corrupted in this manner? If not,
or I guess even if it does, is there a way to walk the metadata and data
pools and find objects that are affected?

Is that '_' xattr present in hammer? If so, how can I access it? Doing a
listxattr on the inode object just lists 'parent', and doing the same on
the parent directory's inode object likewise lists only 'parent'.

Thanks for your time.
--
Adam

On Mon, Oct 5, 2015 at 9:36 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Mon, 5 Oct 2015, Adam Tygart wrote:
>> Okay, this has happened several more times. It always seems to be a
>> small file that should be read-only (perhaps simultaneously) on many
>> different clients. It is only through the CephFS interface that the
>> files are corrupted; the objects in the cache pool and erasure-coded
>> pool are still correct. I am beginning to doubt these files are
>> getting a truncation request.
>
> This is still consistent with the #12551 bug. The object data is
> correct, but the cephfs truncation metadata on the object is wrong,
> causing it to be implicitly zeroed out on read. It's easily triggered
> by writers who use O_TRUNC on open...
>
>> Twice now it has been different perl files, once someone's .bashrc,
>> and once an input file for another application; timestamps on the
>> files indicate they haven't been modified in weeks.
>>
>> Any other possibilities? Or any way to figure out what happened?
>
> You can confirm by extracting the '_' xattr on the object (append any
> @1 etc. fragments) and feeding it to ceph-dencoder with
>
>   ceph-dencoder type object_info_t import <path_to_extracted_xattr> decode dump_json
>
> and confirming that truncate_seq is 0, and verifying that the
> truncate_seq on the read request is non-zero. You'd need to turn up
> the osd logs with 'debug ms = 1' and look for the osd_op that looks
> like "read 0~$length [$truncate_seq@$truncate_size]" (with real
> values in there).
>
> ...but it really sounds like you're hitting the bug.
> Unfortunately the fix is not backported to hammer just yet. You can
> follow http://tracker.ceph.com/issues/13034
>
> sage
>
>> --
>> Adam
>>
>> On Sun, Sep 27, 2015 at 10:44 PM, Adam Tygart <mozes@xxxxxxx> wrote:
>> > I've done some digging into cp and mv's semantics (from coreutils).
>> > If the inode already exists, the file gets truncated, then the data
>> > gets copied in. This is definitely within the scope of the bug above.
>> >
>> > --
>> > Adam
>> >
>> > On Fri, Sep 25, 2015 at 8:08 PM, Adam Tygart <mozes@xxxxxxx> wrote:
>> >> It may have been, although the timestamp on the file was almost a
>> >> month ago. The typical workflow for this particular file is to copy
>> >> an updated version over the top of it, i.e. 'cp qss kstat'.
>> >>
>> >> I'm not sure whether cp's semantics keep the same inode and simply
>> >> truncate/overwrite the contents, or whether it does an unlink and
>> >> then creates a new file.
>> >> --
>> >> Adam
>> >>
>> >> On Fri, Sep 25, 2015 at 8:00 PM, Ivo Jimenez <ivo@xxxxxxxxxxx> wrote:
>> >>> Looks like you might be experiencing this bug:
>> >>>
>> >>> http://tracker.ceph.com/issues/12551
>> >>>
>> >>> The fix has been merged to master and I believe it'll be part of
>> >>> infernalis. The original reproducer involved truncating/overwriting
>> >>> files. In your example, do you know if 'kstat' was truncated or
>> >>> overwritten prior to generating the md5sums?
>> >>>
>> >>> On Fri, Sep 25, 2015 at 2:11 PM Adam Tygart <mozes@xxxxxxx> wrote:
>> >>>> Hello all,
>> >>>>
>> >>>> I've run into some sort of bug with CephFS. Client reads of a
>> >>>> particular file return nothing but 40KB of null bytes; doing a
>> >>>> rados-level get of the inode's object returns the whole file,
>> >>>> correctly.
>> >>>>
>> >>>> Tested via Linux 4.1 and 4.2 kernel clients, and the 0.94.3 fuse
>> >>>> client.
>> >>>>
>> >>>> Attached is a dynamic printk debug of the ceph module from the
>> >>>> Linux 4.2 client while cat'ing the file.
>> >>>>
>> >>>> My current thought is that there has to be a cache of the object
>> >>>> *somewhere* that a 'rados get' bypasses.
>> >>>>
>> >>>> Even on hosts that have *never* read the file before, it is
>> >>>> returning null bytes from the kernel and fuse mounts.
>> >>>>
>> >>>> Background:
>> >>>>
>> >>>> 24x CentOS 7.1 hosts serving up RBD and CephFS with Ceph 0.94.3.
>> >>>> CephFS is an EC k=8, m=4 pool with a size-3 writeback cache in
>> >>>> front of it.
>> >>>>
>> >>>> # rados -p cachepool get 10004096b95.00000000 /tmp/kstat-cache
>> >>>> # rados -p ec84pool get 10004096b95.00000000 /tmp/kstat-ec
>> >>>> # md5sum /tmp/kstat*
>> >>>> ddfbe886420a2cb860b46dc70f4f9a0d /tmp/kstat-cache
>> >>>> ddfbe886420a2cb860b46dc70f4f9a0d /tmp/kstat-ec
>> >>>> # file /tmp/kstat*
>> >>>> /tmp/kstat-cache: Perl script, ASCII text executable
>> >>>> /tmp/kstat-ec: Perl script, ASCII text executable
>> >>>>
>> >>>> # md5sum ~daveturner/bin/kstat
>> >>>> 1914e941c2ad5245a23e3e1d27cf8fde /homes/daveturner/bin/kstat
>> >>>> # file ~daveturner/bin/kstat
>> >>>> /homes/daveturner/bin/kstat: data
>> >>>>
>> >>>> Thoughts?
>> >>>>
>> >>>> Any more information you need?
>> >>>>
>> >>>> --
>> >>>> Adam

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
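The truncate_seq check Sage describes upthread can be scripted when many objects need inspecting. This is only a sketch, not a definitive tool: it assumes the hammer-era 'rados' and 'ceph-dencoder' binaries are on the PATH and the cluster is reachable; the pool and object names you pass in are your own, and the 'truncate_seq' field name is the one from the JSON dump mentioned above.

```python
import json
import os
import subprocess
import tempfile


def object_info(pool, obj):
    """Fetch an object's '_' xattr and decode it as object_info_t.

    Assumption: 'rados' and 'ceph-dencoder' are installed and the
    cluster is reachable. Any @1 etc. fragments would need the same
    check applied to them as well.
    """
    raw = subprocess.run(["rados", "-p", pool, "getxattr", obj, "_"],
                         check=True, capture_output=True).stdout
    # ceph-dencoder imports from a file path, so spill the blob to disk.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, raw)
        os.close(fd)
        out = subprocess.run(
            ["ceph-dencoder", "type", "object_info_t", "import", path,
             "decode", "dump_json"],
            check=True, capture_output=True).stdout
    finally:
        os.unlink(path)
    return json.loads(out)


def is_suspect(info):
    # Per the diagnosis upthread: truncate_seq == 0 on the object, while
    # clients send reads with a non-zero truncate_seq, is the #12551
    # signature.
    return info.get("truncate_seq", 0) == 0
```

To walk a whole pool you could iterate over the names from 'rados -p &lt;pool&gt; ls' and call object_info() on each, though on a large pool that is one round trip per object and will be slow.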
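The cp question raised mid-thread (does the destination keep its inode?) is easy to check outside of Ceph: coreutils cp normally opens an existing destination for writing with O_TRUNC rather than unlinking it, which is exactly the open-with-truncate pattern the #12551 bug is sensitive to. A small self-contained demonstration of that behavior, with no Ceph involved (the 'kstat' filename just mirrors the example in the thread):

```python
import os
import tempfile

# Stand-in for the existing destination of 'cp qss kstat'.
tmpdir = tempfile.mkdtemp()
dest = os.path.join(tmpdir, "kstat")
with open(dest, "w") as f:
    f.write("old contents\n")
inode_before = os.stat(dest).st_ino

# Opening an existing path with mode 'w' is open(2) with
# O_WRONLY|O_CREAT|O_TRUNC: the file is truncated in place and the
# inode is reused -- no unlink, no new file.
with open(dest, "w") as f:
    f.write("new contents\n")
inode_after = os.stat(dest).st_ino

print(inode_before == inode_after)  # prints True: the inode survives
```

So a repeated 'cp qss kstat' over an existing file keeps the inode and truncates in place, consistent with the coreutils digging reported upthread.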