Hi Alexandre,

On Sat, 25 Feb 2012, Alexandre Oliva wrote:
> On Feb 23, 2012, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> > On Tue, 21 Feb 2012, Alexandre Oliva wrote:
> >> This was supposed to fix bug 1946, and likely bug 1849 too, but it looks
> >> like something's still missing for a complete fix.  fuse-unmounting
> >> between touching a dir and creating a snapshot seems to help get correct
> >> snapshot timestamp,
>
> > Hmm, that sounds like ceph-fuse isn't sending the write flushsnap cap
> > message.  I forget.. have you tried the same with the kernel client?
>
> Not recently enough that I'd remember exactly what I did.
>
> However, I don't see how ceph-fuse could be the problem, given today's
> experiments.  Here's what I just did (all with fuse):
>
> create snapshot
> check timestamps -> baseline
> unmount
> mount again
> check timestamps -> same
> restart mds
> check timestamps -> same
> unmount
> mount again
> check timestamps -> same
> moved a tree into dir
> check timestamps -> dir changed and snapshot unchanged, as expected
> unmount
> mount again
> check timestamps -> same
> restart mds
> check timestamps -> snapshot changed to dir's; its size too!
>
> After each umount, I checked that ceph-fuse was no longer running (it
> sometimes remains running for a while after umount completes)

It looks like the problem is that CInode::first isn't being journaled.
Normally, that's fine because it matches the referring dentry.. but for
multiversion inodes (like snapped directories), it won't match.  On replay
we end up with bad value of 2, and it re-cows and clobbers the original
old value.

I pushed wip-1946 with a fix.  Want to give it a go?  I wasn't able to
directly reproduce the behavior you're seeing, but I did see it doing a
bad cow_old_inode() and observed that this patch fixes that part.
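For illustration only, the failure mode described above can be sketched as a
toy model in Python.  This is not the MDS code and not the wip-1946 patch:
ToyInode, journal(), replay(), the snapid 5, and the mtimes 100/200 are all
hypothetical, and the fallback to the dentry's first on replay is an
assumption made to mirror the "bad value of 2" mentioned above.

```python
from dataclasses import dataclass, field

SNAPID = 5        # hypothetical id of the single snapshot taken
DENTRY_FIRST = 2  # hypothetical 'first' of the referring dentry


@dataclass
class ToyInode:
    first: int    # first snapid that the live inode state covers
    mtime: int
    old_inodes: dict = field(default_factory=dict)  # last snapid -> saved mtime

    def cow_old_inode(self, follows):
        # Save the current state for snaps up to 'follows', but only if
        # the live state still covers them (first <= follows).
        if self.first <= follows:
            self.old_inodes[follows] = self.mtime
            self.first = follows + 1

    def modify(self, new_mtime, follows):
        # Copy-on-write the old state, then apply the change.
        self.cow_old_inode(follows)
        self.mtime = new_mtime


def journal(inode, include_first):
    # Toy journal entry; 'first' is only recorded when asked to.
    entry = {"mtime": inode.mtime, "old_inodes": dict(inode.old_inodes)}
    if include_first:
        entry["first"] = inode.first
    return entry


def replay(entry):
    # Assumption: without a journaled 'first', fall back to the dentry's.
    return ToyInode(
        first=entry.get("first", DENTRY_FIRST),
        mtime=entry["mtime"],
        old_inodes=dict(entry["old_inodes"]),
    )


def snapshot_mtime_after_restart(include_first):
    inode = ToyInode(first=DENTRY_FIRST, mtime=100)
    # Change the directory after the snapshot exists: old mtime is COWed.
    inode.modify(new_mtime=200, follows=SNAPID)
    restored = replay(journal(inode, include_first))
    # Replay re-evaluates the COW for the same snapshot.
    restored.cow_old_inode(SNAPID)
    return restored.old_inodes[SNAPID]


if __name__ == "__main__":
    print("first journaled:    ", snapshot_mtime_after_restart(True))
    print("first not journaled:", snapshot_mtime_after_restart(False))
```

In this toy, journaling first keeps the snapshot's saved mtime at its
original value across the simulated restart, while omitting it lets the
second cow_old_inode() overwrite the saved value with the directory's
current mtime, which resembles the "snapshot changed to dir's" symptom in
the steps quoted above.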
sage