On Wed, 2010-03-24 at 19:47 +0200, Boaz Harrosh wrote:
> On 03/24/2010 07:32 PM, Boaz Harrosh wrote:
> > On 03/24/2010 07:15 PM, Boaz Harrosh wrote:
> >> On 03/24/2010 06:39 PM, Al Viro wrote:
> >>> On Wed, Mar 24, 2010 at 06:10:52PM +0200, Boaz Harrosh wrote:
> >>>> On 03/24/2010 06:07 PM, Al Viro wrote:
> >>>>> On Wed, Mar 24, 2010 at 06:04:56PM +0200, Boaz Harrosh wrote:
> >>>>>>> Bloody impressive... Does that happen to underlying fs or to what you
> >>>>>>> are seeing via NFS?
> >>>>>>
> >>>>>> Only via NFS. All local access is fine.
> >>>>>>
> >>>>>> After the corruption above I can cd to the local mount cp a fresh copy
> >>>>>> of .git/index file and play around just fine.
> >>>>>> Once I return to the NFS mounted directory, a git status will do it.
> >>>>>> It does not matter if caches are cold (Takes a long time) or hot it happens
> >>>>>> every time.
> >>>>>>
> >>>>>> Weird I know, I'm playing some more with it as we speak
> >>>>>
> >>>>> What happens if you export to box running older kernel *or* from box
> >>>>> running older kernel? IOW, is that nfsd or nfs client getting unhappy?
> >>>>> I'd suspect the latter, but...
> >>>>
> >>>>
> >>>> Good question, I'm just getting to that because currently it's all
> >>>> over localhost (same kernel, BTW inside a UML)
> >>>>
> >>>> I will try what you said. Please through any other tests on me, if needed.
> >>>
> >>
> >> As you suspected old-server+new-client fails. any-thing+old-client is
> >> fine. (two separate machines this time)
> >>
> >>> Very interesting... Just to see which path we are hitting: add
> >>> 	if (IS_ERR(nd->intent.open.file))
> >>> 		printk("foo: %s", pathname);
> >>> right after
> >>> 	error = do_lookup(nd, &nd->last, path);
> >>> 	if (error)
> >>> 		goto exit;
> >>> in fs/namei.c:do_last() and see whether we are hitting it or not on objects
> >>> that get corrupted.
> >>
> >> Sorry was busy shifting setups, didn't see your mail, will do that next ...
> >>
> >> Thanks
> >> Boaz
> >
> >
> > Below is what I changed. (I hope its what you meant)
> > It does not get hit, just that git corruption as before but I don't see the prints.
> > I'll try running with nfs dbg-prints on see what it does around the time gits complains
> >
> > Boaz
> >
>
> Attached is an output of when I:
> $ echo $((0x7fff)) > /proc/sys/sunrpc/nfs_debug
> and then run git status. (On a new client)
>
> We can see the complains after things got broken but what broke it
> that's hard for me to see.
>
> (If the file is too big I'll put it on the web somewhere, see if it arrives)
>
> Boaz

Something weird is going on in your trace:

NFS: open file(5b/46ff70a61cf4e159a0339df0e02113bf35f805)
NFS: permission(0:12/323044), mask=0x24, res=0
NFS: revalidating (0:12/323044)
--> nfs4_setup_sequence clp 00000000791f3000 session (null) sr_slotid 128
<-- nfs4_setup_sequence status=0
encode_compound: tag=
decode_attr_type: type=00
decode_attr_change: change attribute=10077553255782547456
decode_attr_size: file size=921
decode_attr_fsid: fsid=(0x0/0x0)
decode_attr_fileid: fileid=0
decode_attr_fs_locations: fs_locations done, error = 0
decode_attr_mode: file mode=00
decode_attr_nlink: nlink=1
decode_attr_owner: uid=-2
decode_attr_group: gid=-2
decode_attr_rdev: rdev=(0x0:0x0)
decode_attr_space_used: space used=0
decode_attr_time_access: atime=0
decode_attr_time_metadata: ctime=1269422731
decode_attr_time_modify: mtime=1269422731
decode_attr_mounted_on_fileid: fileid=0
decode_getfattr: xdr returned 0

A file type of '0' in the above trace is just wrong, and probably
indicates that the server didn't even return that attribute. I'd say you
have a corruption issue either on the server side or on your client.

Trond
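
For anyone following along, a minimal sketch of why a file type of 0 cannot
be a legitimate value. It assumes the nfs_ftype4 numbering from the NFSv4
protocol definition (RFC 3530), where the assigned types run from 1 to 9.
The helper names ftype4_is_valid() and ftype4_name() are hypothetical and
are not the kernel's own functions; how the client's debug output derives
the value it prints is not modelled here.

```c
#include <stdbool.h>

/* NFSv4 file types (nfs_ftype4, RFC 3530). Note that 0 is not assigned. */
enum nfs_ftype4 {
	NF4REG       = 1,	/* regular file */
	NF4DIR       = 2,	/* directory */
	NF4BLK       = 3,	/* block special device */
	NF4CHR       = 4,	/* character special device */
	NF4LNK       = 5,	/* symbolic link */
	NF4SOCK      = 6,	/* socket */
	NF4FIFO      = 7,	/* fifo */
	NF4ATTRDIR   = 8,	/* named attribute directory */
	NF4NAMEDATTR = 9	/* named attribute */
};

/* Hypothetical helper: true only for a type the protocol defines. */
bool ftype4_is_valid(unsigned int type)
{
	return type >= NF4REG && type <= NF4NAMEDATTR;
}

/* Hypothetical helper: printable name, "invalid" for anything else. */
const char *ftype4_name(unsigned int type)
{
	switch (type) {
	case NF4REG:       return "regular file";
	case NF4DIR:       return "directory";
	case NF4BLK:       return "block device";
	case NF4CHR:       return "character device";
	case NF4LNK:       return "symlink";
	case NF4SOCK:      return "socket";
	case NF4FIFO:      return "fifo";
	case NF4ATTRDIR:   return "attribute directory";
	case NF4NAMEDATTR: return "named attribute";
	default:           return "invalid";
	}
}
```

With these definitions ftype4_is_valid(0) is false, so a decoded type of 0
means either that the attribute was never filled in from the reply or that
the data was damaged somewhere along the way.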