Yay! The problem is fixed. Boo! A new problem has been uncovered, and I
don't have a handle on it yet.

It is now possible to create a broken file on the orangefs server across
a restart of the client-core:

  dbench: (808) open ./clients/client0/~dmtmp/PWRPNT/PPTC112.TMP failed for handle 10042 (No such file or directory)

  ls -l /pvfsmnt/clients/client0/~dmtmp/PWRPNT
  ls: cannot access /pvfsmnt/clients/client0/~dmtmp/PWRPNT/PPTC112.TMP: No such file or directory
  total 1364
  -rw-------. 1 root root  85026 Feb 19 14:53 NEWPCB.PPT
  -rw-------. 1 root root 260096 Feb 19 14:52 PCBENCHM.PPT
  ??????????? ? ?    ?         ?            ? PPTC112.TMP
  -rw-------. 1 root root 260096 Feb 19 14:51 PPTOOLS1.PPA
  -rw-------. 1 root root 260096 Feb 19 14:51 TIPS.PPT
  -rw-------. 1 root root 260096 Feb 19 14:51 TRIDOTS.POT
  -rw-------. 1 root root 260096 Feb 19 14:51 ZD16.BMP

The filename comes back from the server in the readdir buffer, but the
file itself can't be accessed. I can reproduce this, so I'll work the
problem some more to gather more information. The first place I'll look
is the khandle code <g>...

Anywho... the fixed version of the client-core for the other problem is
in this SVN repository:

  http://www.orangefs.org/svn/orangefs/branches/trunk.kernel.update/

As far as orangefs for-next is concerned... I don't see how to update it
without destroying the top few commit messages in the commit history.
Unless someone says not to, I plan to update the kernel.org orangefs
for-next tree to look exactly like the "current" branch of my github tree:

  github.com/hubcapsc/linux/tree/current (latest commit c1223ca)

-Mike

On Thu, Feb 18, 2016 at 7:25 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Feb 18, 2016 at 04:50:11PM -0500, Mike Marshall wrote:
>> As part of the attempt to go upstream, this "hubcap" guy you see
>> in the comments worked on a thing that changes 64bit userspace handles
>> back and forth into 128bit kernel handles...
>> we did this because
>> one day, when we have orangefs3, we will be using 128bit uuid-derived
>> handles, and we believe it is our responsibility to not break the
>> upstream kernel module.
>>
>> Anywho, I bet you are right Al, he messed up this part of it...
>> I'll look and see if that is really so, and get it fixed.
>>
>> -Mike "hubcap"
>
> OK... I'll fold the trivial braino fix (op_is_cancel() checking the wrong
> thing) into "orangefs: delay freeing slot until cancel completes" where it
> had been introduced, but the rest of it is probably too far and will have
> to be a couple of commits on top of that queue. Had it been just my tree,
> I probably would still reorder and fold, but I know that my habits in that
> respect are rather extreme.
>
> FWIW, the scenario spotted by Martin wouldn't cause any real problems, but
> only because by the time we ended copying to/from daemon service_operation()
> couldn't have reached resubmit - it only happens if there had been a purge
> and that can't happen while somebody is inside a control device method.
>
> So the original code had been correct, but it was more brittle than
> I'd like *and* making sure that nobody else sees an op by the time
> orangefs_clean_interrupted_operation() returns is a good thing.
>
> New logics gives that, and avoids the need to play with refcounts on ops.
>
> I've pushed that into #orangefs-untested; if that works, please switch your
> for-next to it.
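The quoted thread mentions translating 64bit userspace handles back and forth into 128bit kernel handles. As a rough illustration only, here is a minimal userspace C sketch of such a round trip. It assumes a 16-byte opaque handle whose first 8 bytes hold the 64-bit value in host byte order and whose remaining 8 bytes are zero. The names `khandle128`, `khandle_from_u64`, `khandle_to_u64`, and `khandle_eq` are hypothetical; this is not the OrangeFS khandle code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit kernel-side handle: 16 opaque bytes. */
struct khandle128 {
	unsigned char u[16];
};

/* Assumption: the 64-bit value occupies exactly the low half of the 16 bytes. */
static_assert(sizeof(uint64_t) * 2 == sizeof(((struct khandle128 *)0)->u),
	      "khandle128 must be twice the size of a 64-bit handle");

/* Widen: zero all 16 bytes, then copy the 64-bit value (host byte order)
 * into the first 8 bytes. */
static void khandle_from_u64(struct khandle128 *kh, uint64_t h)
{
	memset(kh->u, 0, sizeof kh->u);
	memcpy(kh->u, &h, sizeof h);
}

/* Narrow: store the 64-bit value and return 0, or return -1 if any of the
 * upper 8 bytes is nonzero, i.e. the handle would not survive the trip
 * back to 64 bits. */
static int khandle_to_u64(const struct khandle128 *kh, uint64_t *out)
{
	size_t i;

	for (i = sizeof(uint64_t); i < sizeof kh->u; i++)
		if (kh->u[i] != 0)
			return -1;
	memcpy(out, kh->u, sizeof *out);
	return 0;
}

/* Compare two handles byte-for-byte. */
static int khandle_eq(const struct khandle128 *a, const struct khandle128 *b)
{
	return memcmp(a->u, b->u, sizeof a->u) == 0;
}
```

Under these assumptions, widening a handle and narrowing it again yields the original value, and two handles built from the same 64-bit value compare equal. Refusing to narrow a handle whose upper bytes are set is one way a 64-bit path could avoid silently truncating a future 128bit uuid-derived handle; whether the real code behaves this way is not stated in the thread.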