On 06/11/2012 05:11 PM, Jeff Layton wrote:
> On Mon, 11 Jun 2012 17:05:28 +0300
> Boaz Harrosh <bharrosh@xxxxxxxxxxx> wrote:
>
>> On 06/11/2012 04:51 PM, Jeff Layton wrote:
>>
>>> That was considered here, but the problem with the usermode helper is
>>> that you can't pass anything back to the kernel but a simple status
>>> code (and that's assuming that you wait for it to exit). In the near
>>> future, we'll need to pass back more info to the kernel for this, so
>>> the usermode helper callout wasn't suitable.
>>>
>>
>> I have answered that in my mail. Repeated here again. Well, you made
>> a simple mistake, because it is *easy* to pass back any number and
>> size of information from user mode.
>>
>> You just set up sysfs entry points where the answers are written
>> back to. It's an easy trick to set this up in a thread-safe way with a
>> cookie, but 90% of the time you don't have to. Say you set up
>> a per-client structure (each client identified uniquely); then user mode
>> answers back per client, and concurrency will not do any harm, since
>> you give the same answer to the same question. And so on. Each
>> problem its own.
>>
>> If you want we can talk about this; it would be easy for me to set up
>> a toll-free conference number we can all use.
>
> That helpful advice would have been welcome about 3-4 months ago when I
> first proposed this in detail. At that point you're working with
> multiple upcall/downcall mechanisms, which was something I was keen to
> avoid.
>
> I'm not opposed to moving in that direction, but it basically means
> you're going to rip out everything I've got here so far and replace it.
>
> If you're willing to do that work, I'll be happy to work with you on
> it, but I don't have the time or inclination to do that on my own right
> now.
>

No such luck, sorry. I wish I could, but coming from a competing server
company, you can imagine the priority of that ever happening.
(Even though I use the Linux server every day for my development and am
still putting lots of effort into it, mainly in pNFS.)

Hopefully, on re-examining the code, it could all be salvaged just the
same, only with lots of code thrown away.

But meanwhile, please address my concern below:

Boaz Harrosh wrote:
> One more thing, the most important one. We have already fixed that in the
> past and I was hoping the lesson was learned. Apparently it was not, and
> we are doomed to make this mistake forever!!
>
> Whatever crap fails, times out, or crashes in the recovery code, we don't
> give a damn. It should never affect any server-client communication.
>
> When the grace period ends, the clients' gate opens, period. *Any* error
> return from state recovery code must be carefully ignored and normal
> operations resumed. At most, on error, we move into a mode where any
> recovery request from a client is accepted, since we don't have any better
> data to verify it.
>
> Please comb the recovery code to make sure any catastrophe is safely
> ignored. We already did that before and it used to work.

We should make sure that any state recovery code does not interfere with
regular operations, and that it fails gracefully / shuts up. We used to
have that; apparently it re-broke. Clients should always be granted access
after the grace period, and the server must be made not to fail in any
situation.

I would look into it but I'm not up to date anymore; I wish you or Bruce
could.

Thanks for your work so far, sorry to be the bearer of bad news.
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html