On Sun, Dec 18, 2016 at 08:42:49PM +0100, Jean-Marc Saffroy wrote:
> Hi,
>
> Continuing with my experiments with the DLM user API, I am trying to use
> blocking AST callbacks, and find that the rules for the lifetime and
> ownership of the dlm_lksb struct are a bit surprising. This led me to some
> investigations, and the question at the end of this email.

Hi, you're discovering just how old and crusty this userland interface is.
Sorry about that :)

This userland interface is left over from the earliest dlm implementation,
which could generously be called experimental. It has long needed a
thorough redesign, but because the dlm is not heavily used from userland,
and because user/kernel interfaces are hard, it's never been done.

> It looks like the kernel remembers the pointer to the lksb struct used to
> issue the dlm_lock call, and libdlm happily overwrites this piece of
> memory whenever the kernel issues an event related to that lock, including
> just before firing a BAST callback. It is a bit frustrating because I got
> caught by surprise wondering why something was smashing my stack, ie. the
> place where I had once laid out my dlm_lksb, thinking that it was okay to
> release its memory after the completion AST callback has completed.

The way the kernel saves and restores the pointers is very unpleasant, and
handling lifetimes of lock structs/memory is a big pain. In a quick look at
this code, I'm not seeing any simple or obvious way to avoid the lksb
behavior you're describing, but it's been a while since I was very familiar
with this area. With more study, it's possible that a fix could be found,
but it seems a bit unlikely.

As a workaround to avoid an unwanted bast callback after a completion, I
wonder if you could make a no-op call with NULL astaddr/astarg to prevent
any further callback using those?

> For now I have (apparently) working test code that deals with this in the
> following way: for a given lock (identified by its lockid), I keep two
> dlm_lksb structs and a bit indicating which of the two is free to use for
> conversions. I update the bit every time the CAST (not BAST) callback
> completes, thus doing some kind of double buffering.

OK, I don't know enough about the details to say whether there are any
subtle issues with this or not.

> So I assume that:
>
> - each lock acquisition or conversion call gives ownership of the lksb to
> the kernel and libdlm (because a BAST callback can fire at any time and
> will overwrite the struct), causing the kernel/libdlm to forget about the
> previously owned lksb (meaning the caller can/should then dispose of it)

That sounds about right.

> - AST and BAST callbacks run in order, such that after the CAST completes,
> and until a conversion occurs, a BAST firing will only overwrite the lksb
> given on the last lock or conversion
>
> Are my assumptions correct?

That also sounds like it should be true. To say with more certainty would
require closer study of the code, because whatever rules exist are a
function of the current implementation, and not derived from higher design
rules per se.

Dave

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
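
For reference, the double-buffering scheme quoted above could be sketched
roughly as follows. This is only an illustration of the bookkeeping, not
Jean-Marc's actual test code: struct lksb_stub is a hypothetical stand-in
for libdlm's struct dlm_lksb (so the sketch builds without libdlm), and
lock_slot, slot_begin_request, slot_cast_done and slot_current are made-up
names. It assumes only one request per lock is in flight at a time, and
that a conversion has to carry the lock id over into the new status block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for libdlm's struct dlm_lksb. */
struct lksb_stub {
    int      sb_status;
    uint32_t sb_lkid;
};

/* Per-lock bookkeeping: two status blocks plus the "which one" bit. */
struct lock_slot {
    struct lksb_stub lksb[2];
    int active;   /* index of the block given on the last completed request */
    int pending;  /* nonzero between a request and the end of its CAST */
};

void slot_init(struct lock_slot *s)
{
    memset(s, 0, sizeof *s);
}

/*
 * Pick the free block for the next lock or conversion request.
 * The caller would pass the returned pointer to the lock call.
 */
struct lksb_stub *slot_begin_request(struct lock_slot *s)
{
    assert(!s->pending);  /* assumption: one request in flight per lock */
    struct lksb_stub *next = &s->lksb[1 - s->active];
    next->sb_status = 0;
    next->sb_lkid = s->lksb[s->active].sb_lkid;  /* carry the lock id over */
    s->pending = 1;
    return next;
}

/* Call as the last step of the CAST (not BAST) callback: flip the bit. */
void slot_cast_done(struct lock_slot *s)
{
    assert(s->pending);
    s->active = 1 - s->active;
    s->pending = 0;
}

/* The block that a BAST may still overwrite until the next conversion. */
struct lksb_stub *slot_current(struct lock_slot *s)
{
    return &s->lksb[s->active];
}
```

With this layout neither block lives on the stack, so a late BAST writes
into memory that stays valid for as long as the lock_slot itself does.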