On Thu, Mar 31, 2005 at 09:37:51AM +0100, Patrick Caulfield wrote:
> On Thu, Mar 31, 2005 at 04:27:07PM +0800, David Teigland wrote:
> > Sure, the mechanism used to export the locking API to user space is pretty
> > inconsequential. We're doing reads/writes on a misc device at the moment
> > (used through libdlm of course.) Going through an fs might be better but
> > I'm not sure why.
>
> A long time ago, we did consider a filesystem interface to the DLM. We rejected
> it for a couple of reasons:
>
> 1) the mapping of locks to files is not a very clean one. Trying to squeeze
> things like LVBs and ranges into the API soon gets very messy. Returning status
> from asynchronous operations can make the coding rather complicated for
> applications with multiple locks (you would need a file descriptor open for
> each lock!). Also the hierarchy functions differently: a lock that has children
> is still a lock, not just a directory.

Well, it's actually quite clean in ocfs2_dlmfs; part of that is likely
related to some design calls we made early on to simplify our userspace
locking. We don't do ranges (anywhere really), and we consider all userspace
lock requests to be synchronous. This does however result in a userspace API
which is extremely lightweight and dirt simple to use.

mkdir gives you a new domain; files created within that directory correspond
to lock resources with the same name. Open O_RDONLY gets you a PR mode lock,
open O_RDWR gives you an EX mode lock. You can do NOQUEUE (trylock) ops with
O_NONBLOCK. Reads and writes to the file return and set the LVB accordingly.

One can literally create a domain, create locks within it, and ship data via
the LVB, all from a bash shell on my cluster nodes. I was able to write a
trivial library wrapper (for those who don't want to use shell for
controlling dlm functionality) in about 600 lines.
	--Mark

> 2) At the time it was still very complicated to add new filesystems to the Linux
> kernel. This has now changed of course.
>
> I'll have a look at the OCFS2 filesystem and see if we can learn anything from
> it though.
>
> --
>
> patrick
>
> --
>
> Linux-cluster@xxxxxxxxxx
> http://www.redhat.com/mailman/listinfo/linux-cluster

--
Mark Fasheh
Software Developer, Oracle
mark.fasheh@xxxxxxxxxx
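
For illustration only, the mapping described in the reply above (mkdir for a
domain, open mode for the lock level, O_NONBLOCK for trylock, read/write for
the LVB) could be wrapped roughly like this in Python. This is a sketch under
assumptions, not the wrapper library mentioned in the post: the function
names are hypothetical, the 64-byte default read size for the LVB is an
assumption, creating the lock file with O_CREAT at open time is an
assumption, and `root` stands in for wherever ocfs2_dlmfs happens to be
mounted.

```python
import os


def create_domain(root, domain):
    """mkdir under the dlmfs mount point creates a new lock domain."""
    path = os.path.join(root, domain)
    os.mkdir(path)
    return path


def lock_flags(exclusive, trylock=False):
    """Translate the requested lock level into open(2) flags.

    O_RDONLY -> PR mode lock, O_RDWR -> EX mode lock,
    O_NONBLOCK -> NOQUEUE (trylock) request.
    O_CREAT is an assumption so the lock file is created on first use.
    """
    flags = os.O_CREAT | (os.O_RDWR if exclusive else os.O_RDONLY)
    if trylock:
        flags |= os.O_NONBLOCK
    return flags


def acquire(domain_path, name, exclusive=False, trylock=False):
    """Open the file named after the lock resource; the open takes the lock."""
    path = os.path.join(domain_path, name)
    return os.open(path, lock_flags(exclusive, trylock), 0o600)


def write_lvb(fd, data):
    """Writing to the file sets the LVB (needs an EX lock, i.e. O_RDWR)."""
    os.write(fd, data)


def read_lvb(fd, size=64):
    """Reading from the file returns the LVB; 64 bytes is an assumed size."""
    return os.read(fd, size)


def release(fd):
    """Closing the descriptor drops the lock."""
    os.close(fd)


# Hypothetical usage, assuming dlmfs is mounted at /dlm:
#   dom = create_domain("/dlm", "mydomain")
#   fd = acquire(dom, "mylock", exclusive=True)
#   write_lvb(fd, b"state=42")
#   release(fd)
```

Only `os.mkdir`, `os.open`, `os.read`, `os.write` and `os.close` are used, so
the same calls can be tried against an ordinary directory to check the
plumbing before pointing them at a real dlmfs mount.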