> > > > > > At LSF this year, there was a discussion about the "wishlist" > > > > > > for userland file servers. One of the things brought up was > > > > > > the goofy and problematic behavior of POSIX locks when a file is > closed. > > > > > > Boaz started a thread on it here: > > > > > > > > > > > > http://permalink.gmane.org/gmane.linux.file-systems/73364 > > > > > > > > > > > > Userland fileservers often need to maintain more than one open > > > > > > file descriptor on a file. The POSIX spec says: > > > > > > > > > > > > "All locks associated with a file for a given process shall be > > > > > > removed when a file descriptor for that file is closed by that > > > > > > process or the process holding that file descriptor terminates." > > > > > > > > > > > > This is problematic since you can't close any file descriptor > > > > > > without dropping all your POSIX locks. Most userland file > > > > > > servers therefore end up opening the file with more access > > > > > > than is really necessary, and keeping fd's open for longer > > > > > > than is necessary to > > work > > > around this. > > > > > > > > > > > > This patchset is a first stab at an approach to address this > > > > > > problem by adding two new l_type values -- F_RDLCKP and > > > > > > F_WRLCKP (the 'P' is short for "private" -- I'm open to > > > > > > changing that if you have a better mnemonic). > > > > > > > > > > > > For all intents and purposes these lock types act just like > > > > > > their "non-P" counterpart. The difference is that they are > > > > > > only implicitly released when the fd against which they were > > > > > > acquired is closed. As a side effect, these locks cannot be > > > > > > merged with "non-P" locks since they have different semantics on > close. > > > > > > > > > > > > I've given this patchset some very basic smoke testing and it > > > > > > seems to do the right thing, but it is still pretty rough. If > > > > > > this looks reasonable I'll plan to do some documentation > > > > > > updates and will take a stab at trying to get these new lock > > > > > > types added to the POSIX spec (as HCH recommended). > > > > > > > > > > > > At this point, my main questions are: > > > > > > > > > > > > 1) does this look useful, particularly for fileserver implementors? > > > > > > > > > > > > 2) does this look OK API-wise? We could consider different "cmd" > > > values > > > > > > or even different syscalls, but I figured this makes it > > > > > > clearer > > that > > > > > > "P" and "non-P" locks will still conflict with one another. > > > > > > > > This is a good start. > > > > > > > > I'd prefer a model where the private locks are maintained even if > > > > all file descriptors are closed and released on garbage collection > > > > when the process terminates. The model presented would require a > > > > server to potentially have at least two file descriptors open (the > > > > descriptor originally used for the locks, and a descriptor used > > > > for current access mode needed for some I/O operation). The server > > > > will also need to "remember" to do all locks using the first file > descriptor. > > > > > > > > > > That's sort of a non-starter, I think at least in Linux. If you have > > > no > > open file > > > descriptor then you have nothing to hang the lock off of. > > > That sort of interface sounds error-prone and "leaky" too. A long > > > running process could easily end up leaking POSIX locks over time if > > > you forget to explicitly unlock them. > > > > There is a point there, however see below for discussion of file > > descriptor resources. > > > > > > Another thing that would be very useful for servers is to be able > > > > to specify an arbitrary lock owner. Currently, Ganesha has to > > > > manage a union of all locks held on a file and carefully pick it > > > > apart when a client does an unlock. Allowing a process specified > > > > owner would allow Ganesha (or other > > > > servers) to have separate locks for each client lock owner. > > > > > > > > > > The trivial answer there would be to give each lockowner its own > > > file descriptor, right? > > > > Hmm, that would be a solution (of course that would imply that private > > locks held by the same process but by different file descriptors would > > conflict appropriately). > > > > Good point. In the implementation I have so far, POSIX locks held by the > same process don't conflict, just like normal POSIX locks do. For these sorts > of locks, I think it would make sense to have more flock()-like behavior there, > such that locks held by the same process on different file descriptors will still > conflict. I'll plan to make that change on the next pass. Yea, and I think the "private" being part of the descriptor of these locks will make those semantics make sense. These locks are private to the particular file descriptor. > > There is a resource issue though of how many file descriptors we have > open. > > Is there any practical limit on the number of file descriptors a > > process has open? Can the kernel support 1000s of descriptors? How > > much resource does a file descriptor take? Looks like a struct file > > isn't tiny, not quite sure just how big it is. > > > > There is also some consideration of how this interacts with share > > reservations (where is that proposal going BTW?). But I don't think > > this really introduces anything new. We still have to guess the best > > access mode to open a file descriptor that will be used for locks no > > matter how we implement this. > > > > At least in the currently proposed patchset by Pavel, share reservations are > orthogonal to these since they're based on LOCK_MAND > flock() locks. Sure, and I don't think they'll cause an issue. Hmm, what about delegations? An out of tree file system I used to be associated with had issues with Ganesha's opening files read/write because of the POSIX behavior caused fits with delegations. > > So I guess my big concern is the resource impact of lots of file > > descriptors. > > > > That's understandable. I'm not clear on how big an overhead there is on > maintaining an open file descriptor. OTOH, people use flock() and such and it > has similar requirements. Let's mull on that some more. Does anyone have a quick answer to the amount of memory used by an open file descriptor, and the related question of how many open file descriptors it's reasonable for one process to have? > I guess my main concern is that while I'm interested in adding interfaces that > make it _easier_ to implement fileservers, I'm not terribly interested in > adding interfaces that are _specific_ to implementing them. > > Whatever interface we add needs to be generic enough to be useful to other > applications as well. The changes you're suggesting sound rather specific to a > particular use-case. Sure, understood, though the Samba folks may have some input here also (though at least they have a process per connection still right? So the trouble would be a client with lots of processes running on it, not lots of clients running one (or a few) process each. Frank -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html