Re: help understanding the current (and future) state of NFSv4 locking?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Nov 16, 2008 at 2:48 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> On Fri, Nov 14, 2008 at 03:34:56PM -0500, Mike Snitzer wrote:
>> Hello,
>>
>> I'd like to understand the state of Linux's NFSv4 server regarding the
>> NFSv4 spec's _optional_ ordered blocking lock list implementation.
>> Without something like the following patch isn't there still concern
>> for NFSv4 clients being starved from ever getting a conflicting lock
>> (local POSIX or lockd waiters would race to get it first)?
>
> Yes.  I have patches that approach the problem by:
>
>        - Defining a new type of lock type, the "provisional" lock,
>          which is just like a posix lock type, *except* that it
>          doesn't merge with other locks, and hence can still be cancelled
>          safely.
>        - Modifies the process of waking up waiters for a just-released
>          lock to make it a two-step process:
>                1. Apply a "provisional" lock, if there are no
>                   conflicts, and wake whoever was waiting for it.  (If
>                   there are still conflicts, put the lock on the new
>                   list without waking anyone.)
>                2. Allow the waiter to upgrade the provisional lock to a
>                   real posix lock (or, alternatively, to cancel it).
>        - Take advantage of the above to implement fair queuing for v4,
>          by stretching out the gap between steps 1 and 2 up to a lease
>          period, thus allowing a lock that is available but that a
>          client has yet polled for to be temporarily represented by a
>          provisional lock.
>
> The thought was that we'd also solve a couple other problems along the
> way, by:
>
>        - Preventing thundering herd problems on posix locks with lots
>          of waiters.
>        - Increasing fairness of posix locking (even among local
>          lockers).
>
> But we weren't able to actually show any improvement for posix locks
> with local waiters, and it's unclear whether anyone cares much about
> posix lock fairness.
>
> So it's unclear whether it's worth doing the 2-step process above for
> all posix lockers.  So maybe the patches should be written to instead
> implement provisional locks as an optional extra for use of the v4
> server.

Thanks for the overview.  I think that given how easy it is to starve
a v4 client (see below) something needs to give.

> A real-world test case (showing starvation of v4 clients) would be
> interesting if anyone had one.

I'm not sure what your definition of "real-world test case" is (so
maybe the following is moot) but the attached program (written by a
co-worker) can be used to easily illustrate the starvation of v4
clients.  The program tests to see how long it takes to lock/unlock a
file 1000 times.  If ran locally on the nfs-server against an exported
ext3 FS I get a "step time" of 11ms.  Ran from a v3 client: ~390ms.
Ran from a v4 client: ~430ms.

ran simultaneously on the nfs-server and the v3 client; local=~30ms, v3=~440ms
ran simultaneously on two v3 clients; both v3=~580ms
ran simultaneously on the nfs-server and the two v3 clients; both
v3=~580ms, but local ranges ~1500ms to ~9300ms
ran simultaneously on two v4 clients; v4=~430ms but with frequent
interleaved outliers ranging from ~1500ms to ~75000ms
ran simultaneously on the nfs-server and the v4 client; local=~11ms, v4=STARVED
ran simultaneously on the v3 and the v4 client; v3=~390ms, v4=STARVED

FYI, "STARVED" above doesn't mean the v4 client _never_ acquires the
lock.   It eventually acquires the lock; albeit extremely rarely (e.g.
after 5min) because v4 client polling is predisposed to lose the race
with either the hyper-active v3 client or the local locker.

Mike
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>
#include <string.h>
#include <errno.h>

#define STEP 1000
#define CMD F_SETLKW

int main(int argc, char **argv) {
	struct flock lc_rq = {F_WRLCK, SEEK_CUR, 0, 100, 0};
	int i, ret = 0;
	char str[10];
	unsigned long sleep_timeout = 0;
	struct timeval tv_start, tv_prev, tv_cur;
	int fd1 = open(argv[1], O_RDWR), fd2 = 0;
	off_t start = 0;

	gettimeofday(&tv_start, NULL);
	tv_cur = tv_prev = tv_start;

	for (i=0;;i++) {
		lc_rq.l_type = F_WRLCK;
		ret = fcntl(fd1, CMD, &lc_rq);

		lc_rq.l_type = F_UNLCK;
		ret = fcntl(fd1, CMD, &lc_rq);

		if (0 == i%STEP) {
			gettimeofday(&tv_cur, NULL);
			printf("i=%d tot time = %dms (step time= %dms)\n", i/STEP,
			       1000*(tv_cur.tv_sec - tv_start.tv_sec) + (tv_cur.tv_usec - tv_start.tv_usec)/1000,
			       1000*(tv_cur.tv_sec - tv_prev.tv_sec) + (tv_cur.tv_usec - tv_prev.tv_usec)/1000);
			tv_prev = tv_cur;
		}
	}
	return 0;
}


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux