Re: How to fix wrong telldir/seekdir usage

Anand Avati <avati@xxxxxxxxxxx> · Sat, 13 Sep 2014 13:34:58 -0700

How does the NetBSD nfs server provide stable directory offsets, for the NFS client to resume reading from at a later point in time? Very similar problems are present in that scenario and it might be helpful to see what approaches are taken there (which are probably more tried and tested)
Thanks

On Sat, Sep 13, 2014 at 12:02 PM, Emmanuel Dreyfus <manu@xxxxxxxxxx> wrote:
In <1lrx1si.n8tms1igmi5pM%manu@xxxxxxxxxx> I explained why NetBSD

currently fails self-heald.t, but since the subjet is burried deep in a

thread, it might be worth starting a new one to talk about how to fix.

In 3 places within glusterfs code (features/index,

features/snapview-server and storage/posix), a server component answers

readdir requests on a directory which may be split in mulitple calls.

To answer one call, we have the following library calls:

- opendir()

- seekdir() to resume where the previous request was

- readdir()

- telldir() to record where we are for the next request

- closedir()

This relies on unspecified behavior, as POSIX says: "The value of loc

should have been returned from an earlier call to telldir() using the

same directory stream."

http://pubs.opengroup.org/onlinepubs/9699919799/functions/seekdir.html

Since we do opendir() and closedir() at each time, we do not use the

same directory stream. It causes an infinite loop on NetBSD because it

badly resume from previous request, and in the general case it will

break badly if an entry is added in the directory between two requests.

How can we fix that?

1) we can keep the directory stream open. The change is intrusive since

we will need a chained list of open contexts, and we need to clean them

if they timeout.

2) in order to keep state between requests, we can use the entry index

(first encoutered is 1, and so on) instead of values returned by

telldir(). That works around the unspecified behavior, but it still

breaks if directory content is changed between two requests

3) make sure the readdir is done in a single request. That means trying

with bigger buffers until it works. For instance  in

xlator/cluster/afr/src/afr-self-heald.c we have:

   while ((ret = syncop_readdir (subvol, fd, 131072, offset, &entries)))

We would use -1 instead of 131072 to tell that we want everything

without a size limit, and the server component (here features/index)

would either return everyting or fail, whithout playing with

telldir/seekdir.

Opinions? The third solution seems the best to me since it is not very

intrusive and it makes things simplier. Indeed we allow unbound data

size to come back from the brick to glustershd, but we trust the brick,

right?

--

Emmanuel Dreyfus

http://hcpnet.free.fr/pubz

manu@xxxxxxxxxx

_______________________________________________

Gluster-devel mailing list

Gluster-devel@xxxxxxxxxxx

http://supercolony.gluster.org/mailman/listinfo/gluster-devel

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel