----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> To: "Keiviw" <keiviw@xxxxxxx>
> Cc: gluster-devel@xxxxxxxxxxx, "gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Wednesday, November 2, 2016 9:38:46 AM
> Subject: Re: A question of GlusterFS dentries!
>
> ----- Original Message -----
> > From: "Keiviw" <keiviw@xxxxxxx>
> > To: gluster-devel@xxxxxxxxxxx
> > Sent: Tuesday, November 1, 2016 12:41:02 PM
> > Subject: A question of GlusterFS dentries!
> >
> > Hi,
> > In GlusterFS distributed volumes, listing a non-empty directory was slow.
> > Then I read the DHT code and found the reason. But I was confused that
> > GlusterFS DHT traverses all the bricks (in the volume) sequentially. Why not
> > use multiple threads to read dentries from multiple bricks simultaneously?
> > That's a question that has always puzzled me. Could you please tell me
> > something about this?
>
> readdir across subvols is sequential mostly because we have to support
> rewinddir(3). We need to maintain the mapping of offset and dentry across
> multiple invocations of readdir. In other words, if someone did a rewinddir
> to an offset corresponding to an earlier dentry, subsequent readdirs should
> return the same set of dentries that the earlier invocation of readdir
> returned. For example, in a hypothetical scenario, readdir returned the
> following dentries:
>
> 1. a, off=10
> 2. b, off=2
> 3. c, off=5
> 4. d, off=15
> 5. e, off=17
> 6. f, off=13
>
> Now if we did rewinddir to off 5 and issued readdir again, we should get the
> following dentries:
> (c, off=5), (d, off=15), (e, off=17), (f, off=13)
>
> Within a subvol, the backend filesystem provides the rewinddir guarantee for
> the dentries present on that subvol. However, across subvols it is the
> responsibility of DHT to provide the above guarantee. This means we should
> have some well-defined order in which we send readdir calls (note that the
> order is not well defined if we do a parallel readdir across all subvols).
> So, DHT has sequential readdir, which gives a well-defined order of reading
> dentries.
>
> To give an example, if we have another subvol - subvol2 - (in addiction to the

s/addiction/addition/

> subvol above - say subvol1) with the following listing:
> 1. g, off=16
> 2. h, off=20
> 3. i, off=3
> 4. j, off=19
>
> With parallel readdir we can have many orderings, like (a, b, g, h, i, c, d,
> e, f, j), (g, h, a, b, c, i, j, d, e, f), etc. Now if we do (with readdir
> done in parallel):
>
> 1. A complete listing of the directory (which can be any one of 10P1 = 10

I think it is 10P10 = 3628800. But again, it is not a completely random
selection, as readdir on a single subvol still gives one ordering, so the
value is much less. The point here is that there can be many possible
listings with parallel readdir.

> ways - I hope math is correct here).
> 2. Do rewinddir (20)
>
> We cannot predict which set of dentries comes _after_ offset 20.
> However, if we do a readdir sequentially across subvols, there is only one
> directory listing, i.e., (a, b, c, d, e, f, g, h, i, j). So, it's easier to
> support rewinddir.
>
> If there were no POSIX requirement for rewinddir support, I think a parallel
> readdir could easily be implemented (which would improve performance too).
> But unfortunately rewinddir is still a POSIX requirement. This also opens up
> another possibility: a "no-rewinddir-support" option in DHT which, if
> enabled, results in parallel readdirs across subvols. What I am not sure of
> is how many users still use rewinddir. If there is a critical mass that wants
> performance at the cost of no rewinddir support, this can be a good
> feature.
>
> +gluster-users to get an opinion on this.
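To make the counting above concrete, here is a minimal Python sketch. It is
not GlusterFS code: the function names (sequential_listing, resume_from,
interleavings) are made up for illustration, and the two listings are just
the hypothetical subvol1/subvol2 examples from this thread. It assumes that a
parallel readdir can return any merge of the two listings that keeps each
subvol's own order, and it treats "rewinddir to an offset" as resuming the
listing at the entry carrying that offset (in POSIX terms that is closer to
seekdir(3)/telldir(3)).

```python
from itertools import combinations
from math import comb

# (name, offset) pairs in the order each subvol returns them (example data).
SUBVOL1 = [("a", 10), ("b", 2), ("c", 5), ("d", 15), ("e", 17), ("f", 13)]
SUBVOL2 = [("g", 16), ("h", 20), ("i", 3), ("j", 19)]


def sequential_listing(*subvols):
    """Drain each subvol completely before moving to the next one."""
    return [entry for subvol in subvols for entry in subvol]


def resume_from(listing, offset):
    """Return the entries starting at the one carrying `offset`."""
    for index, (_, off) in enumerate(listing):
        if off == offset:
            return listing[index:]
    raise ValueError(f"no entry with offset {offset}")


def interleavings(first, second):
    """Yield every merge of two listings that keeps each listing's own order."""
    total = len(first) + len(second)
    for slots in combinations(range(total), len(first)):
        chosen = set(slots)
        it_first, it_second = iter(first), iter(second)
        yield [next(it_first) if i in chosen else next(it_second)
               for i in range(total)]


sequential = sequential_listing(SUBVOL1, SUBVOL2)
parallel = list(interleavings(SUBVOL1, SUBVOL2))

# Distinct continuations a client could observe after resuming at offset 20.
tails = {tuple(name for name, _ in resume_from(listing, 20))
         for listing in parallel}

print(len(parallel), comb(10, 4))   # order-preserving merges of 6 + 4 entries
print([name for name, _ in resume_from(sequential, 20)])
print(len(tails))
```

Under these assumptions the number of possible parallel listings is
10! / (6! * 4!) = 210, far below 10P10 but still many, and resuming at
offset 20 can continue in many different ways, whereas the sequential listing
always continues with h, i, j.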
>
> regards,
> Raghavendra
>
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxxx
> > http://www.gluster.org/mailman/listinfo/gluster-devel