Re: A question of GlusterFS dentries!

----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> To: "Keiviw" <keiviw@xxxxxxx>
> Cc: gluster-devel@xxxxxxxxxxx, "gluster-users" <gluster-users@xxxxxxxxxxx>
> Sent: Wednesday, November 2, 2016 9:38:46 AM
> Subject: Re:  A question of GlusterFS dentries!
> 
> 
> 
> ----- Original Message -----
> > From: "Keiviw" <keiviw@xxxxxxx>
> > To: gluster-devel@xxxxxxxxxxx
> > Sent: Tuesday, November 1, 2016 12:41:02 PM
> > Subject:  A question of GlusterFS dentries!
> > 
> > Hi,
> > > In GlusterFS distributed volumes, listing a non-empty directory is slow.
> > > I read the DHT code and found the reason, but I am confused: GlusterFS DHT
> > > traverses all the bricks in the volume sequentially. Why not use multiple
> > > threads to read dentries from several bricks simultaneously? This
> > > question has always puzzled me. Could you please tell me something about
> > > this?
> 
> readdir across subvols is sequential mostly because we have to support
> rewinddir(3) and, for seeking back to a saved offset, seekdir(3). We need to
> maintain the mapping between offset and dentry across multiple invocations
> of readdir. In other words, if someone did a rewinddir to an offset
> corresponding to an earlier dentry, subsequent readdirs should return the
> same set of dentries that the earlier invocation of readdir returned.
> For example, in a hypothetical scenario, readdir returned the following
> dentries:
> 
> 1. a, off=10
> 2. b, off=2
> 3. c, off=5
> 4. d, off=15
> 5. e, off=17
> 6. f, off=13
> 
> Now if we rewind to off 5 and issue readdir again, we should get the
> following dentries:
> (c, off=5), (d, off=15), (e, off=17), (f, off=13)
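
The single-subvol guarantee described above can be modelled with a few lines of Python. This is only a sketch: the list `SUBVOL1` and the helper `readdir_from` are hypothetical names, and it follows the example's convention that an offset identifies the dentry itself. In POSIX terms, seeking to a saved offset is seekdir(3) with a value obtained from telldir(3), while rewinddir(3) resets the stream to the beginning.

```python
# Hypothetical model of one subvol's directory stream, in the order the
# backend filesystem returns it. Offsets are opaque and need not be sorted.
SUBVOL1 = [("a", 10), ("b", 2), ("c", 5), ("d", 15), ("e", 17), ("f", 13)]


def readdir_from(entries, off=None):
    """Return the dentries starting at the one whose offset is `off`.

    off=None means "from the beginning", which is what rewinddir(3) does.
    """
    if off is None:
        return list(entries)
    for i, (_, entry_off) in enumerate(entries):
        if entry_off == off:
            return list(entries[i:])
    raise ValueError(f"unknown offset {off}")


print(readdir_from(SUBVOL1, 5))
# [('c', 5), ('d', 15), ('e', 17), ('f', 13)]
```

Because the backend always walks the same stream, the same offset always yields the same remainder of the listing.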
> 
> Within a subvol, the backend filesystem provides the rewinddir guarantee for
> the dentries present on that subvol. Across subvols, however, it is the
> responsibility of DHT to provide that guarantee. This means we need a
> well-defined order in which we send readdir calls (note that the order is
> not well defined if we do a parallel readdir across all subvols).
> So DHT does a sequential readdir, which gives a well-defined order of
> reading dentries.
> 
> To give an example, suppose we have another subvol - subvol2 - (in addition
> to the subvol above - say subvol1) with the following listing:
> 1. g, off=16
> 2. h, off=20
> 3. i, off=3
> 4. j, off=19
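
With both listings in hand, sequential readdir across subvols can be sketched as below. This is a toy model, not GlusterFS code: the cursor `(subvol index, per-subvol offset)` and the function names are assumptions made for illustration, and the way DHT really encodes the subvol into an offset is not described in this thread.

```python
# Hypothetical per-subvol listings taken from the example in this thread.
SUBVOLS = [
    [("a", 10), ("b", 2), ("c", 5), ("d", 15), ("e", 17), ("f", 13)],  # subvol1
    [("g", 16), ("h", 20), ("i", 3), ("j", 19)],                        # subvol2
]


def _index_of(entries, off):
    """Position of the dentry with offset `off` inside one subvol."""
    for i, (_, entry_off) in enumerate(entries):
        if entry_off == off:
            return i
    raise ValueError(f"unknown offset {off}")


def sequential_readdir(subvols, cursor=None):
    """Yield (name, cursor) pairs, reading subvols one after another.

    `cursor` is (subvol index, offset within that subvol). Passing a cursor
    returned earlier resumes the listing at that dentry.
    """
    start_idx, start_off = cursor if cursor is not None else (0, None)
    for idx in range(start_idx, len(subvols)):
        entries = subvols[idx]
        begin = 0
        if idx == start_idx and start_off is not None:
            begin = _index_of(entries, start_off)
        for name, off in entries[begin:]:
            yield name, (idx, off)


full = [name for name, _ in sequential_readdir(SUBVOLS)]
resumed = [name for name, _ in sequential_readdir(SUBVOLS, cursor=(1, 20))]
print(full)     # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
print(resumed)  # ['h', 'i', 'j']
```

Since the subvols are always visited in the same order, a cursor always maps to the same remainder of the listing.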
> 
> With parallel readdir we can get many orderings, such as (a, b, g, h, i, c,
> d, e, f, j), (g, h, a, b, c, i, j, d, e, f) etc. Now if we do (with readdir
> done in parallel):
> 
> 1. A complete listing of the directory (which can be any one of 10P1 = 10

I think it is 10P10 = 3628800. But then again, the selection is not completely random: readdir on a single subvol still returns one fixed ordering, so the actual number is much smaller. The point here is that many listings are possible with parallel readdir.

> ways - I hope math is correct here).
> 2. Do rewinddir (20)
> 
> We cannot predict which set of dentries comes _after_ offset 20.
> However, if we do readdir sequentially across subvols, there is only one
> directory listing, i.e., (a, b, c, d, e, f, g, h, i, j). So it's easier to
> support rewinddir.
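
The number of possible parallel listings can be checked by brute force. The sketch below assumes two subvols, that each subvol keeps its own internal order, and that results may interleave at single-dentry granularity; under those assumptions the count is the number of interleavings of a 6-entry and a 4-entry sequence, C(10, 4), rather than 10!. The function name `interleavings` is illustrative.

```python
from math import comb, factorial


def interleavings(first, second):
    """All merges of two sequences that preserve each one's internal order."""
    if not first:
        return [list(second)]
    if not second:
        return [list(first)]
    return (
        [[first[0]] + rest for rest in interleavings(first[1:], second)]
        + [[second[0]] + rest for rest in interleavings(first, second[1:])]
    )


subvol1 = list("abcdef")
subvol2 = list("ghij")
merged = interleavings(subvol1, subvol2)

print(len(merged), comb(10, 4), factorial(10))
# 210 210 3628800

# Distinct "remainders" of the listing starting at dentry h (off=20).
suffixes = {tuple(m[m.index("h"):]) for m in merged}
print(len(suffixes) > 1)
# True
```

Sequential readdir corresponds to exactly one of these 210 listings, which is why the dentries following a given offset are predictable there and not in the parallel case.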
> 
> If there were no POSIX requirement for rewinddir support, I think a parallel
> readdir could easily be implemented (which would improve performance too).
> Unfortunately, rewinddir is still a POSIX requirement. This also opens up
> the possibility of a "no-rewinddir-support" option in DHT which, if
> enabled, results in parallel readdirs across subvols. What I am not sure of
> is how many users still use rewinddir. If there is a critical mass that
> wants performance at the cost of rewinddir support, this can be a good
> feature.
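
To make the trade-off concrete, here is a rough sketch of what a parallel readdir without rewinddir support might look like. The "no-rewinddir-support" option is only a proposal in this message, so nothing below is GlusterFS code: `read_subvol`, `parallel_readdir` and the in-memory `SUBVOLS` table are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical in-memory stand-in for the two subvols in the example.
SUBVOLS = {
    "subvol1": [("a", 10), ("b", 2), ("c", 5), ("d", 15), ("e", 17), ("f", 13)],
    "subvol2": [("g", 16), ("h", 20), ("i", 3), ("j", 19)],
}


def read_subvol(name):
    """Stand-in for a complete readdir on one subvol."""
    return [entry for entry, _ in SUBVOLS[name]]


def parallel_readdir(subvol_names):
    """Read all subvols concurrently and merge results as they arrive."""
    listing = []
    with ThreadPoolExecutor(max_workers=len(subvol_names)) as pool:
        futures = [pool.submit(read_subvol, name) for name in subvol_names]
        for future in as_completed(futures):
            # Arrival order depends on timing, so the merged order can
            # differ from run to run.
            listing.extend(future.result())
    return listing


names = parallel_readdir(["subvol1", "subvol2"])
print(sorted(names))
# ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
```

Every dentry is still returned, but only the sorted view is stable; the raw order is not, so an offset saved from one listing cannot be mapped to a well-defined remainder in the next.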
> 
> +gluster-users to get an opinion on this.
> 
> regards,
> Raghavendra
> 
> > 
> > 
> > 
> > 
> > 
> > 
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxxx
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> 
