GlusterFS 3.0.2 small file read performance benchmark

lists at wildgooses.com (Ed W) · Tue, 02 Mar 2010 21:25:44 +0000

Well, "oplocks" are an SMB term, but the basic concept of 
opportunistic locking is independent of the filesystem.  For example, 
it appears that the same idea now exists in the NFS v4 standard under 
the name "delegations" (I would assume some variation of oplocks also 
exists in GFS and OCFS, but I'm not familiar with them).

The basic concept would potentially provide a huge performance boost for 
glusterfs because it allows cache coherent writeback caching.

In fact let's cut to the chase - what we desire is cache coherent 
writeback caching, i.e. reads to one server can be served from local 
client cache, but if the file is changed elsewhere then our cache here 
is invalidated immediately.  Likewise we can write at will to a local 
copy of the file and allow it to get out of sync with the other 
servers, but as soon as some other server tries to read or write our 
file we must be notified and flush our cache (and request alternative 
locks or fall back to sync reads/writes).
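
To make the write side of that concrete, here is a minimal sketch in 
Python.  Everything in it is hypothetical: WritebackFile, on_break and 
the dict standing in for the servers are my own illustration, not 
anything that exists in glusterfs.  It assumes the oplock has already 
been granted and only shows the "buffer locally, flush when broken, 
then go synchronous" behaviour.

```python
class WritebackFile:
    """Hypothetical client-side view of one file (not a GlusterFS API)."""

    def __init__(self, server_store, name):
        self.server_store = server_store  # plain dict standing in for the servers
        self.name = name
        self.have_oplock = True           # assumption: the oplock was granted
        self.dirty = None                 # locally buffered, unflushed data

    def write(self, data):
        if self.have_oplock:
            self.dirty = data             # write-back: keep the data local for now
        else:
            self.server_store[self.name] = data  # no oplock: synchronous write

    def read(self):
        # With the oplock held, our buffered copy is authoritative.
        if self.have_oplock and self.dirty is not None:
            return self.dirty
        return self.server_store.get(self.name)

    def on_break(self):
        """Called when someone else touches the file: flush, then drop the oplock."""
        if self.dirty is not None:
            self.server_store[self.name] = self.dirty
            self.dirty = None
        self.have_oplock = False
```

While the oplock is held nothing reaches the store; after on_break() 
the buffered data is flushed and every later write goes straight 
through.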

How do we do this?  Well, NFS v3 and earlier, and I believe Glusterfs, 
implement only a "cache and hope" option, which caches data for a 
second or so and hopes the file doesn't change under us.  The improved 
algorithm is "opportunistic locking", where the client tells the 
server that it wants to work with some data locally and let it get out 
of sync with the server.  The server then tracks that reservation, and 
if some other client wants to access the data it pushes a lock break 
to the original client, informing it that it needs to fsync and run 
without the oplock.
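
The server side of that exchange could be sketched roughly as below 
(Python, illustrative only).  OplockServer, request_oplock and access 
are names I have made up for the example, and it assumes one exclusive 
oplock per path with a plain callback standing in for the network push.  
Real SMB oplocks and NFS v4 delegations have more states than this.

```python
class OplockServer:
    """Hypothetical server-side oplock table (not a GlusterFS or Samba API)."""

    def __init__(self):
        self.holders = {}  # path -> (client_id, break_callback)

    def request_oplock(self, path, client_id, break_callback):
        """Record the reservation; refuse if another client already holds it."""
        if path in self.holders:
            return False
        self.holders[path] = (client_id, break_callback)
        return True

    def access(self, path, client_id):
        """Call before serving any read/write on path for client_id."""
        holder = self.holders.get(path)
        if holder is not None and holder[0] != client_id:
            holder_id, break_callback = holder
            break_callback(path)    # push the lock break so the holder can flush
            del self.holders[path]  # the holder now runs without the oplock
```

The holder's own accesses never trigger a break; the first access by 
any other client does, exactly once, after which the path is treated 
as uncached.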

I believe that an oplock service could be implemented via a new 
translator which works in conjunction with the read and writeback 
caching. Effectively it would be a two way lock manager, but its job is 
somewhat simpler in that all it needs to do is vary the existing caches 
on a per file basis.  So for example if we read some attributes for some 
files then at present they are blindly cached for X ms and then dropped, 
but our oplock translator would instead allow the attributes to be 
cached indefinitely until we get a push notification from the server 
side that our cache must be invalidated.  Same also with writes - we can 
use writeback cache as long as no one else has tried to read or write to 
our file, but as soon as someone else touches it we need to fsync and 
run without cache.
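
For the attribute case, the difference between "cached for X ms" and 
"cached until told otherwise" might look something like this sketch 
(Python again, and again entirely hypothetical: AttrCache, grant and 
invalidate are invented names, and fetch stands in for whatever 
actually asks the server).

```python
import time


class AttrCache:
    """Hypothetical per-file attribute cache (not a GlusterFS translator)."""

    def __init__(self, fetch, ttl=1.0, clock=time.monotonic):
        self.fetch = fetch     # callable(path) -> attrs, i.e. a server round trip
        self.ttl = ttl         # the "X ms" timeout used when no oplock is held
        self.clock = clock
        self.entries = {}      # path -> (attrs, time stored)
        self.oplocked = set()  # paths we currently hold an oplock on

    def grant(self, path):
        """The server granted us an oplock on path."""
        self.oplocked.add(path)

    def invalidate(self, path):
        """Push notification from the server: drop the oplock and the entry."""
        self.oplocked.discard(path)
        self.entries.pop(path, None)

    def get_attrs(self, path):
        entry = self.entries.get(path)
        if entry is not None:
            attrs, stored_at = entry
            # Oplock held: trust the cache indefinitely.
            # No oplock: "cache and hope" until the timeout expires.
            if path in self.oplocked or self.clock() - stored_at < self.ttl:
                return attrs
        attrs = self.fetch(path)
        self.entries[path] = (attrs, self.clock())
        return attrs
```

The only change from the existing behaviour is the extra "path in 
self.oplocked" test, which is why I think of it as varying the 
existing caches per file rather than building a new one.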

I have had a very quick glance at the current locks module and it's 
quite a bit more complex than I might have guessed...  I had wondered if 
it might not be possible to make the locks module talk to the cache 
module and add server side lock breaking through that module?  
Essentially it's the addition of the "push" lock breaking which helps, 
so if we are reading away and some other client modifies a file then we 
need a feedback loop to invalidate our read cache.

Perhaps this is all implemented in glusterfs already though and I'm just 
missing the point...

Cheers

Ed W

On 02/03/2010 18:52, Tejas N. Bhise wrote:
> Ed,
>
> oplocks are implemented by SAMBA and it would not be a part of GlusterFS per se till we implement a native SAMBA translator ( something that would replace the SAMBA server itself with a thin SAMBA kind of a layer on top of GlusterFS itself ). We are doing that for NFS by building an NFS translator.
>
> At some point, it would be interesting to explore, clustered SAMBA using ctdb, where two GlusterFS clients can export the same volume. ctdb itself seems to be coming up well now.
>
> Regards,
> Tejas.
>
> ----- Original Message -----
> From: "Ed W"<lists at wildgooses.com>
> To: "Gluster Users"<gluster-users at gluster.org>
> Sent: Wednesday, March 3, 2010 12:10:47 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
> Subject: Re: GlusterFS 3.0.2 small file read performance benchmark
>
> On 01/03/2010 20:44, Ed W wrote:
>    
>> I believe samba (and probably others) use a two way lock escalation
>> facility to mitigate a similar problem.  So you can "read-lock" or
>> phrased differently, "express your interest in caching some
>> files/metadata" and then if someone changes what you are watching the
>> lock break is pushed to you to invalidate your cache.
>>      
> Seems NFS v4 implements something similar via "delegations" (not
> believed implemented in linux NFSv4 though...)
>
> In samba the equivalent are called "op locks"
>
> I guess this would be a great project for someone interested to work on
> - op-lock translator for gluster
>
> Ed W