Re: regarding GF_CONTENT_KEY and dht2 - perf with small files

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Wed, 03 Feb 2016 15:42:17 +0530

The file data would be located based on its GFID, so before the *first*
lookup/stat for a file, there is no way to know its GFID.
NOTE: Instead of a name hash the GFID hash is used, to get immunity
against renames and the like, as a name hash could change the location
information for the file (among other reasons).
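
To make the rename point concrete, here is a minimal sketch of the two
placement choices. It is only an illustration under assumptions: the
server names, the SHA-1 based bucket function, and the helper names are
hypothetical and are not the hash or layout that dht2 actually uses.

```python
import hashlib
import uuid

# Hypothetical data servers; real layouts are more involved.
DATA_SERVERS = ["ds0", "ds1", "ds2", "ds3"]


def _bucket(key: bytes, n: int) -> int:
    """Map a key to one of n buckets with a stable hash (assumed, not dht2's)."""
    return int.from_bytes(hashlib.sha1(key).digest()[:8], "big") % n


def locate_by_gfid(gfid: uuid.UUID) -> str:
    """Placement keyed on the GFID: unaffected by what the file is called."""
    return DATA_SERVERS[_bucket(gfid.bytes, len(DATA_SERVERS))]


def locate_by_name(path: str) -> str:
    """Placement keyed on the name: a rename may map to a different server."""
    return DATA_SERVERS[_bucket(path.encode(), len(DATA_SERVERS))]


if __name__ == "__main__":
    gfid = uuid.uuid4()  # assigned once at create time, survives renames
    for name in ("/dir/a.txt", "/dir/renamed.txt"):
        print(name, "by-gfid:", locate_by_gfid(gfid),
              "by-name:", locate_by_name(name))
```

The flip side is visible in the signatures: locate_by_gfid() cannot be
called until something (a lookup or a readdirp) has returned the GFID.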

Another way of achieving the same when the GFID of the file is known
(from a readdir) is to wind the lookup and read of size to the
respective MDS and DS at the same time. The lookup is answered as soon
as the MDS responds, and the DS response is cached for the subsequent
open+read case. So on the wire we would have a fan-out of 2 FOPs, but
still satisfy the quick-read requirements.
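
A rough sketch of that fan-out, assuming the GFID is already known. The
mds_lookup() and ds_read() stubs, the Client class, and the thread pool
are all hypothetical stand-ins for the real FOP winding, not Gluster
code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def mds_lookup(gfid):
    """Hypothetical stub for the lookup sent to the metadata server."""
    return {"gfid": gfid, "size": 5}


def ds_read(gfid, size):
    """Hypothetical stub for the read sent to the data server."""
    time.sleep(0.01)  # pretend the DS answers a little later than the MDS
    return b"hello"[:size]


class Client:
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._pending_reads = {}  # gfid -> Future holding the DS response

    def lookup(self, gfid, read_size):
        # Fan out both FOPs at once.
        meta = self._pool.submit(mds_lookup, gfid)
        self._pending_reads[gfid] = self._pool.submit(ds_read, gfid, read_size)
        # The lookup is answered as soon as the MDS responds.
        return meta.result()

    def open_and_read(self, gfid, read_size):
        # Serve from the cached DS response if the lookup prefetched it.
        fut = self._pending_reads.pop(gfid, None)
        if fut is None:
            return ds_read(gfid, read_size)
        return fut.result()


if __name__ == "__main__":
    c = Client()
    print(c.lookup("gfid-1", 5))
    print(c.open_and_read("gfid-1", 5))
```

Note that the sketch takes the GFID as an input to lookup(), which is
exactly what is missing when only the file name is known.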

A tar-like workload doesn't have this problem, because we know the GFID
after readdirp.

I would assume the above resolves the problem posted. Are there cases
where we do not know the GFID of the file, i.e., no readdir was
performed and the client knows the name of the file it wants to operate
on? Do we have traces of the webserver workload to see whether it
generates names on the fly or does a readdir first?

The problem is with workloads that know which files need to be read
without a readdir, like hyperlinks (webserver), swift objects, etc.
These are the two I know of that will have this problem, which can't be
improved because metadata and data are not co-located. I have been
trying to think of a solution for the past few days. Nothing good is
coming up :-/

Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel