The file data would be located based on its GFID, so before the *first*
lookup/stat for a file, there is no way to know its GFID.
NOTE: The GFID hash is used instead of a name hash to get immunity
against renames and the like, since a name hash would change the
location information for the file on rename (among other reasons).
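
To illustrate the note above, here is a minimal sketch of why hashing
the GFID keeps placement stable across a rename. Everything in it is an
assumption made for illustration: the helper names (pick_server,
place_by_name, place_by_gfid), the use of SHA-256, and the plain modulo
over a server count are hypothetical and are not how the Gluster layout
code actually computes placement.

```python
import hashlib
import uuid


def pick_server(key: bytes, server_count: int) -> int:
    """Map an arbitrary key to a server index (hypothetical scheme)."""
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % server_count


def place_by_name(name: str, server_count: int) -> int:
    """Name-hash placement: the result depends on the current file name."""
    return pick_server(name.encode("utf-8"), server_count)


def place_by_gfid(gfid: uuid.UUID, server_count: int) -> int:
    """GFID-hash placement: the result depends only on the file identity."""
    return pick_server(gfid.bytes, server_count)


if __name__ == "__main__":
    servers = 8
    gfid = uuid.uuid4()  # assigned once, kept for the life of the file

    # The name changes on rename; the GFID does not.
    for name in ("report.txt", "report-final.txt"):
        print(name,
              "name-hash ->", place_by_name(name, servers),
              "gfid-hash ->", place_by_gfid(gfid, servers))
```

The GFID-based index is identical for both names, while the name-based
index is free to move. The flip side is the one raised in this thread:
the GFID has to be known before the data can be located.
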
Another way of achieving the same when the GFID of the file is known
(from a readdir) is to wind the lookup to the MDS and a read of the
given size to the DS at the same time. The lookup is answered as soon
as the MDS responds, and the DS response is cached for the subsequent
open+read. So on the wire we would have a fan-out of 2 FOPs, but would
still satisfy the quick read requirements.
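
A rough sketch of that fan-out, under stated assumptions: the class
QuickReadClient and the callables mds_lookup and ds_read are
hypothetical stand-ins for the real FOPs, and a thread pool is used
here only to model two requests being in flight at once. It is not
meant to reflect the actual translator code.

```python
from concurrent.futures import ThreadPoolExecutor


class QuickReadClient:
    """Hypothetical client that fans a lookup out to the MDS and the DS."""

    def __init__(self, mds_lookup, ds_read, read_size=65536):
        self._mds_lookup = mds_lookup  # callable(gfid) -> metadata
        self._ds_read = ds_read        # callable(gfid, offset, size) -> bytes
        self._read_size = read_size
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._pending = {}             # gfid -> Future holding the DS read

    def lookup(self, gfid):
        # Fan out two FOPs: the lookup to the MDS and a read to the DS.
        meta = self._pool.submit(self._mds_lookup, gfid)
        self._pending[gfid] = self._pool.submit(
            self._ds_read, gfid, 0, self._read_size)
        # Answer the lookup as soon as the MDS responds; the DS read may
        # still be in flight at this point.
        return meta.result()

    def open_and_read(self, gfid):
        # Serve the first read from the DS response cached by lookup().
        cached = self._pending.pop(gfid, None)
        if cached is None:
            # No earlier lookup for this GFID: go to the DS directly.
            return self._ds_read(gfid, 0, self._read_size)
        return cached.result()
```

With this shape, a lookup followed by open+read for the same GFID costs
one MDS request and one DS request in total, and the lookup latency is
bounded by the MDS alone. It still only works when the GFID is already
known to the client.
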
A tar kind of workload doesn't have a problem because we know the GFID
after readdirp.
I would assume the above resolves the problem posted. Are there cases
where we do not know the GFID of the file, i.e. no readdir is performed
and the client already knows the name of the file it wants to operate
on? Do we have traces of the webserver workload to see if it generates
names on the fly or does a readdir prior to that?
The problem is with workloads that know which files need to be read
without a readdir, like hyperlinks (webserver), Swift objects etc.
These are the two I know of that will have this problem, which can't be
improved because we don't have metadata and data co-located. I have
been trying to think of a solution for the past few days. Nothing good
is coming up :-/
Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel