Re: Feature support: development of metadata service xlator plugin in glusterfs.

I would like to present this problem in a different context than solving it with a metadata server, and also describe some of the ongoing efforts and wish-list items that address problems of this nature (hence the top post).

Comments welcome! (there is a lot of "hand waving" below BTW :) )

The problem you describe is really about the fan-out of calls rather than the calls themselves. That is, even if we still do lock(N+M) -> record file length(N+M) -> write -> unlock(N+M), putting a metadata server in place would only remove the fan-out portion of the same sequence, i.e. lock -> record file length -> write -> unlock.

(I am not an expert on EC, so I am assuming the sequence above is right; lock -> unlock at least appears in such a form in both EC and AFR, so the discussion can proceed with that in mind.)
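
To make the cost concrete, here is a small stand-alone C sketch (not GlusterFS code; the 4+2 layout and the per-brick latencies are invented) modelling the four-phase transaction above for an N+M set, where every phase has to wait for the slowest brick before the next one can start:

/* Minimal sketch, not GlusterFS code: the 4+2 layout and the per-brick
 * round-trip times below are made up.  It models the fan-out transaction
 * described above; each phase finishes only when the slowest brick of the
 * N+M set has replied, so the phase latency is the maximum reply time. */
#include <stdio.h>

#define BRICKS 6   /* e.g. an EC 4+2 set */
#define PHASES 4   /* lock, record file length (setattr), write, unlock */

static double max_of(const double *v, int n)
{
    double m = v[0];
    for (int i = 1; i < n; i++)
        if (v[i] > m)
            m = v[i];
    return m;
}

int main(void)
{
    /* hypothetical per-brick round-trip times in milliseconds */
    double rtt[PHASES][BRICKS] = {
        { 1.0, 1.2, 0.9, 1.1, 4.0, 1.0 },   /* lock: one brick is slow */
        { 0.8, 0.9, 0.8, 0.9, 3.5, 0.8 },   /* setattr */
        { 2.0, 2.1, 1.9, 2.2, 6.0, 2.0 },   /* write */
        { 0.7, 0.8, 0.7, 0.8, 3.0, 0.7 },   /* unlock */
    };
    const char *name[PHASES] = { "lock", "setattr", "write", "unlock" };

    double total = 0;
    int fops = 0;
    for (int p = 0; p < PHASES; p++) {
        double t = max_of(rtt[p], BRICKS);   /* wait for the slowest reply */
        printf("%-8s %d wire FOPs, phase latency %.1f ms\n", name[p], BRICKS, t);
        total += t;
        fops  += BRICKS;
    }
    printf("total:   %d wire FOPs, %.1f ms for one application write\n",
           fops, total);
    return 0;
}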

A) Fan-out slows every response down to the slowest reply, but if we could remove some steps from the process (say, the entire lock -> unlock) we would be better placed to speed up the stack. One possibility here is to use the delegation support that has been added to the Gluster stack for NFS-Ganesha.

With auto-delegations piggy-backed on a file open/creat/lookup in the Gluster client stack, locks are local to the client and hence do not involve network round trips. Some parts of this are covered in [1] and some are discussed in [2].
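
A rough sketch of that idea, with entirely hypothetical names (this is not the actual delegation code), just to show where the network round trip disappears:

/* Hand-waving sketch only; the structures and functions are invented for
 * illustration and are not the real GlusterFS delegation implementation.
 * If the client was granted a delegation when it opened/created/looked up
 * the file, lock/unlock can be satisfied locally; otherwise we pay the
 * usual network round trip(s). */
#include <stdbool.h>
#include <stdio.h>

struct inode_ctx {
    bool has_delegation;    /* granted piggy-backed on open/creat/lookup */
    int  local_lock_count;  /* client-side lock state */
};

static int lock_over_network(struct inode_ctx *ic)
{
    (void)ic;
    printf("network lock FOP(s) to the bricks\n");  /* stand-in for the fan-out */
    return 0;
}

static int take_lock(struct inode_ctx *ic)
{
    if (ic->has_delegation) {
        ic->local_lock_count++;                 /* purely local, no wire traffic */
        printf("local lock, no network FOP\n");
        return 0;
    }
    return lock_over_network(ic);
}

int main(void)
{
    struct inode_ctx with_delegation = { .has_delegation = true  };
    struct inode_ctx without         = { .has_delegation = false };

    take_lock(&with_delegation);   /* local lock, no network FOP */
    take_lock(&without);           /* network lock FOP(s) to the bricks */
    return 0;
}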

B) For the fan-out itself, NSR (see [3]) proposes server-side leader election, which could be the metadata-server equivalent, but nevertheless distributed per EC/AFR set, thereby removing any single-MDS limitation and distributing the MDS load as well.

In this scheme the leader takes locks locally, rather than the client having to send lock requests over the wire, which again reduces the number of FOPs. The leader can also record the file length etc., and failed transactions can be handled better, possibly cutting further network FOPs/calls.
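
Another hand-waving sketch, not NSR code, of what the client-visible path could look like when the leader takes the locks and does the fan-out itself:

/* Illustrative only, no real NSR code or APIs: the client sends a single
 * write FOP to the elected leader of the replica/EC set; the leader takes
 * its locks locally and fans the write out to the followers itself. */
#include <stdio.h>

#define FOLLOWERS 2   /* e.g. the other two bricks of a 3-way set */

static void leader_local_lock(void)   { printf("leader: local lock\n");   }
static void leader_local_unlock(void) { printf("leader: local unlock\n"); }

static void replicate_to_followers(const char *data)
{
    for (int i = 0; i < FOLLOWERS; i++)
        printf("leader -> follower %d: write '%s'\n", i, data);
}

/* What the client sees: one network FOP, to the leader. */
static void client_write(const char *data)
{
    printf("client -> leader: write '%s' (1 network FOP from the client)\n", data);
    leader_local_lock();            /* no client-driven lock fan-out */
    replicate_to_followers(data);   /* fan-out happens server side only */
    leader_local_unlock();
}

int main(void)
{
    client_write("hello");
    return 0;
}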

If at all possible, with A+B we should be able to reach a 1:1 call count between a FOP issued by the client and a network FOP to a brick (and on occasion a fan-out of 1:k). That would mean, for the most part, equivalence with any existing network file system that is not distributed (e.g. NFS).
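
Back-of-the-envelope numbers for that claim, reusing the 4+2 set and the four-phase transaction from earlier in this mail (illustrative, not measured):

/* Rough FOP counting only; the 4+2 layout is an assumption and the
 * figures are not measurements. */
#include <stdio.h>

int main(void)
{
    int n_plus_m = 6;  /* a 4+2 EC set */

    int today    = 4 * n_plus_m;  /* lock, setattr, write, unlock, fanned out */
    int with_a   = 2 * n_plus_m;  /* delegations: lock/unlock become local    */
    int with_a_b = 1;             /* leader: the client sends one FOP, the
                                     remaining fan-out is server side (1:k)   */

    printf("wire FOPs per client write: today=%d, with A=%d, with A+B=%d\n",
           today, with_a, with_a_b);
    return 0;
}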

C) For the DHT-related issues with the fan-out of directory operations, work to address them is being discussed as DHT2 in [4].

The central theme of DHT2 is to keep a directory on a single subvolume, which eliminates the fan-out and also brings better consistency to the various FOPs that DHT performs.
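
A toy illustration of that theme (the hash and subvolume count below are invented for the example and are not the DHT2 design): place the whole directory on one subvolume chosen from its GFID, instead of winding mkdir to every subvolume:

/* Toy example only; FNV-1a stands in for whatever placement DHT2 ends up
 * using, and the subvolume count is arbitrary. */
#include <stdio.h>
#include <stdint.h>

#define SUBVOLS 8

static uint32_t hash_gfid(const char *gfid)
{
    uint32_t h = 2166136261u;                     /* FNV-1a 32-bit */
    for (const unsigned char *p = (const unsigned char *)gfid; *p; p++) {
        h ^= *p;
        h *= 16777619u;
    }
    return h;
}

int main(void)
{
    const char *gfid = "9a1b2c3d-0000-4e5f-8a9b-000000000001";

    /* DHT today: mkdir is wound to every subvolume */
    printf("classic DHT mkdir: %d subvolume FOPs\n", SUBVOLS);

    /* DHT2: the directory lives on exactly one subvolume */
    printf("DHT2-style mkdir: 1 FOP, to subvolume %u\n",
           (unsigned)(hash_gfid(gfid) % SUBVOLS));
    return 0;
}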

Overall, with these approaches, we (as in Gluster) would/should aim first for better consistency, and next for better performance through improved network utilization and fewer round trips.

Footnote: none of this is ready yet and will take time; it is just a *possible* direction that Gluster core is heading in to address various problems at scale.

Shyam

[1] http://www.gluster.org/community/documentation/index.php/Features/caching
[2] http://www.gluster.org/pipermail/gluster-devel/2014-February/039900.html
[3] http://www.gluster.org/community/documentation/index.php/Features/new-style-replication
[4] http://www.gluster.org/community/documentation/index.php/Features/dht-scalability

On 06/21/2015 08:33 AM, 张兵 wrote:
Thank you for your reply.
In GlusterFS, some metadata is recorded in the file's extended
attributes on all bricks. For example, in an EC volume in N+M mode, a
stat on a file requires N+M commands; a file write requires N+M locks,
recording the file length takes another N+M setattr commands, and
finally N+M unlock commands.
If there were a metadata server, all metadata-related operations would
need only one command, to the metadata server.
As in the old topic, mkdir requires that mkdir be executed on all the
DHT children.
Another difficult problem is the lack of centralized metadata: disk
recovery performance cannot be improved massively. For example, in an
EC N+M volume, when a disk is reconstructed only the N+M bricks
participate in the reconstruction, and rebuilding 1 TB takes several
hours. With a metadata server the data could be dispersed across all
disks, so when a disk fails many disks could take part in the
reconstruction.
How can these difficulties be solved?

At 2015-06-21 05:31:58, "Vijay Bellur" <vbellur@xxxxxxxxxx> wrote:
On Friday 19 June 2015 10:43 PM, 张兵 wrote:
Hi all
     While using GlusterFS, we found that there are a lot of file system
commands, such as stat, lookup and setfattr, which greatly affect
system performance, especially with EC volumes. Could the GlusterFS
code architecture be used to add a metadata server xlator and achieve
an architecture similar to GFS? That way, with the same set of
software, users could choose whether or not to use a metadata server.

How do you expect the metadata server to aid performance here? There
would be network trips to the metadata servers to set/fetch necessary
information. If the intention is to avoid the penalty of having to fetch
information from disk, we have been investigating the possibility of
loading md-cache as part of the brick process graph to avoid hitting the
disk for repetitive fetch of attributes & extended attributes. I expect
that to be mainlined soon.

If you have other ideas on how a metadata server can improve
performance, that would be interesting to know.

Regards,
Vijay

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel