Re: Feature requests of glusterfs

"Krishna Srinivas" <krishna@xxxxxxxxxxxxx> · Fri, 4 Jan 2008 00:46:40 +0530

--
1. add a `local-volume-name' option to AFR, to support reading from
  the local volume (if available).
--

This feature is already available (in the latest TLA repository) as "option read-subvolume".
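As a rough illustration of the read preference described above, the sketch below prefers the configured read subvolume while it is reachable and otherwise falls back to another live child. It is a toy model: `pick_read_child` and its arguments are hypothetical, and the fallback rule (first live child in declaration order) is an assumption, not a description of AFR's actual code.

```python
def pick_read_child(children, up, read_subvolume=None):
    """Return the child that should serve reads (hypothetical model).

    children: child names in volfile declaration order
    up: set of children that are currently reachable
    read_subvolume: the preferred (e.g. local) child, if one is configured
    """
    # Prefer the configured subvolume whenever it is reachable.
    if read_subvolume is not None and read_subvolume in up:
        return read_subvolume
    # Assumed fallback: the first reachable child in declaration order.
    for child in children:
        if child in up:
            return child
    raise ConnectionError("no child of AFR is reachable")
```

For example, `pick_read_child(["local", "remote"], {"remote"}, "local")` returns `"remote"` because the preferred child is down.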

--
2. let the nufa scheduler support more than one `local-volume'. Sometimes
  more than one child of unify is local. In this case it is valuable to
  set more than one local-volume and use them randomly, in turn, or use
  the second only after the first is full.
--

Correct. Can you raise a feature/bug request? (Its priority will be
low for now, though.)
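The three policies asked for in point 2 could look something like the following. This is a hypothetical sketch, not nufa code: `MultiLocalPicker`, the policy strings and the `free_bytes` map are invented for illustration, and treating "full" as "zero free bytes" is a simplifying assumption.

```python
import itertools
import random


class MultiLocalPicker:
    """Choose among several local children of unify (hypothetical sketch).

    policy: "random", "round-robin" or "fill-first".
    """

    POLICIES = ("random", "round-robin", "fill-first")

    def __init__(self, local_volumes, policy="round-robin", rng=None):
        if not local_volumes:
            raise ValueError("need at least one local volume")
        if policy not in self.POLICIES:
            raise ValueError("unknown policy: %s" % policy)
        self.local_volumes = list(local_volumes)
        self.policy = policy
        self.rng = rng if rng is not None else random.Random()
        # Endless sequence of indices 0, 1, ..., n-1, 0, 1, ... for "in turn".
        self._turn = itertools.cycle(range(len(self.local_volumes)))

    def pick(self, free_bytes):
        """free_bytes maps each local volume to its free space.

        Returns a local volume, or None when every local volume is full.
        """
        candidates = [v for v in self.local_volumes if free_bytes.get(v, 0) > 0]
        if not candidates:
            return None
        if self.policy == "random":
            return self.rng.choice(candidates)
        if self.policy == "fill-first":
            # Declaration order: the second is used only once the first is full.
            return candidates[0]
        # Round-robin: advance the turn, skipping volumes that are full.
        # Terminates because the cycle visits every volume within n steps.
        while True:
            v = self.local_volumes[next(self._turn)]
            if v in candidates:
                return v
```

Returning `None` when all local volumes are full leaves the fallback to a non-local child up to the caller.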

--
3. reduce the effect of network latency in afr. Currently afr writes
  the data to the children serially, so the speed is heavily affected
  by network latency. How about adding a new kind of afr that combines
  afr and io-threads? In the new xlator, each child runs in a separate
  thread, so the sends to all children happen at the same time, and the
  speed is affected by the network latency only once (instead of once
  per child).
--

The write() call is already handled asynchronously: when afr writes to a
child that is protocol/client, we don't wait for that write to complete
before calling write on the next child. So this is as good as what you
are proposing (afr + io-threads for the subvolumes), right? Or am I missing something?
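A back-of-the-envelope model shows why asynchronous sends already give most of the benefit asked for in point 3. With serial writes the caller pays every child's round trip in turn; with asynchronous sends it waits only for the slowest reply, plus the small cost of issuing each request. The function names and the `send_cost` parameter (local time to hand one request to the transport) are hypothetical, and the model ignores bandwidth limits and write-behind.

```python
def serial_completion(rtts, send_cost=0.0):
    """Wait for each child's reply before sending to the next child."""
    t = 0.0
    for rtt in rtts:
        t += send_cost + rtt
    return t


def async_completion(rtts, send_cost=0.0):
    """Issue all sends back to back without waiting for replies.

    Child i's request is handed off at (i + 1) * send_cost, and its reply
    arrives one round trip later; the caller waits for the latest reply.
    """
    return max((i + 1) * send_cost + rtt for i, rtt in enumerate(rtts))
```

For example, three children with a 2 ms round trip each take 6 ms serially but about 2 ms asynchronously when `send_cost` is negligible.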

About your 4th point, from what I understand, this is what you want:

The client AFR knows about 3 machines: M1, M2 and M3.
The client AFR writes to M1. AFR on M1 writes to its local disk and also
writes to M2. AFR on M2 then writes to its local disk and also writes to M3.
This will be a complex setup. Also, to get 100 Mbps you will need
client-M1, M1-M2 and M2-M3 on different networks. But if you are
ready to have this kind of network, we can achieve 100 Mbps with the
combination of xlators that already exists now, i.e. server-side AFRs on
M1, M2 and M3. The client connects to M1 (M1 has write-behind (WB) above
AFR), so writes from the client can use the full bandwidth. If M1 fails,
the client connects to M2 through DNS round robin (some users on this list
already use this kind of setup), so AFR on M2 then writes to M2's local
disk and to M3.
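The two layouts can be compared with a toy model. `Node`, `chain_write`, `chain_write_reconfigured` and `server_afr_write` are hypothetical names; picking the first reachable server is a simplification of DNS round robin fail-over, and the model tracks only which machines receive a write, not bandwidth.

```python
class Node:
    """A storage server in the toy model (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.disk = []


def chain_write(chain, data):
    """Proposed chain: each node stores locally, then forwards to the next.

    Propagation stops at the first failed node, so every node after it
    misses the write.
    """
    stored = []
    for node in chain:
        if not node.up:
            break
        node.disk.append(data)
        stored.append(node.name)
    return stored


def chain_write_reconfigured(chain, data):
    """Chain with automatic reconfiguration: failed members are dropped and
    the write is forwarded to the next live node instead."""
    return chain_write([n for n in chain if n.up], data)


def server_afr_write(servers, data):
    """Existing alternative: server-side AFR on every machine.

    The client writes to the first reachable server (a simplification of
    DNS round robin fail-over); that server's AFR stores the data locally
    and replicates it to the other live servers.
    """
    live = [s for s in servers if s.up]
    if not live:
        raise ConnectionError("no server reachable")
    for node in live:  # live[0] is the server the client connected to
        node.disk.append(data)
    return [node.name for node in live]
```

With M2 down, the plain chain never delivers the write to M3, which is the concern raised in the quoted message below; the reconfigured chain and the server-side AFR layout both still reach M3.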

> > Furthermore, there is a comment near the end of the definition of
> > afr_sync_ownership_permission saying that afr on afr won't
> > work. This function is triggered by afr_lookup_cbk when self_heal is
> > needed, and self_heal is very important for afr.
> >
> > Can anyone help clarify whether afr on afr has a problem?
>
> Yes, thinking about it now, I can see at least one reason why it probably
> wouldn't work (afr extended attributes clash).  The devs expressed
> interest in chaining AFR before, so maybe it will become a reality in
> the future.

No, actually the clash of extended attributes does not cause problems.
It is just not implemented yet (it only needs code changes, that's it).
AFR over AFR used to work until directory self-heal was implemented,
so it will definitely be made to work in the near future.
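To make "AFR over AFR" concrete, the sketch below composes replicate nodes recursively, so a child of one replicate can itself be a replicate. It only models write fan-out; it deliberately leaves out extended attributes and directory self-heal, which is the part that needs the code changes mentioned above. `Brick` and `Replicate` are hypothetical classes, not GlusterFS code.

```python
class Brick:
    """Leaf storage (stands in for a local disk or a remote server)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, path, value):
        self.data[path] = value
        return [self.name]


class Replicate:
    """Toy AFR: forwards every write to all of its children.

    A child may itself be a Replicate, which is what AFR over AFR means.
    Returns the names of the bricks that stored the write.
    """

    def __init__(self, *children):
        self.children = children

    def write(self, path, value):
        written = []
        for child in self.children:
            written.extend(child.write(path, value))
        return written
```

Writing through `Replicate(Replicate(a, b), c)` stores the value on all three bricks.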

Please get back to me if you have any doubts or corrections.

Regards
Krishna

On Jan 3, 2008 11:41 PM, Kevan Benson <kbenson@xxxxxxxxxxxxxxx> wrote:
> LI Daobing wrote:
> > On Jan 3, 2008 3:09 AM, Kevan Benson <kbenson@xxxxxxxxxxxxxxx> wrote:
> >> LI Daobing wrote:
> > In your model, if a middle node goes out of service, then all the following
> > nodes are out of service too (aren't they?). I think this is very dangerous for afr.
>
> Yes, I admitted that was a good key feature of your proposal.
>
> >> The only thing your translators provide that isn't already available
> >> through chained translators is automatic reconfiguration of the chain
> >> members when a server drops out, which is a good feature, but I would
> >> rather just add cheap redundant hardware to boost speed, such as extra
> >> gigabit NICs and switches to allow dedicated paths between select
> >> systems.  Also, maybe the new switch translator can be added to what's
> >> already available to achieve what you want; I'm still fuzzy on exactly
> >> what it can be used for.
> >
> > It's a good idea to buy more and better hardware, but it's better if
> > we can achieve this in software. :)
>
> My argument wasn't so much hardware vs. software, but cheap effective
> hardware vs. complex software.  Fail-over can get tricky, especially
> when it is triggered by the failure of a node one or two steps removed
> from the originating request.  The more complex a fail-over system is,
> the more I tend to distrust it.
>
> >>> PS: should I copy this feature request to the wiki, or is it OK to
> >>> only put it here?
> >> OK, now that I've done my best to tear down your proposal and say why
> >> it's not needed, here's where I put my disclaimer:
> >>
> >> 1) I'm not a dev, and I haven't really looked into the code, so I don't
> >> know how easy or hard your proposal is to actually implement.
> >> 2) I'm just one person, and even though *I* may think it's not needed,
> >> others may differ on this point.
> >>
> >
> > Thanks for your comment.
>
> Thanks for being good natured about the response.
>
> --
>
> -Kevan Benson
> -A-1 Networks
>
>
> _______________________________________________
>
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>