Re: Regarding Replicated Volume

Ted Miller <tmiller@xxxxxxxx> · Thu, 20 Mar 2014 17:28:03 -0400



    On 3/19/2014 3:06 PM, Cary Tsai wrote:

    
      Hi there:
        I'm new to GlusterFS and could not find the answer in the
        documentation, so I hope I can get the answer from the mailing list.
        

        Let's say we have two web servers:
        one in Seattle, WA and another in Chapel Hill, NC.
        So I create a 'replicated' volume with one brick in WA and
        another brick in NC.
        I assume the web servers in both WA and NC can mount the
        'replicated' volume.
        There are two HTTP GET requests, one from CA and one from NY.
        Assume CA's GET is sent to the web server in WA and
        NY's GET is sent to the web server in NC.
        

        My question is: does the web server in WA always get the data
        from the brick in WA? If not, is there any way to configure it so
        that the web server in WA always gets its data from the brick in WA?
      
    
    The answer to your basic question is either "yes" or "they're
    working on it".  I know that a while back it was on the "to do" list,
    but I am not sure whether the patch is done, and if so, whether it
    has made it into production code.  But the last I heard, yes, we were
    heading in that direction.  Contrary to what someone else said, no,
    your scenario is not the only one where this is desirable.  In almost
    any active setup serving mostly-read files, reading from your local
    replica is much faster, and it also reduces network traffic.
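
    To make the idea concrete, here is a minimal Python sketch of the read policy being asked for: read from a replica at the client's own site when one is online, otherwise fall back to any online replica. It is only an illustration under stated assumptions; the Replica class, the site labels, and choose_read_replica() are hypothetical names, not Gluster code or Gluster configuration options.

```python
"""Hypothetical sketch: prefer the local replica for reads.

NOT GlusterFS's actual read-selection code; Replica, its 'site' field,
and choose_read_replica() are illustrative assumptions.
"""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    site: str            # hypothetical site label, e.g. "WA" or "NC"
    online: bool = True


def choose_read_replica(replicas: List[Replica], local_site: str) -> Optional[Replica]:
    """Return an online replica at local_site if there is one, else any online replica."""
    online = [r for r in replicas if r.online]
    for r in online:
        if r.site == local_site:
            return r                         # local read: no cross-country round trip
    return online[0] if online else None     # fall back to a remote copy (or nothing)


if __name__ == "__main__":
    bricks = [Replica("brick-nc", "NC"), Replica("brick-wa", "WA")]
    print(choose_read_replica(bricks, "WA").name)   # brick-wa
    bricks[1].online = False                        # local brick goes down
    print(choose_read_replica(bricks, "WA").name)   # brick-nc
```

    The fallback branch is the important design choice: preferring the local brick should never make a file unreadable when that brick is down.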

    
    The big make-or-break questions as to whether this will work for you
    are:

    * How much do you write?  (More = more problems)

    * Are these more or less WORM files (Write Once, Read Many)?
    WORM is better, RAM is worse.

    * Do both servers write?  (Only one is better)

    * Do you modify files?  (More modification = more headaches)

    * Do you replace/update files?  (Yes = more grief)

    
    The critical issue is timing.  Gluster has various operations where
    it has to communicate with all nodes, and the process cannot move
    forward until all nodes answer.  Gluster is designed for all nodes
    to be connected by 1 Gbit/s or faster networking, so your
    cross-continental link is outside the use case the developers are
    designing for.  This always applies to writes: when a write occurs,
    it has to finish on both servers, probably with several commands
    issued, and each time it cannot go on to the next step until the
    distant server finishes.  There are also certain read operations
    where Gluster checks that things match between all servers.
    I have heard the "stat" call mentioned as one that can be slow,
    but I can't say I fully understand what it does.  My understanding
    is that a plain 'ls' does not need the 'stat' call, but 'ls -l'
    does, so an 'ls -l' on a directory with hundreds or thousands of
    files can take MUCH longer than an 'ls' of that same directory.
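
    A rough way to see why distance dominates is to assume, as described above, that each step waits for every replica, so each step costs the round-trip time of the farthest one. The Python sketch below applies that assumption to both a write and an 'ls -l'. The round-trip times, the five-step write, and the one-round-trip-per-stat figure are illustrative guesses, not measurements, and a real client may cache or parallelize some of these calls.

```python
"""Back-of-the-envelope model of operations that must wait on every replica.

Assumption (illustrative only): each step is one round trip that cannot
finish until ALL replicas answer, and steps run one after another.
"""
from typing import Dict


def replicated_op_ms(rtts_ms: Dict[str, float], round_trips: int) -> float:
    """Wall time in ms for `round_trips` sequential steps that wait on all replicas."""
    if round_trips < 0:
        raise ValueError("round_trips must be non-negative")
    slowest = max(rtts_ms.values())  # every step waits for the farthest replica
    return round_trips * slowest


if __name__ == "__main__":
    # Hypothetical round-trip times (ms) as seen from the WA web server.
    wan = {"brick-wa": 0.5, "brick-nc": 70.0}
    lan = {"brick-a": 0.5, "brick-b": 0.5}

    write_steps = 5        # hypothetical number of round trips per write
    files_in_dir = 1000    # 'ls -l' assumed to issue one stat per file

    print(replicated_op_ms(wan, write_steps))    # 350.0
    print(replicated_op_ms(lan, write_steps))    # 2.5
    print(replicated_op_ms(wan, files_in_dir))   # 70000.0
    print(replicated_op_ms(lan, files_in_dir))   # 500.0
```

    With these assumed numbers, the write costs 350 ms instead of 2.5 ms on a LAN, and the 'ls -l' of 1,000 files costs 70 seconds instead of half a second.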

    
    IF your web site only reads from the file system, and it does not
    trigger any calls that make Gluster compare your two servers, it
    might work.

    
    If

    1. You do not require absolute real-time synchronization between the
    servers

    AND

    2. You can do all the writes on one of the two servers

    then

    you should probably look at Geo-replication.  Geo-replication is a
    one-way process: all the changes happen on one end, and they are
    reflected on the other end.  It is designed to handle slower
    network links, and it lets you keep the two sites in
    close-to-real-time synchronization.  How close to real time will
    depend on your server write load; you would have to describe what
    you are doing and let some of the folks here share their experience
    in similar situations.  At least you would be within the intended
    use case, so the developers will be receptive to any problems you
    report, and those problems may get fixed.
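
    The one-way behaviour can be pictured with a small Python toy model: the writing site completes each write locally and queues the change, and the other site catches up whenever queued changes are pushed to it, so it may lag behind for a while. The Primary and Mirror classes, sync(), and the batch size are hypothetical names for illustration, not how GlusterFS geo-replication is actually implemented.

```python
"""Toy model of one-way, asynchronous replication (geo-replication style).

Illustrative only: the classes and batch-sync behaviour are assumptions
for explanation, not GlusterFS internals.
"""
from collections import deque
from typing import Deque, Dict, Tuple


class Primary:
    """The only site that accepts writes; queues each change for the remote site."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self.pending: Deque[Tuple[str, str]] = deque()

    def write(self, path: str, data: str) -> None:
        self.files[path] = data            # completes locally, no WAN round trip
        self.pending.append((path, data))  # shipped to the remote site later


class Mirror:
    """Read-only copy; it changes only when queued changes are pushed to it."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}


def sync(primary: Primary, mirror: Mirror, max_batch: int) -> int:
    """Apply up to max_batch queued changes in order; return how many were applied."""
    applied = 0
    while primary.pending and applied < max_batch:
        path, data = primary.pending.popleft()
        mirror.files[path] = data
        applied += 1
    return applied


if __name__ == "__main__":
    wa, nc = Primary(), Mirror()
    wa.write("/index.html", "v1")
    wa.write("/index.html", "v2")
    wa.write("/logo.png", "img")
    sync(wa, nc, max_batch=2)
    print(nc.files)              # {'/index.html': 'v2'}  (still behind)
    sync(wa, nc, max_batch=2)
    print(nc.files == wa.files)  # True
```

    Only the primary ever accepts writes; that single direction is what keeps the two copies from conflicting, at the price of the mirror running somewhat behind.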

    
    Another caution, based on painfully learned experience: if you
    decide to try a regular (not Geo-replicated) system, I advise that
    you store your data on a third machine somewhere, ESPECIALLY if
    both machines are updating files at the same time.  Otherwise, it
    seems to be only a matter of time before you are struggling with a
    split-brain situation.  When you face your first split-brain, you
    will wish you had never run into one.
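
    A common reason a third machine helps is majority quorum. With only two replicas, a broken link leaves each side holding exactly half, so a majority rule stops writes on both sides; without such a rule both sides keep writing and their copies diverge, which is split-brain. The Python sketch below shows the arithmetic, assuming a rule that a side may write only if it reaches a strict majority of the replicas; has_quorum() and partition_outcome() are hypothetical helpers, not Gluster's quorum code.

```python
"""Why a third machine helps: majority-quorum arithmetic.

Minimal sketch, assuming a side of a network partition may accept writes
only if it reaches a strict majority of replicas. Not GlusterFS code.
"""


def has_quorum(reachable: int, total: int) -> bool:
    """True if `reachable` replicas form a strict majority of `total`."""
    return 2 * reachable > total


def partition_outcome(side_a: int, side_b: int) -> str:
    """Describe what the majority rule allows when replicas split into two sides."""
    total = side_a + side_b
    a_ok, b_ok = has_quorum(side_a, total), has_quorum(side_b, total)
    if a_ok and b_ok:
        return "both sides write: split-brain"  # cannot happen with disjoint sides
    if a_ok or b_ok:
        return "one side keeps writing; the other waits"
    return "neither side may write"


if __name__ == "__main__":
    print(partition_outcome(1, 1))  # two servers, link down: neither side may write
    print(partition_outcome(2, 1))  # third machine with one site: one side keeps writing; the other waits
```

    The trade-off is visible in the first case: two servers with a majority rule stay consistent but stop accepting writes during an outage, while the third machine lets one side carry on safely.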

    
    Ted Miller

    Elkhart, IN, USA

  
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users