Re: SOLVED - Re: replicate background threads

Thanks


----- Original Message -----
>From: "Anand Avati" <anand.avati@xxxxxxxxx>
>To: "Ian Latter" <ian.latter@xxxxxxxxxxxxxxxx>
>Subject: Re: SOLVED - Re: replicate background threads
>Date: Wed, 04 Apr 2012 12:30:13 -0700
>
> Thanks for the report, Ian. I have filed a bug report:
> https://bugzilla.redhat.com/show_bug.cgi?id=809982
> 
> On Wed, Apr 4, 2012 at 4:57 AM, Ian Latter <ian.latter@xxxxxxxxxxxxxxxx> wrote:
> 
> >
> >
> > Sorry;
> >
> >  That "long (unsigned 32bit)" should have been
> > "long (signed 32bit)" ... so that's twice that bug has
> > bitten ;-)
> >
> >
> > Cheers,
> >
> >
> > ----- Original Message -----
> > >From: "Ian Latter" <ian.latter@xxxxxxxxxxxxxxxx>
> > >To: "Pranith Kumar K" <pranithk@xxxxxxxxxxx>
> > >Subject: SOLVED - Re: replicate background threads
> > >Date: Wed, 04 Apr 2012 21:51:11 +1000
> > >
> > > Hello,
> > >
> > >
> > >   Michael and I ran a battery of tests today and
> > > closed out the two issues identified below (of March
> > > 11).
> > >
> > >
> > > FYI RE the "background-self-heal-only" patch;
> > >
> > >   It has been tested now to our satisfaction and
> > >   works as described/intended.
> > >
> > >
> > >
> > > http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-background-only.patch
> > >
> > >
> > >
> > > FYI RE the 2GB replicate error;
> > >
> > >   >>>    2) Of the files that were replicated, not all were
> > >   >>>          corrupted (capped at 2G -- note that we
> > >   >>>          confirmed that this was the first 2G of the
> > >   >>>          source file contents).
> > >   >>>
> > >   >>> So is there a known replicate issue with files
> > >   >>> greater than 2GB?
> > >
> > >   We have confirmed this issue and the referenced
> > >   patch appears to correct the problem.  We were
> > >   able to get one particular file to reliably fail at 2GB
> > >   under GlusterFS 3.2.6, and then correctly
> > >   transfer it and many other >2GB files, after
> > >   applying this patch.
> > >
> > >   The error stems from putting the off_t (64bit)
> > >   offset value into a void * cookie value typecast
> > >   as long (unsigned 32bit) and then restoring it into
> > >   an off_t again.  The tip-off was a recurring offset
> > >   of 18446744071562067968 seen in the logs. The
> > >   effect is described well here;
> > >
> > >
> > > http://stackoverflow.com/questions/5628484/unexpected-behavior-from-unsigned-int64
> > >
> > >   We can't explain why this issue was intermittent,
> > >   and we're not sure if the rw_sh->offset is the
> > >   correct 64bit offset to use.  However, that offset
> > >   appeared to match the cookie value in all tested
> > >   pre-failure states.  Please advise if there is a
> > >   better (more correct) off_t offset to use.
> > >
> > >
> > >
> > > http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-2GB.patch
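> > >
> > >   To make the failure concrete, here is a minimal
> > >   stand-alone sketch of that round-trip (illustrative
> > >   only -- the names are ours, not the GlusterFS source):
> > >
> > >   /* compile: gcc -o demo demo.c ; run: ./demo */
> > >   #include <stdio.h>
> > >   #include <stdint.h>
> > >
> > >   int main (void)
> > >   {
> > >           /* 2^31 -- the first byte past 2GB */
> > >           int64_t offset = 2147483648LL;
> > >
> > >           /* store: the off_t is narrowed through the
> > >            * void * cookie to a 32bit signed long on a
> > >            * 32bit build; on two's-complement systems
> > >            * this becomes -2147483648 */
> > >           int32_t cookie = (int32_t) offset;
> > >
> > >           /* restore: sign-extended back into an off_t */
> > >           int64_t restored = (int64_t) cookie;
> > >
> > >           /* prints 18446744071562067968, i.e.
> > >            * 0xFFFFFFFF80000000 (2^64 - 2^31) -- the
> > >            * recurring offset from our logs */
> > >           printf ("%llu\n", (unsigned long long) restored);
> > >           return 0;
> > >   }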
> > >
> > >
> > >
> > > Thanks for your help,
> > >
> > >
> > >
> > >
> > > ----- Original Message -----
> > > >From: "Ian Latter" <ian.latter@xxxxxxxxxxxxxxxx>
> > > >To: "Pranith Kumar K" <pranithk@xxxxxxxxxxx>
> > > >Subject: Re: replicate background threads
> > > >Date: Tue, 03 Apr 2012 20:41:48 +1000
> > > >
> > > >
> > > > Pizza reveals all ;-)
> > > >
> > > > There's an error in there: a LOCK without a
> > > > paired UNLOCK in the afr-common test.  Revised
> > > > (untested) patch attached.
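> > > >
> > > > For the record, the shape of that class of bug -- a
> > > > compilable sketch with made-up names and a pthread
> > > > stand-in for gluster's LOCK/UNLOCK macros, not the
> > > > patch itself.  Every LOCK needs its paired UNLOCK on
> > > > every exit path:
> > > >
> > > > #include <pthread.h>
> > > >
> > > > /* stand-ins for gluster's LOCK/UNLOCK macros */
> > > > #define LOCK(l)   pthread_mutex_lock (l)
> > > > #define UNLOCK(l) pthread_mutex_unlock (l)
> > > >
> > > > static pthread_mutex_t hlock = PTHREAD_MUTEX_INITIALIZER;
> > > > static int active_heals = 0;
> > > > static int max_heals = 16;
> > > >
> > > > int claim_background_slot (void)
> > > > {
> > > >         int granted = 0;
> > > >
> > > >         LOCK (&hlock);
> > > >         {
> > > >                 if (active_heals < max_heals) {
> > > >                         active_heals++;
> > > >                         granted = 1;
> > > >                 }
> > > >                 /* bug pattern: a "return granted;"
> > > >                  * here would skip the UNLOCK below
> > > >                  * and wedge every later caller */
> > > >         }
> > > >         UNLOCK (&hlock);
> > > >
> > > >         return granted;
> > > > }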
> > > >
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > >From: "Ian Latter" <ian.latter@xxxxxxxxxxxxxxxx>
> > > > >To: "Pranith Kumar K" <pranithk@xxxxxxxxxxx>
> > > > >Subject: Re: replicate background threads
> > > > >Date: Tue, 03 Apr 2012 19:46:51 +1000
> > > > >
> > > > >
> > > > > FYI - untested patch attached.
> > > > >
> > > > >
> > > > >
> > > > > ----- Original Message -----
> > > > > >From: "Ian Latter" <ian.latter@xxxxxxxxxxxxxxxx>
> > > > > >To: "Pranith Kumar K" <pranithk@xxxxxxxxxxx>
> > > > > >Subject: Re: replicate background threads
> > > > > >Date: Tue, 03 Apr 2012 18:50:11 +1000
> > > > > >
> > > > > >
> > > > > > FYI - I can see that this option doesn't exist;
> > > > > > I'm adding it now.
> > > > > >
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > >From: "Ian Latter" <ian.latter@xxxxxxxxxxxxxxxx>
> > > > > > >To: "Pranith Kumar K" <pranithk@xxxxxxxxxxx>
> > > > > > >Subject: Re: replicate background threads
> > > > > > >Date: Mon, 02 Apr 2012 18:02:26 +1000
> > > > > > >
> > > > > > >
> > > > > > > Hello Pranith,
> > > > > > >
> > > > > > >
> > > > > > >   Michael has come back from his business trip and
> > > > > > > we're about to start testing again (though now under
> > > > > > > kernel 3.2.13 and GlusterFS 3.2.6).
> > > > > > >
> > > > > > >   I've published the 32bit (i586) client on the Saturn
> > > > > > > project site if anyone is chasing it;
> > > > > > >   http://midnightcode.org/projects/saturn/
> > > > > > >
> > > > > > >   One quick question: is there a tunable
> > > > > > > parameter that will allow a stat to be
> > > > > > > non-blocking (i.e. to stop self-heal going
> > > > > > > foreground) when the background self-heal count
> > > > > > > is reached?
> > > > > > >   I.e. rather than having the stat hang for 2 days
> > > > > > > while the files are replicated, we'd rather it fell
> > > > > > > through and allowed subsequent stats to attempt
> > > > > > > background self-healing (perhaps at a time when
> > > > > > > background self-heal slots are available).
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > >From: "Ian Latter" <ian.latter@xxxxxxxxxxxxxxxx>
> > > > > > > >To: "Pranith Kumar K" <pranithk@xxxxxxxxxxx>
> > > > > > > >Subject: Re: replicate background threads
> > > > > > > >Date: Wed, 14 Mar 2012 19:36:24 +1000
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > > hi Ian,
> > > > > > > > >      Maintaining a queue of files that need
> > > > > > > > > to be self-healed does not scale in practice,
> > > > > > > > > in cases where there are millions of files
> > > > > > > > > that need self-heal. So such a thing is not
> > > > > > > > > implemented. The idea is to make self-heal
> > > > > > > > > foreground after a certain limit
> > > > > > > > > (background-self-heal-count) so there is no
> > > > > > > > > necessity for such a queue.
> > > > > > > > >
> > > > > > > > > Pranith.
> > > > > > > >
> > > > > > > > Ok, I understand - it will be interesting to observe
> > > > > > > > the system with the new knowledge from your
> > > > > > > > messages - thanks for your help, appreciate it.
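> > > > > > > >
> > > > > > > > For our notes, the option should sit in the
> > > > > > > > replicate volume spec -- a sketch only, with
> > > > > > > > our own volume/subvolume names, not a live
> > > > > > > > config:
> > > > > > > >
> > > > > > > >   volume replicate0
> > > > > > > >     type cluster/replicate
> > > > > > > >     option background-self-heal-count 16
> > > > > > > >     subvolumes distribute0 saturni-client
> > > > > > > >   end-volume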
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > > >From: "Pranith Kumar K" <pranithk@xxxxxxxxxxx>
> > > > > > > > >To: "Ian Latter" <ian.latter@xxxxxxxxxxxxxxxx>
> > > > > > > > >Subject: Re: replicate background threads
> > > > > > > > >Date: Wed, 14 Mar 2012 07:33:32 +0530
> > > > > > > > >
> > > > > > > > > On 03/14/2012 01:47 AM, Ian Latter wrote:
> > > > > > > > > > Thanks for the info Pranith;
> > > > > > > > > >
> > > > > > > > > > <pranithk>  the option to increase the max
> > > > > > > > > > num of background self-heals is
> > > > > > > > > > cluster.background-self-heal-count.  Default
> > > > > > > > > > value of which is 16.  I assume you know what
> > > > > > > > > > you are doing to the performance of the
> > > > > > > > > > system by increasing this number.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I didn't know this.  Is there a queue length
> > > > > > > > > > for what is yet to be handled by the background
> > > > > > > > > > self-heal count?  If so, can it also be adjusted?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > >> From: "Pranith Kumar
K"<pranithk@xxxxxxxxxxx>
> > > > > > > > > >> To: "Ian
Latter"<ian.latter@xxxxxxxxxxxxxxxx>
> > > > > > > > > >> Subject:  Re: replicate
> > > background
> > > > > > > threads
> > > > > > > > > >> Date: Tue, 13 Mar 2012 21:07:53 +0530
> > > > > > > > > >>
> > > > > > > > > >> On 03/13/2012 07:52 PM, Ian Latter wrote:
> > > > > > > > > >>> Hello,
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>>     Well we've been privy to our first true
> > > > > > > > > >>> error in Gluster now, and we're not sure of
> > > > > > > > > >>> the cause.
> > > > > > > > > >>>
> > > > > > > > > >>>     The SaturnI machine with 1Gbyte of RAM
> > > > > > > > > >>> was exhausting its memory and crashing, and
> > > > > > > > > >>> we saw core dumps on SaturnM and MMC.
> > > > > > > > > >>> Replacing the SaturnI hardware with hardware
> > > > > > > > > >>> identical to SaturnM's, but retaining
> > > > > > > > > >>> SaturnI's original disks (so fixing the
> > > > > > > > > >>> memory capacity problem), we saw crashes
> > > > > > > > > >>> randomly at all nodes.
> > > > > > > > > >>>
> > > > > > > > > >>>     Looking for irregularities at the file
> > > > > > > > > >>> system, we noticed that (we'd estimate)
> > > > > > > > > >>> about 60% of the files at the OS/EXT3 layer
> > > > > > > > > >>> of SaturnI (sourced via replicate from
> > > > > > > > > >>> SaturnM) were of size 2147483648 (2^31)
> > > > > > > > > >>> where they should have been substantially
> > > > > > > > > >>> larger.  While we would happily accept "you
> > > > > > > > > >>> shouldn't be using a 32bit gluster package"
> > > > > > > > > >>> as the answer, we note two deltas;
> > > > > > > > > >>>     1) All files used in testing were copied
> > > > > > > > > >>>          on from 32 bit clients to 32 bit
> > > > > > > > > >>>          servers, with no observable errors
> > > > > > > > > >>>     2) Of the files that were replicated,
> > > > > > > > > >>>          not all were corrupted (capped at
> > > > > > > > > >>>          2G -- note that we confirmed that
> > > > > > > > > >>>          this was the first 2G of the source
> > > > > > > > > >>>          file contents).
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> So is there a known replicate issue with
> > > > > > > > > >>> files greater than 2GB?  Has anyone done any
> > > > > > > > > >>> serious testing with significant numbers of
> > > > > > > > > >>> files of this size?  Are there configurations
> > > > > > > > > >>> specific to files/structures of these
> > > > > > > > > >>> dimensions?
> > > > > > > > > >>>
> > > > > > > > > >>> We noted that, reversing the configuration
> > > > > > > > > >>> such that SaturnI provides the replicate
> > > > > > > > > >>> brick amongst a local distribute and a
> > > > > > > > > >>> remote map to SaturnM (where SaturnM simply
> > > > > > > > > >>> serves a local distribute), the data served
> > > > > > > > > >>> to MMC is accurate (it continues to show
> > > > > > > > > >>> 15GB files, even where there is a local 2GB
> > > > > > > > > >>> copy).  Further, a client "cp" at MMC, of a
> > > > > > > > > >>> file with a 2GB local replicate of a 15GB
> > > > > > > > > >>> file, will result in a 15GB file being
> > > > > > > > > >>> created and replicated via Gluster (i.e. the
> > > > > > > > > >>> correct specification at both server nodes).
> > > > > > > > > >>>
> > > > > > > > > >>> So my other question is: is it possible
> > > > > > > > > >>> that we've managed to corrupt something in
> > > > > > > > > >>> this environment, i.e. during the initial
> > > > > > > > > >>> memory exhaustion events?  And is there a
> > > > > > > > > >>> robust way to have the replicate files
> > > > > > > > > >>> revalidated by gluster, as a stat doesn't
> > > > > > > > > >>> seem to be correcting files in this state
> > > > > > > > > >>> (i.e. replicate on SaturnM results in daemon
> > > > > > > > > >>> crashes, replicate on SaturnI results in
> > > > > > > > > >>> files being left in the bad state)?
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> Also, I'm not a member of the users list;
> > > > > > > > > >>> if these questions are better posed there,
> > > > > > > > > >>> then let me know and I'll re-post them there.
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> Thanks,
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> ----- Original Message -----
> > > > > > > > > >>>> From: "Ian
> > Latter"<ian.latter@xxxxxxxxxxxxxxxx>
> > > > > > > > > >>>> To:<gluster-devel@xxxxxxxxxx>
> > > > > > > > > >>>> Subject:  replicate
> > background
> > > > > > threads
> > > > > > > > > >>>> Date: Sun, 11 Mar 2012 20:17:15 +1000
> > > > > > > > > >>>>
> > > > > > > > > >>>> Hello,
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>     My mate Michael and I have been steadily
> > > > > > > > > >>>> advancing our Gluster testing, and today we
> > > > > > > > > >>>> finally reached some heavier conditions.
> > > > > > > > > >>>> The outcome was different from the
> > > > > > > > > >>>> expectations built from our more basic
> > > > > > > > > >>>> testing, so I think we have a couple of
> > > > > > > > > >>>> questions regarding the AFR/Replicate
> > > > > > > > > >>>> background threads that may need a
> > > > > > > > > >>>> developer's contribution.  Any help
> > > > > > > > > >>>> appreciated.
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>     The setup is a 3 box environment, but
> > > > > > > > > >>>> let's start with two;
> > > > > > > > > >>>>
> > > > > > > > > >>>>       SaturnM (Server)
> > > > > > > > > >>>>          - 6core CPU, 16GB RAM, 1Gbps net
> > > > > > > > > >>>>          - 3.2.6 Kernel (custom distro)
> > > > > > > > > >>>>          - 3.2.5 Gluster (32bit)
> > > > > > > > > >>>>          - 3x2TB drives, CFQ, EXT3
> > > > > > > > > >>>>          - Bricked up into a single local 6TB
> > > > > > > > > >>>>             "distribute" brick
> > > > > > > > > >>>>          - "brick" served to the network
> > > > > > > > > >>>>
> > > > > > > > > >>>>       MMC (Client)
> > > > > > > > > >>>>          - 4core CPU, 8GB RAM, 1Gbps net
> > > > > > > > > >>>>          - Ubuntu
> > > > > > > > > >>>>          - 3.2.5 Gluster (32bit)
> > > > > > > > > >>>>          - Don't recall the disk space locally
> > > > > > > > > >>>>          - "brick" from SaturnM mounted
> > > > > > > > > >>>>
> > > > > > > > > >>>>       500 x 15Gbyte files were copied from
> > > > > > > > > >>>> MMC to a single sub-directory on the brick
> > > > > > > > > >>>> served from SaturnM; all went well and
> > > > > > > > > >>>> dandy.  So then we moved on to a 3 box
> > > > > > > > > >>>> environment;
> > > > > > > > > >>>>
> > > > > > > > > >>>>       SaturnI (Server)
> > > > > > > > > >>>>          = 1core CPU, 1GB RAM, 1Gbps net
> > > > > > > > > >>>>          = 3.2.6 Kernel (custom distro)
> > > > > > > > > >>>>          = 3.2.5 Gluster (32bit)
> > > > > > > > > >>>>          = 4x2TB drives, CFQ, EXT3
> > > > > > > > > >>>>          = Bricked up into a single local 8TB
> > > > > > > > > >>>>             "distribute" brick
> > > > > > > > > >>>>          = "brick" served to the network
> > > > > > > > > >>>>
> > > > > > > > > >>>>       SaturnM (Server/Client)
> > > > > > > > > >>>>          - 6core CPU, 16GB RAM, 1Gbps net
> > > > > > > > > >>>>          - 3.2.6 Kernel (custom distro)
> > > > > > > > > >>>>          - 3.2.5 Gluster (32bit)
> > > > > > > > > >>>>          - 3x2TB drives, CFQ, EXT3
> > > > > > > > > >>>>          - Bricked up into a single local 6TB
> > > > > > > > > >>>>             "distribute" brick
> > > > > > > > > >>>>          = Replicate brick added to sit over
> > > > > > > > > >>>>             the local distribute brick and a
> > > > > > > > > >>>>             client "brick" mapped from SaturnI
> > > > > > > > > >>>>          - Replicate "brick" served to the
> > > > > > > > > >>>>             network
> > > > > > > > > >>>>
> > > > > > > > > >>>>       MMC (Client)
> > > > > > > > > >>>>          - 4core CPU, 8GB RAM, 1Gbps net
> > > > > > > > > >>>>          - Ubuntu
> > > > > > > > > >>>>          - 3.2.5 Gluster (32bit)
> > > > > > > > > >>>>          - Don't recall the disk space locally
> > > > > > > > > >>>>          - "brick" from SaturnM mounted
> > > > > > > > > >>>>          = "brick" from SaturnI mounted
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>     Now, in lesser testing in this scenario
> > > > > > > > > >>>> all was well - any files on SaturnI would
> > > > > > > > > >>>> appear on SaturnM (not a functional part of
> > > > > > > > > >>>> our test) and the content on SaturnM would
> > > > > > > > > >>>> appear on SaturnI (the real objective).
> > > > > > > > > >>>>
> > > > > > > > > >>>>     Earlier testing used a handful of
> > > > > > > > > >>>> smaller files (10s to 100s of Mbytes) and a
> > > > > > > > > >>>> single 15Gbyte file.  The 15Gbyte file would
> > > > > > > > > >>>> be "stat"ed via an "ls", which would kick
> > > > > > > > > >>>> off a background replication (ls appeared
> > > > > > > > > >>>> unblocked) and the file would be
> > > > > > > > > >>>> transferred.  Also, interrupting the
> > > > > > > > > >>>> transfer (pulling the LAN cable) would
> > > > > > > > > >>>> result in a partial 15Gbyte file being
> > > > > > > > > >>>> corrected in a subsequent background
> > > > > > > > > >>>> process on another stat.
> > > > > > > > > >>>>
> > > > > > > > > >>>>     *However* .. when confronted with 500 x
> > > > > > > > > >>>> 15Gbyte files in a single directory (but
> > > > > > > > > >>>> not the root directory), things don't quite
> > > > > > > > > >>>> work out as nicely.
> > > > > > > > > >>>>     - First, the "ls" (at MMC against the
> > > > > > > > > >>>>       SaturnM brick) is blocking and hangs
> > > > > > > > > >>>>       the terminal (ctrl-c doesn't kill it).
> > > > > > > > > >> <pranithk>  At max 16 files can be
> > > > > > > > > >> self-healed in the background in parallel.
> > > > > > > > > >> The 17th file self-heal will happen in the
> > > > > > > > > >> foreground.
> > > > > > > > > >>>>     - Then, looking from MMC at the SaturnI
> > > > > > > > > >>>>        file system (ls -s) once per second,
> > > > > > > > > >>>>        and then comparing the output (diff
> > > > > > > > > >>>>        ls1.txt ls2.txt | grep -v '>'), we
> > > > > > > > > >>>>        can see that between 10 and 17 files
> > > > > > > > > >>>>        are being updated simultaneously by
> > > > > > > > > >>>>        the background process
> > > > > > > > > >> <pranithk>  This is expected.
> > > > > > > > > >>>>     - Further, a request at MMC for a
> > > > > > > > > >>>>       single file that was originally in
> > > > > > > > > >>>>       the 500 x 15Gbyte sub-dir on SaturnM
> > > > > > > > > >>>>       (which should return unblocked with
> > > > > > > > > >>>>       correct results) will;
> > > > > > > > > >>>>         a) work as expected if there are
> > > > > > > > > >>>>             fewer than 17 active background
> > > > > > > > > >>>>             file tasks
> > > > > > > > > >>>>         b) block/hang if there are more
> > > > > > > > > >>>>     - Whereas a stat (ls) outside of the
> > > > > > > > > >>>>        500 x 15 sub-directory, such as the
> > > > > > > > > >>>>        root of that brick, would always
> > > > > > > > > >>>>        work as expected (return
> > > > > > > > > >>>>        immediately, unblocked).
> > > > > > > > > >> <pranithk>  stat on the directory will only
> > > > > > > > > >> create the files with '0' file size.  Then,
> > > > > > > > > >> when you ls/stat the actual file, the
> > > > > > > > > >> self-heal for the file gets triggered.
> > > > > > > > > >>>>
> > > > > > > > > >>>>     Thus, to us, it appears as though there
> > > > > > > > > >>>> is a queue feeding a set of (around) 16
> > > > > > > > > >>>> worker threads in AFR.  If your request was
> > > > > > > > > >>>> to the loaded directory then you would be
> > > > > > > > > >>>> blocked until a worker was available, and
> > > > > > > > > >>>> if your request was to any other location,
> > > > > > > > > >>>> it would return unblocked regardless of the
> > > > > > > > > >>>> worker pool state.
> > > > > > > > > >>>>
> > > > > > > > > >>>>     The only thread metric that we could
> > > > > > > > > >>>> find to tweak was performance/io-threads
> > > > > > > > > >>>> (which was set to 16 per physical disk;
> > > > > > > > > >>>> well, per locks + posix brick stack), but
> > > > > > > > > >>>> increasing this to 64 per stack didn't
> > > > > > > > > >>>> change the outcome (10 to 17 active
> > > > > > > > > >>>> background transfers).
> > > > > > > > > >> <pranithk>  the option to increase the max
> > > > > > > > > >> num of background self-heals is
> > > > > > > > > >> cluster.background-self-heal-count.  Default
> > > > > > > > > >> value of which is 16.  I assume you know what
> > > > > > > > > >> you are doing to the performance of the
> > > > > > > > > >> system by increasing this number.
> > > > > > > > > >>>>
> > > > > > > > > >>>>     So, given the above, is our analysis
> > > > > > > > > >>>> sound, and if so, is there a way to
> > > > > > > > > >>>> increase the size of the pool of active
> > > > > > > > > >>>> worker threads?  The objective being to
> > > > > > > > > >>>> allow unblocked access to an existing
> > > > > > > > > >>>> repository of files (on SaturnM) while a
> > > > > > > > > >>>> secondary/back-up is being filled, via
> > > > > > > > > >>>> GlusterFS?
> > > > > > > > > >>>>
> > > > > > > > > >>>>     Note that I understand that performance
> > > > > > > > > >>>> (throughput) will be an issue in the
> > > > > > > > > >>>> described environment: this replication
> > > > > > > > > >>>> process is estimated to run for between 10
> > > > > > > > > >>>> and 40 hours, which is acceptable so long
> > > > > > > > > >>>> as it isn't blocking (there's a
> > > > > > > > > >>>> production-capable file set in place).
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>> Any help appreciated.
> > > > > > > > > >>>>
> > > > > > > > > >> Please let us know how it goes.
> > > > > > > > > >>>> Thanks,
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>> --
> > > > > > > > > >>>> Ian Latter
> > > > > > > > > >>>> Late night coder ..
> > > > > > > > > >>>> http://midnightcode.org/
> > > > > > > > > >>>>
> > > > > > > > > >>>>
> > > > > > > > > >>>> _______________________________________________
> > > > > > > > > >>>> Gluster-devel mailing list
> > > > > > > > > >>>> Gluster-devel@xxxxxxxxxx
> > > > > > > > > >>>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > > > > > > > > >>>>
> > > > > > > > > >>> --
> > > > > > > > > >>> Ian Latter
> > > > > > > > > >>> Late night coder ..
> > > > > > > > > >>> http://midnightcode.org/
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> _______________________________________________
> > > > > > > > > >>> Gluster-devel mailing list
> > > > > > > > > >>> Gluster-devel@xxxxxxxxxx
> > > > > > > > > >>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > > > > > > > > >> hi Ian,
> > > > > > > > > >>        inline replies with <pranithk>.
> > > > > > > > > >>
> > > > > > > > > >> Pranith.
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Ian Latter
> > > > > > > > > > Late night coder ..
> > > > > > > > > > http://midnightcode.org/
> > > > > > > > > hi Ian,
> > > > > > > > >       Maintaining a queue of files that need
> > > > > > > > > to be self-healed does not scale in practice,
> > > > > > > > > in cases where there are millions of files
> > > > > > > > > that need self-heal. So such a thing is not
> > > > > > > > > implemented. The idea is to make self-heal
> > > > > > > > > foreground after a certain limit
> > > > > > > > > (background-self-heal-count) so there is no
> > > > > > > > > necessity for such a queue.
> > > > > > > > >
> > > > > > > > > Pranith.
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Ian Latter
> > > > > > > > Late night coder ..
> > > > > > > > http://midnightcode.org/
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Gluster-devel mailing list
> > > > > > > > Gluster-devel@xxxxxxxxxx
> > > > > > > > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Ian Latter
> > > > > > > Late night coder ..
> > > > > > > http://midnightcode.org/
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Gluster-devel mailing list
> > > > > > > Gluster-devel@xxxxxxxxxx
> > > > > > > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Ian Latter
> > > > > > Late night coder ..
> > > > > > http://midnightcode.org/
> > > > > >
> > > > > > _______________________________________________
> > > > > > Gluster-devel mailing list
> > > > > > Gluster-devel@xxxxxxxxxx
> > > > > > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Ian Latter
> > > > > Late night coder ..
> > > > > http://midnightcode.org/
> > > > > _______________________________________________
> > > > > Gluster-devel mailing list
> > > > > Gluster-devel@xxxxxxxxxx
> > > > > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > > > >
> > > >
> > > >
> > > > --
> > > > Ian Latter
> > > > Late night coder ..
> > > > http://midnightcode.org/
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel@xxxxxxxxxx
> > > > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > > >
> > >
> > >
> > > --
> > > Ian Latter
> > > Late night coder ..
> > > http://midnightcode.org/
> > >
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel@xxxxxxxxxx
> > > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > >
> >
> >
> > --
> > Ian Latter
> > Late night coder ..
> > http://midnightcode.org/
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxx
> > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> >
> 


--
Ian Latter
Late night coder ..
http://midnightcode.org/


