At 11:47 PM 10/22/2008, martin wrote:
>Thanks for the prompt reply, folks.
>
> > > AFR uses a database-like journal to keep track of changes. When a node
> > > comes back up, there will be a record on other nodes to say *some*
> > > changes are pending on the node that was down. AFR then simply copies
> > > the entire file and/or creates files on the node that came back up.
>
>So is that the point of the lazy healing? When a node is defined as
>'dirty', any file access is verified with the 'other' node? What then
>determines when the pair of nodes are 'clean'? Is that the
>responsibility of the surviving node?
>
>My case is this - I have 2 nodes with 10,000,000+ files AFR'd for
>disaster tolerance. I'm developing an SOP for restoration after an event,
>but am working through the consequences of aftershock events, and I am
>not clear at present.

Here is how I think things would have to work for your case. Basically,
you'd have to use one of the "find" commands to force auto-healing on the
DR volume after a service interruption. The number of files wouldn't be
the bottleneck in this case, but rather the number of directories. (The
developers will correct me if I'm wrong.)

Recently I went through this with a server. It was offline for about 24
hours, and I ran the find command from the wiki; it took about 20 minutes
to run through a 40 GB volume (there weren't many updates; if you have a
lot of updates it'll take longer, obviously). It seemed that large
directories are a performance issue at run time, but they seem to make
things faster when you're recovering.

Keith
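
P.S. I don't have the wiki page in front of me, so take this as a sketch
from memory rather than the exact text, but the self-heal trigger was a
one-liner along these lines (/mnt/glusterfs is a placeholder for your own
client mount point):

  # stat every file through the glusterfs mount, forcing AFR to
  # compare the copies and heal any that are stale
  find /mnt/glusterfs -noleaf -print0 | xargs --null stat > /dev/null

The important part is that it runs against the client mount, not the
backend export directories; healing only happens when the access goes
through the AFR translator.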
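
P.P.S. On martin's question about when a pair is judged 'clean': as I
understand it, AFR keeps its bookkeeping in trusted.afr.* extended
attributes on the backend files, so you can peek at the pending record
directly. The exact attribute names vary between releases, and
/data/export is just a stand-in for your backend directory, so check
against your own setup:

  # dump AFR's changelog/version xattrs for one file on the backend
  # (run as root; trusted.* attributes aren't visible otherwise)
  getfattr -d -m trusted.afr -e hex /data/export/some/file

When the attributes on both copies agree, AFR treats the file as clean;
a record left behind on the surviving node is what marks changes as
pending for the node that was down.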