At 11:47 PM 10/22/2008, martin wrote:
>Thanks for the prompt reply, folks.
>
> > > AFR uses a database-like journal to keep track of changes. When a node
> > > comes back up, there will be a record on other nodes to say *some*
> > > changes are pending on the node that was down. AFR then simply copies
> > > the entire file and/or creates files on the node that came back up.
>
>So is that the point of the lazy healing? When a node is defined as
>'dirty', any file access is verified with the 'other' node? What then
>determines when the pair of nodes are 'clean'? Is that the
>responsibility of the surviving node?
>
>My case is this - I have 2 nodes with 10,000,000+ files AFR'd for
>disaster tolerance. I'm developing an SOP for restoration after an event,
>but am working through the consequences of aftershock events, and I am
>not clear at present.

Here is how I think things would have to work for your case. Basically,
you'd have to use one of the "find" commands to force auto-healing on the
DR volume after a service interruption. The number of files wouldn't be
the bottleneck in this case, but rather the number of directories. (The
developers will correct me if I'm wrong.)

Recently I went through this with a server. It was offline for about 24
hours, and I ran the find command from the wiki; it took about 20 minutes
to run through a 40 GB volume (there weren't many updates; if you have a
lot of updates it'll take longer, obviously). It seemed that large
directories are a performance issue at run time, but they seem to make
things faster when you're recovering.

Keith
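
P.S. I don't have the wiki page in front of me, so take this as a sketch
from memory rather than the exact text, but the self-heal trigger was a
one-liner along these lines (/mnt/glusterfs is a placeholder for your own
client mount point):

  # stat every file through the glusterfs mount, forcing AFR to
  # compare the copies and heal any that are stale
  find /mnt/glusterfs -noleaf -print0 | xargs --null stat > /dev/null

The important part is that it runs against the client mount, not the
backend export directories; healing only happens when the access goes
through the AFR translator.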
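
P.P.S. On martin's question about when a pair is judged 'clean': as I
understand it, AFR keeps its bookkeeping in trusted.afr.* extended
attributes on the backend files, so you can peek at the pending record
directly. The exact attribute names vary between releases, and
/data/export is just a stand-in for your backend directory, so check
against your own setup:

  # dump AFR's changelog/version xattrs for one file on the backend
  # (run as root; trusted.* attributes aren't visible otherwise)
  getfattr -d -m trusted.afr -e hex /data/export/some/file

When the attributes on both copies agree, AFR treats the file as clean;
a record left behind on the surviving node is what marks changes as
pending for the node that was down.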