On Wed, Jul 04, 2007 at 07:33:14PM +0530, Anand Avati wrote:
> Gerry,
> your question is appropriate, but the answer to 'when to resync' is not
> very simple. when a brick which was brought down is brought up later, it may
> be a completely new (empty) brick. In that case starting to sync every file
> would most likely be the wrong decision. (we should rather sync the file
> which the user needs than some unused file). Even if we chose to sync files
> without user accessing them it would be very sluggish too since it would be
> intervening in other operations.

Doesn't this situation compare to RAIDs when a spare disk (hot or cold)
replaces a failed one? Data integrity *demands* that missing data be
restored as fast as possible - the next failure could kill the last valid
copy. (That's why RAID-6 has become so popular: the risk of losing two
disks within a short time span cannot be neglected.)

> The current approach is to sync files on the next open() on it. This is
> usually a good balance since, during open() if we were to sync a file, even
> if it was a GB it would take 10-15 secs, and for normal files (in the order
> of few MBs) it is almost not noticeable. But if this were to happen together
> for all files whether the user accessed them or not there would be a lot of
> traffic and be very sluggish.
>
> This approach of syncing on open() is what even other filesystems which
> support redundancy do.

Sounds like the ZFS approach: data is repaired when corruption is detected.
This happens on access (when the metadata layer detects that the block
doesn't match its checksum) *but* there's also the opportunity to run a
background scrubber.

It's probably worth having a (possibly low-priority) background thread that
compares the actual local filesystem to the namespace structure and starts
the necessary repair actions. This certainly is not client-based, though.

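To make the scrubber idea concrete, here is a rough sketch in Python. It is
only an illustration under stated assumptions, not how AFR or ZFS does it:
a plain directory tree stands in for the namespace, a second tree stands in
for the brick that came back, and the names find_stale, scrub and pause are
made up for this example. A real version would run on the server side and
would need locking and version checks that are left out here.

```python
import filecmp
import os
import shutil
import time


def find_stale(source_root, replica_root):
    """Yield relative paths of files missing from, or different on, the replica."""
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(replica_root, rel)
            # shallow=False forces a byte-by-byte comparison of the contents.
            if not os.path.isfile(dst) or not filecmp.cmp(src, dst, shallow=False):
                yield rel


def scrub(source_root, replica_root, pause=0.0):
    """Repair stale files one at a time, sleeping between copies.

    The pause is a crude stand-in for running at low priority: it leaves
    I/O bandwidth to regular operations between two repairs.
    """
    repaired = []
    for rel in find_stale(source_root, replica_root):
        dst = os.path.join(replica_root, rel)
        os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
        shutil.copy2(os.path.join(source_root, rel), dst)
        repaired.append(rel)
        time.sleep(pause)
    return repaired
```

Calling scrub("/data/good", "/data/new", pause=0.5) (both paths hypothetical)
would copy every missing or differing file and wait half a second after each
one. Unlike sync-on-open(), this eventually restores files nobody touches,
which is the RAID-rebuild property asked for above.
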
> Detecting 'idle time' and beginning sync-up and pausing the sync-up when
> user begins activity is a very tricky job, but that is definitely what we
> aim at finally. It is not enough if AFR detects the client is free, because
> the servers may be busy serving files to another client and syncing at that
> time may not be the most appropriate time. The following versions of AFR will
> have more options to tune 'when' to sync. Currently it is only at open(). We
> plan to add options to make it sync on lookup() (happens on ls). Later
> versions would have pro-active syncing (detecting that both server and
> clients are idle etc).

Sounds reasonable...

Steffen