On 07/25/2014 03:06 PM, Les Mikesell wrote:
> On Fri, Jul 25, 2014 at 3:08 PM, Benjamin Smith
> <lists@xxxxxxxxxxxxxxxxxx> wrote:
>> On 07/25/2014 12:12 PM, Michael Hennebry wrote:
>>> Is there some reason that the existing files cannot
>>> be accessed while they are being copied to the raid?
>>>
>> Sheer volume. With something in the range of 100,000,000 small files, it
>> takes a good day or two to rsync. This means that getting a consistent
>> image without significant downtime is impossible. I can handle a few
>> minutes, maybe an hour. Much more than that and I have to explore other
>> options. (In this case, it looks like we'll be biting the bullet and
>> switching to ZFS)
> Rsync is really pretty good at that, especially the 3.x versions. If
> you've just done a live rsync (or a few so there won't be much time
> for changes during the last live run), the final one with the system
> idle shouldn't take much more time than a 'find' traversing the same
> tree. If you have space and time to test, I'd time the third pass or
> so before deciding it won't work (unless even find would take too
> long).

Thanks for your feedback - it's advice I would have given myself just a
few years ago. We have *literally* in the range of one hundred million
small PDF documents. The simple command

    find /path/to/data > /dev/null

takes between 1 and 2 days, depending on system load.

We had to give up on rsync for backups in this context a while ago - we
just couldn't get a "daily" backup more often than about 2x per week. Now
we're using ZFS + send/receive to get daily backup times down into the
"sub 60 minutes" range, and I'm just going to bite the bullet and
synchronize everything at the application level over the next week.

Was just looking for a shortcut...
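
In case it's useful to anyone else on the list, the daily cycle is
basically "snapshot, send the incremental, receive on the backup box" -
something along these lines (the pool, dataset, and host names below are
just placeholders, not our actual layout):

    # one-time full replication to seed the backup pool
    zfs snapshot tank/docs@daily-2014-07-24
    zfs send tank/docs@daily-2014-07-24 | ssh backuphost zfs receive backup/docs

    # every day after that: new snapshot, send only the delta
    zfs snapshot tank/docs@daily-2014-07-25
    zfs send -i tank/docs@daily-2014-07-24 tank/docs@daily-2014-07-25 \
        | ssh backuphost zfs receive backup/docs

The incremental send only walks the blocks that changed between the two
snapshots, so it never has to stat a hundred million files the way rsync
or find does - that's where the "sub 60 minutes" comes from.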