Thank you very much! I have a few more questions.

1. What is the capacity of the largest cluster online? How many nodes does it have, and what is it used for?
2. When a cluster has many bricks and many nodes, running 'ls' in a directory is very slow. Can this be improved?

-----Original Message-----
From: Anand Babu Periasamy [mailto:abperiasamy@xxxxxxxxx]
Sent: May 9, 2012 15:55
To: renqiang
Cc: gluster-devel@xxxxxxxxxx
Subject: Re: How to repair a 1TB disk in 30 mins

On Tue, May 8, 2012 at 9:46 PM, 任强 <renqiang@xxxxxxxxxx> wrote:
> Dear All:
>
> I have a question. Suppose I have a large cluster, maybe more than 10PB of
> data. If each file has 3 copies and each disk has 1TB capacity, we need
> about 30,000 disks. All disks are very cheap and easily damaged. We must
> repair a 1TB disk in 30 mins. As far as I know, in the gluster architecture
> all data on the damaged disk is repaired onto the new disk that replaces
> it. Because of the write speed of the disk, repairing a 1TB disk in
> gluster takes more than 5 hours. Can we do it in 30 mins?

The 5-hour figure is based on a 1TB SATA disk copying at ~50MB/s across a mix of small and large files and folders. That is the case where you literally attach the disk to the system and transfer the data manually. I can't think of any faster way to transfer data on 1TB 7200RPM SATA/SAS disks without bending space-time ;). Larger disks and RAID arrays only make this worse.

This is exactly why we implemented passive self-heal in the first place. GlusterFS heals files on demand (as they are accessed), so applications see the least downtime or disruption. There is plenty of time to heal the cold data in the background. All we should care about is minimal downtime.

Self-heal in 3.3 has some major improvements. It is significantly faster, because healing is performed entirely on the server side (server to server). It can perform granular healing on large files (previously, checksum operations used to pause or time out the VMs).
Active healing: Replicate now remembers pending files and heals them when the failed node comes back. Previously you had to perform a namespace-wide recursive directory listing. Most importantly, self-healing is no longer a black box: heal-info can show pending and currently-healing files.

--
Anand Babu Periasamy
Blog [http://www.unlocksmith.org]

Imagination is more important than knowledge --Albert Einstein
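
For reference, the numbers quoted in this thread can be checked with a few lines of arithmetic. The following is only a minimal sketch, assuming decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and a constant sustained copy rate; the function names are illustrative and are not part of GlusterFS.

```python
# Back-of-the-envelope check of the figures discussed in the thread.
# Assumption: decimal units and a constant sustained transfer rate.
MB = 10**6
TB = 10**12
PB = 10**15


def disks_needed(data_bytes, replicas, disk_bytes):
    """Number of disks required to hold `replicas` copies of the data."""
    total = data_bytes * replicas
    return -(-total // disk_bytes)  # ceiling division


def copy_hours(capacity_bytes, rate_bytes_per_s):
    """Hours needed to copy a full disk at a constant rate."""
    return capacity_bytes / rate_bytes_per_s / 3600


def required_rate_mb_s(capacity_bytes, seconds):
    """Sustained rate (MB/s) needed to copy a full disk within `seconds`."""
    return capacity_bytes / seconds / MB


if __name__ == "__main__":
    print(disks_needed(10 * PB, 3, TB))        # disks for 10PB with 3 copies
    print(copy_hours(TB, 50 * MB))             # hours for 1TB at ~50MB/s
    print(required_rate_mb_s(TB, 30 * 60))     # MB/s needed for a 30-minute copy
```

Under these assumptions a full 1TB copy at ~50MB/s takes a little over 5.5 hours, and finishing in 30 minutes would need a sustained rate of roughly 556MB/s to a single disk, which is why the reply points to on-demand healing rather than a faster bulk copy.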