On Fri, Oct 23, 2009 at 12:30 AM, Avi Kivity <avi@xxxxxxxxxx> wrote:
> On 10/21/2009 07:13 AM, MORITA Kazutaka wrote:
>>
>> Hi everyone,
>>
>> Sheepdog is a distributed storage system for KVM/QEMU. It provides
>> highly available block level storage volumes to VMs like Amazon EBS.
>> Sheepdog supports advanced volume management features such as snapshot,
>> cloning, and thin provisioning. Sheepdog runs on several tens or hundreds
>> of nodes, and the architecture is fully symmetric; there is no central
>> node such as a meta-data server.
>
> Very interesting! From a very brief look at the code, it looks like the
> sheepdog block format driver is a network client that is able to access
> highly available images, yes?

Yes. Sheepdog is a simple key-value storage system that consists of
multiple nodes (a bit similar to Amazon Dynamo, I guess). The qemu
Sheepdog driver (client) divides a VM image into fixed-size objects and
stores them on the key-value storage system.

> If so, is it reasonable to compare this to a cluster file system setup (like
> GFS) with images as files on this filesystem? The difference would be that
> clustering is implemented in userspace in sheepdog, but in the kernel for a
> clustering filesystem.

I think that the major difference between sheepdog and cluster file
systems such as Google File System, pNFS, etc. is the interface between
clients and the storage system.

> How is load balancing implemented? Can you move an image transparently
> while a guest is running? Will an image be moved closer to its guest?

Sheepdog uses consistent hashing to decide where objects are stored, so
I/O load is balanced across the nodes. When a new node is added or an
existing node is removed, the hash table changes and the data is moved
between nodes automatically and transparently.

We plan to implement a mechanism to distribute the data not randomly but
intelligently; we could use machine load, the locations of VMs, etc.

> Can you stripe an image across nodes?

Yes, a VM image is divided into multiple objects, and they are stored
across the nodes.

> Do you support multiple guests accessing the same image?

A VM image can be attached to any VM, but only to one VM at a time;
multiple running VMs cannot access the same VM image.

> What about fault tolerance - storing an image redundantly on multiple nodes?

Yes, all objects are replicated to multiple nodes.

--
MORITA, Kazutaka

NTT Cyber Space Labs
OSS Computing Project
Kernel Group
E-mail: morita.kazutaka@xxxxxxxxxxxxx
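
To illustrate the answers above, here is a minimal Python sketch of the two
ideas: splitting an image into fixed-size objects, and using consistent
hashing to pick the nodes that hold each object and its replicas. It is not
Sheepdog's code. The 4 MB object size, the SHA-1 hash, the virtual-node
count, the key format, the node names, and the names `object_index` and
`Ring` are all assumptions made for illustration.

```python
import bisect
import hashlib

# Assumed object size; the message above only says "fixed-size".
OBJECT_SIZE = 4 * 1024 * 1024


def object_index(offset, object_size=OBJECT_SIZE):
    """Map a byte offset in the image to (object index, offset in object)."""
    return divmod(offset, object_size)


def _hash(key):
    # Any stable hash works for this sketch; SHA-1 is an arbitrary choice.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns several points on the ring to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def locate(self, key, copies=3):
        """Return up to `copies` distinct nodes, walking the ring clockwise."""
        if not self._points:
            return []
        start = bisect.bisect_left(self._points, (_hash(key), ""))
        found = []
        for i in range(len(self._points)):
            node = self._points[(start + i) % len(self._points)][1]
            if node not in found:
                found.append(node)
                if len(found) == copies:
                    break
        return found


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
idx, inner = object_index(10 * 1024 * 1024 + 5)  # third object of the image
replicas = ring.locate(f"vm-image-1/{idx}")      # nodes holding that object
print(idx, inner, replicas)
```

With a ring like this, adding or removing a node only changes the placement
of the objects whose ring positions are adjacent to that node's points, which
is why the data movement described above can stay limited to a fraction of
the objects.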