On Mon, Jan 31, 2011 at 12:45:31PM -0200, Roberto Spadim wrote:
> i think filesystem is a problem...
> you can't have two writers over a filesystem that allow only one, or
> you will have filesystem crash (a lot of fsck repair... local cache
> and other's features), maybe a gfs ocfs or another is a better
> solution...

No, for _our_ use case (replicated disks for VMs running under Xen with
live migration) the filesystem just _does_ _not_ _matter_ _at_ _all_.
Due to the way Xen live migration works, there is only one writer at any
one time: the VM "owning" the virtual disk provided by drbd.

To illustrate the point, a very short summary of what happens during Xen
live migration in our setup:

 - VM is to be migrated from host A to host B, with the virtual block
   device for the instance being provided by a drbd pair running on
   those hosts
 - hosts A/B are configured primary/secondary
 - we reconfigure drbd to primary/primary
 - start Xen live migration
 - Xen creates a target VM on host B, this VM is not yet running
 - Xen syncs live VM memory from host A to host B
 - when most of the memory is synced over, Xen suspends execution of the
   VM on host A
 - Xen copies the remaining dirty VM memory from host A to host B
 - Xen resumes VM execution on host B, destroys the source VM on host A,
   Xen live migration is completed
 - we reconfigure drbd on hosts A/B to secondary/primary

There is no concurrent access to the virtual block device here anywhere.
And the only reason we go primary/primary during live migration is that
for Xen to attach the disks to the target VM, they have to be available
and accessible on the target node - as well as on the source node where
they are currently attached to the source VM.

Now, if you were doing things like, say, using a primary/primary drbd
setup for NFS servers serving in parallel from two hosts, then yes,
you'd have to take special steps with a proper parallel filesystem to
avoid corruption. But this is a completely different problem.
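In rough terms, the sequence boils down to something like the sketch
below (Python, with made-up resource/domain/host names and no error
handling; the exact drbdadm and xm/xl invocations depend on your drbd
and Xen versions, and the drbd resource needs allow-two-primaries
enabled in its net section for the dual-primary step to be allowed):

import subprocess

RESOURCE = "vm-disk0"        # made-up drbd resource name
DOMAIN = "vm01"              # made-up Xen domain name
SRC, DST = "hostA", "hostB"

def run(host, *cmd):
    # run a command on the given host via ssh (assumes key-based root ssh)
    subprocess.check_call(["ssh", "root@" + host] + list(cmd))

# host B is secondary so far; promote it, giving a temporary
# primary/primary configuration
run(DST, "drbdadm", "primary", RESOURCE)

# let Xen do the actual live migration (xm on the old toolstack,
# xl migrate on newer ones)
run(SRC, "xm", "migrate", "--live", DOMAIN, DST)

# the VM now runs on host B only; demote host A back to secondary
run(SRC, "drbdadm", "secondary", RESOURCE)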
Kind regards,
       Alex.

> 2011/1/31 Alexander Schreiber <als@xxxxxxxxxxxxxxx>:
> > On Mon, Jan 31, 2011 at 06:42:44AM -0200, Denis wrote:
> >> 2011/1/29 Alexander Schreiber <als@xxxxxxxxxxxxxxx>:
> >> > On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote:
> >> >> 2011/1/29 Alexander Schreiber <als@xxxxxxxxxxxxxxx>
> >> >> >
> >> >> > plain disk performance for writes, while reads should be reasonably
> >> >> > close to the plain disk performance - drbd optimizes reads by just reading
> >> >> > from the local disk if it can.
> >> >> >
> >> >> However, I have not used it with active-active fashion. Have you? if yes,
> >> >> what is your overall experience?
> >> >
> >> > We are using drbd to provide mirrored disks for virtual machines running
> >> > under Xen. 99% of the time, the drbd devices run in primary/secondary
> >> > mode (aka active/passive), but they are switched to primary/primary
> >> > (aka active/active) for live migrations of domains, as that needs the
> >> > disks to be available on both nodes. From our experience, if the drbd
> >> > device is healthy, this is very reliable. No experience with running
> >> > drbd in primary/primary config for any extended period of time, though
> >> > (the live migrations are usually over after a few seconds to a minute at
> >> > most, then the drbd devices go back to primary/secondary).
> >>
> >> What filesystem are you using to enable the primary-primary mode? Have
> >> you evaluated it against any other available option?
> >
> > The filesystem is whatever the VM is using, usually ext3. But the
> > filesystem doesn't matter in our use case at all, because:
> >  - the backing store for drbd are logical volumes
> >  - the drbd block devices are directly exported as block devices
> >    to the VMs
> > The filesystem is only active inside the VM - and the VM is not aware of
> > the drbd primary/secondary -> primary/primary -> primary/secondary dance
> > that happens "outside" to enable live migration.

-- 
"Opportunity is missed by most people because it is dressed in overalls
 and looks like work."                                 -- Thomas A. Edison
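For completeness, "directly exported as block devices to the VMs" boils
down to a domU disk line roughly like the one in this sketch (resource
and device names are made up; the drbd: prefix relies on the block-drbd
helper script that ships with drbd, and a plain phy: reference to the
/dev/drbdX device node works as well - xm domU config files are Python
syntax):

# relevant bits of a domU config, e.g. /etc/xen/vm01.cfg (sketch)
name = "vm01"
# hand the drbd device straight to the guest; the guest puts its own
# filesystem (usually ext3 here) on it and never sees the
# primary/secondary handling underneath
disk = ["drbd:vm-disk0,xvda,w"]
# equivalent, addressing the drbd device node directly:
# disk = ["phy:/dev/drbd0,xvda,w"]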