Re: raid over ethernet

Roberto Spadim <roberto@xxxxxxxxxxxxx> · Mon, 31 Jan 2011 15:37:32 -0200

Nice, you don't have two writers.
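The single-writer argument can be checked with a toy model of the migration sequence Alexander describes in the quoted message. This is a hypothetical sketch that only models state and calls neither drbd nor Xen. It tracks the drbd role on each host and the host(s) where the VM executes. After every step, it checks that at most one host runs the VM and that it runs only where the device is primary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Cluster:
    # Hypothetical toy model: drbd role per host, and the hosts executing the VM.
    roles: Dict[str, str] = field(
        default_factory=lambda: {"A": "primary", "B": "secondary"})
    running: Set[str] = field(default_factory=lambda: {"A"})

    def check(self) -> None:
        # One executing VM means one writer to the replicated block device.
        if len(self.running) > 1:
            raise RuntimeError("two writers!")
        # The VM may only execute where the drbd device is primary (writable).
        for host in self.running:
            if self.roles[host] != "primary":
                raise RuntimeError(f"VM runs on non-primary host {host}")


def live_migrate(c: Cluster, src: str, dst: str) -> List[Tuple[str, Dict[str, str], List[str]]]:
    """Walk the migration steps from the thread, checking the invariant after each."""
    trace = []

    def step(name: str) -> None:
        c.check()
        trace.append((name, dict(c.roles), sorted(c.running)))

    c.roles[dst] = "primary"      # drbd: primary/secondary -> primary/primary
    step("dual primary")
    step("memory sync")           # target VM exists but is not running yet
    c.running.discard(src)        # Xen suspends the VM on the source host
    step("suspended on source")
    c.running.add(dst)            # Xen resumes the VM on the target host
    step("resumed on target")
    c.roles[src] = "secondary"    # drbd: primary/primary -> secondary/primary
    step("single primary again")
    return trace


for name, roles, running in live_migrate(Cluster(), "A", "B"):
    print(f"{name:22} {roles} running on {running}")
```

Both devices are primary for most of the walk, yet the VM never executes on two hosts at once. That is why the filesystem inside the VM never sees a second writer.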

2011/1/31 Alexander Schreiber <als@xxxxxxxxxxxxxxx>:
> On Mon, Jan 31, 2011 at 12:45:31PM -0200, Roberto Spadim wrote:
>> I think the filesystem is a problem...
>> you can't have two writers on a filesystem that allows only one, or
>> you will end up with a corrupted filesystem (lots of fsck repairs... local
>> caches and other features); maybe GFS, OCFS or another cluster filesystem
>> is a better solution...
>
> No, for _our_ use case (replicated disks for VMs running under Xen
> with live migration) the filesystem just _does_ _not_ _matter_ _at_
> _all_. Due to the way Xen live migration works, there is only one
> writer at any one time: the VM "owning" the virtual disk provided
> by drbd.
>
> To illustrate the point, a very short summary of what happens during
> Xen live migration in our setup:
>  - VM is to be migrated from host A to host B, with the virtual block
>   device for the instance being provided by a drbd pair running on
>   those hosts
>  - hosts A/B are configured primary/secondary
>  - we reconfigure drbd to primary/primary
>  - start Xen live migration
>  - Xen creates a target VM on host B; this VM is not yet running
>  - Xen syncs live VM memory from host A to host B
>  - when most of the memory is synced over, Xen suspends execution of
>   the VM on host A
>  - Xen copies the remaining dirty VM memory from host A to host B
>  - Xen resumes VM execution on host B and destroys the source VM
>   on host A; Xen live migration is complete
>  - we reconfigure drbd on hosts A/B to secondary/primary
>
> There is no concurrent access to the virtual block device here anywhere.
> And the only reason we go primary/primary during live migration is that
> for Xen to attach the disks to the target VM, they have to be available
> and accessible on the target node - as well as on the source node where
> they are currently attached to the source VM.
>
> Now, if you were doing things like, say, use a primary/primary drbd
> setup for NFS servers serving in parallel from two hosts, then yes,
> you'd have to take special steps with a proper parallel filesystem
> to avoid corruption. But this is a completely different problem.
>
> Kind regards,
>          Alex.
>>
>> 2011/1/31 Alexander Schreiber <als@xxxxxxxxxxxxxxx>:
>> > On Mon, Jan 31, 2011 at 06:42:44AM -0200, Denis wrote:
>> >> 2011/1/29 Alexander Schreiber <als@xxxxxxxxxxxxxxx>:
>> >> > On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote:
>> >> >> 2011/1/29 Alexander Schreiber <als@xxxxxxxxxxxxxxx>
>> >> >>
>> >> >> >
>> >> >> > plain disk performance for writes, while reads should be reasonably
>> >> >> > close to the plain disk performance - drbd optimizes reads by just reading
>> >> >> > from the local disk if it can.
>> >> >> >
>> >> >> >
>> >> >> However, I have not used it in an active-active fashion. Have you? If yes,
>> >> >> what is your overall experience?
>> >> >
>> >> > We are using drbd to provide mirrored disks for virtual machines running
>> >> > under Xen. 99% of the time, the drbd devices run in primary/secondary
>> >> > mode (aka active/passive), but they are switched to primary/primary
>> >> > (aka active/active) for live migrations of domains, as that needs the
>> >> > disks to be available on both nodes. From our experience, if the drbd
>> >> > device is healthy, this is very reliable. No experience with running
>> >> > drbd in primary/primary config for any extended period of time, though
>> >> > (the live migrations are usually over after a few seconds to a minute at
>> >> > most, then the drbd devices go back to primary/secondary).
>> >>
>> >> What filesystem are you using to enable the primary-primary mode? Have
>> >> you evaluated it against any other available option?
>> >
>> > The filesystem is whatever the VM is using, usually ext3. But the
>> > filesystem doesn't matter in our use case at all, because:
>> >  - the backing stores for drbd are logical volumes
>> >  - the drbd block devices are directly exported as block devices
>> >   to the VMs
>> > The filesystem is only active inside the VM - and the VM is not aware of
>> > the drbd primary/secondary -> primary/primary -> primary/secondary dance
>> > that happens "outside" to enable live migration.
>
> --
> "Opportunity is missed by most people because it is dressed in overalls and
>  looks like work."                                      -- Thomas A. Edison
>
>
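The case Alexander contrasts with his own uses two hosts writing to a primary/primary device through an ordinary single-node filesystem, for example NFS servers serving in parallel. Roberto's concern applies there, and the failure mode can be shown with a deliberately simplified toy. The class names and the flat free-block bitmap are hypothetical, and real ext3 allocation is far more involved. Each host allocates from its own cached copy of the free-block map, so both pick the same block and the second write silently overwrites the first. Cluster filesystems such as GFS and OCFS2 avoid this by coordinating metadata through a distributed lock manager.

```python
class SharedDisk:
    """Toy block device in primary/primary mode: both hosts see the same blocks."""

    def __init__(self, nblocks: int) -> None:
        self.bitmap = [False] * nblocks   # on-disk free-block bitmap (False = free)
        self.blocks = [None] * nblocks    # block contents


class LocalFsNode:
    """Toy single-node filesystem: caches the bitmap at mount and never rereads it."""

    def __init__(self, name: str, disk: SharedDisk) -> None:
        self.name = name
        self.disk = disk
        self.cached_bitmap = list(disk.bitmap)  # read once at mount time

    def write_file(self, data: str) -> int:
        # Allocate from the *cached* bitmap; no coordination with the other host.
        idx = self.cached_bitmap.index(False)
        self.cached_bitmap[idx] = True
        self.disk.bitmap[idx] = True
        self.disk.blocks[idx] = data
        return idx


disk = SharedDisk(4)
a = LocalFsNode("A", disk)   # both hosts mount the same device
b = LocalFsNode("B", disk)
blk_a = a.write_file("file written on A")
blk_b = b.write_file("file written on B")
print(blk_a, blk_b, disk.blocks[blk_a])  # prints: 0 0 file written on B
```

Host A believes its file lives in block 0, but the data there now belongs to host B. This does not happen in the Xen live-migration setup, where only one host ever has the filesystem mounted, inside the VM.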

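For the primary/secondary setup described in the thread, the drbd side of a migration boils down to three commands. The sketch below only builds the command list as a dry run and executes nothing. The resource, domain and host names are hypothetical. It also assumes two things: the drbd resource is configured to permit dual-primary mode (an `allow-two-primaries` setting in its `net` section), and the Xen toolstack is `xm`. Error handling, such as demoting the target again if the migration fails, is left out.

```python
from typing import List, Tuple

Command = Tuple[str, List[str]]  # (host to run on, argv)


def plan_live_migration(resource: str, domain: str, src: str, dst: str) -> List[Command]:
    """Return, without running anything, the command sequence for one migration.

    Assumes dual-primary is already permitted for the drbd resource and that
    the Xen domain is managed with xm (both are assumptions, not from the thread).
    """
    return [
        # primary/secondary -> primary/primary: make the disk available on the target
        (dst, ["drbdadm", "primary", resource]),
        # Xen live migration; the VM writes only from wherever it is executing
        (src, ["xm", "migrate", "--live", domain, dst]),
        # primary/primary -> secondary/primary: the source no longer needs the disk
        (src, ["drbdadm", "secondary", resource]),
    ]


for host, argv in plan_live_migration("r0", "vm1", "hostA", "hostB"):
    print(f"[{host}] {' '.join(argv)}")
```

The order matters. The target must be promoted before Xen attaches the disk there, and the source should only be demoted once the migration has completed and the source VM is gone.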
-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html