----- Original Message -----
> From: "Joe Julian" <joe@xxxxxxxxxxxxxxxx>
> To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> Cc: gluster-devel@xxxxxxxxxxx
> Sent: Monday, February 8, 2016 9:08:45 PM
> Subject: Re: Rebalance data migration and corruption
>
> On 02/08/2016 12:18 AM, Raghavendra Gowdappa wrote:
> >
> > ----- Original Message -----
> >> From: "Joe Julian" <joe@xxxxxxxxxxxxxxxx>
> >> To: gluster-devel@xxxxxxxxxxx
> >> Sent: Monday, February 8, 2016 12:20:27 PM
> >> Subject: Re: Rebalance data migration and corruption
> >>
> >> Is this in current release versions?
> >
> > Yes. This bug is present in currently released versions. However, it can
> > happen only if the application is writing to a file while that file is
> > being migrated, so roughly speaking the probability is low.
>
> Probability is quite high when the volume is used for VM images, which
> many are.

The primary requirement for this corruption is that the file be under migration. Given that rebalance is run only in add/remove-brick scenarios (or perhaps as routine housekeeping to make lookups faster), I said the probability is lower. However, this is not the case with tier, where files can be under constant promotion/demotion because of access patterns. Under constant migration, dht too is susceptible to this bug with similar probability.

> >> On 02/07/2016 07:43 PM, Shyam wrote:
> >>> On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> >>>>> To: "Sakshi Bansal" <sabansal@xxxxxxxxxx>, "Susant Palai" <spalai@xxxxxxxxxx>
> >>>>> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Nithya Balachandran" <nbalacha@xxxxxxxxxx>, "Shyamsundar Ranganathan" <srangana@xxxxxxxxxx>
> >>>>> Sent: Friday, February 5, 2016 4:32:40 PM
> >>>>> Subject: Re: Rebalance data migration and corruption
> >>>>>
> >>>>> +gluster-devel
> >>>>>
> >>>>>> Hi Sakshi/Susant,
> >>>>>>
> >>>>>> - There is a data corruption issue in the migration code. The
> >>>>>> rebalance process:
> >>>>>> 1. reads data from src,
> >>>>>> 2. writes it (say w1) to dst.
> >>>>>>
> >>>>>> However, 1 and 2 are not atomic, so another write (say w2) to the
> >>>>>> same region can happen between them, and the two writes can reach
> >>>>>> dst in the order (w2, w1), resulting in a subtle corruption. This
> >>>>>> issue is not fixed yet. The fix is simple and involves the rebalance
> >>>>>> process acquiring a mandatory lock to make 1 and 2 atomic.
> >>>>>
> >>>>> We can make use of the compound fop framework to make sure we don't
> >>>>> suffer a significant performance hit. The rebalance process would do
> >>>>> the following sequence of operations:
> >>>>>
> >>>>> 1. Issue a compound (mandatory lock, read) operation on src.
> >>>>> 2. Write this data to dst.
> >>>>> 3. Unlock the lock acquired in 1.
> >>>>>
> >>>>> Please co-ordinate with Anuradha for the implementation of this
> >>>>> compound fop.
> >>>>>
> >>>>> Following are the issues I see with this approach:
> >>>>> 1. features/locks provides mandatory-lock functionality only for posix
> >>>>> locks (flock and fcntl based locks). So the mandatory locks will be
> >>>>> posix locks, which will conflict with locks held by the application.
> >>>>> Hence, if an application holds an fcntl/flock lock, migration cannot
> >>>>> proceed.
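To make the per-chunk lock/read/write/unlock sequence above concrete, here is a minimal userspace sketch. It is only an illustration under assumed conditions: advisory fcntl() record locks stand in for the mandatory locks that features/locks would enforce on the brick, the chunk size and command-line paths are made up, and the real fix would of course live inside the rebalance process and use GlusterFS-internal locking rather than client-side fcntl.

/*
 * Minimal sketch of the per-chunk "lock, read src, write dst, unlock"
 * loop.  Advisory fcntl() record locks stand in for the mandatory locks
 * that features/locks would enforce on the brick; chunk size and paths
 * are illustrative only.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (128 * 1024)          /* migration unit, arbitrary here */

static int lock_region(int fd, short type, off_t off, off_t len)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type   = type;             /* F_WRLCK or F_UNLCK below */
    fl.l_whence = SEEK_SET;
    fl.l_start  = off;
    fl.l_len    = len;
    return fcntl(fd, F_SETLKW, &fl);   /* blocks until the region is free */
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return 1;
    }

    int src = open(argv[1], O_RDONLY);
    int dst = open(argv[2], O_WRONLY | O_CREAT, 0644);
    char *buf = malloc(CHUNK);

    if (src < 0 || dst < 0 || buf == NULL) {
        perror("setup");
        return 1;
    }

    off_t off = 0;

    for (;;) {
        /* 1. lock the source region, so no conflicting write can slip in
         *    between the read below and the matching write to dst       */
        if (lock_region(src, F_WRLCK, off, CHUNK) < 0) {
            perror("lock");
            break;
        }

        /* 2. read from src and write to dst while the lock is held */
        ssize_t n = pread(src, buf, CHUNK, off);
        if (n > 0 && pwrite(dst, buf, n, off) != n) {
            perror("pwrite");
            n = -1;
        }

        /* 3. release the region before moving on to the next chunk */
        lock_region(src, F_UNLCK, off, CHUNK);

        if (n <= 0)
            break;                  /* EOF or error */
        off += n;
    }

    free(buf);
    close(src);
    close(dst);
    return 0;
}

The point of the sketch is only that the read from src and the matching write to dst happen under one lock, so a concurrent application write to the same region cannot be overtaken by stale data.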
> >>>> We can implement a "special" domain for mandatory internal locks.
> >>>> These locks will behave similarly to posix mandatory locks in that
> >>>> conflicting fops (like write, read) are blocked/failed if they are
> >>>> issued while a lock is held.
> >>>>
> >>>>> 2. Data migration will be less efficient because of an extra unlock
> >>>>> (with a compound lock + read) or an extra lock and unlock (for a
> >>>>> non-compound-fop based implementation) for every read it does from src.
> >>>>
> >>>> Can we use delegations here? The rebalance process can acquire a
> >>>> mandatory write delegation (an exclusive lock which is recalled when a
> >>>> write operation happens). In that case the rebalance process can do
> >>>> something like:
> >>>>
> >>>> 1. Acquire a read delegation for the entire file.
> >>>> 2. Migrate the entire file.
> >>>> 3. Remove/unlock/give back the delegation it has acquired.
> >>>>
> >>>> If a recall is issued from the brick (when a write happens from a mount),
> >>>> it completes the current write to dst (or throws away the read from src)
> >>>> to maintain atomicity. Before doing the next (read, src) and (write, dst)
> >>>> it tries to reacquire the lock.
> >>>
> >>> With delegations this simplifies the normal path, when a file is
> >>> exclusively handled by rebalance. It also improves the case where a
> >>> client and rebalance conflict on a file, by degrading to mandatory
> >>> locks for either party.
> >>>
> >>> I would prefer we take the delegation route for such needs in the future.
> >>>
> >>>> @Soumyak, can something like this be done with delegations?
> >>>>
> >>>> @Pranith,
> >>>> Afr does transactions for writing to its subvols. Can you suggest any
> >>>> optimizations here so that the rebalance process can have a transaction
> >>>> for (read, src) and (write, dst) with minimal performance overhead?
> >>>>
> >>>> regards,
> >>>> Raghavendra.
> >>>>
> >>>>> Comments?
> >>>>>
> >>>>>> regards,
> >>>>>> Raghavendra.
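As a rough illustration of the delegation-based flow, here is a minimal sketch that uses Linux file leases (fcntl F_SETLEASE) as a stand-in for GlusterFS lease/delegation support: a read lease on the source file is broken, and SIGIO delivered, when another process opens the file for writing, which loosely mirrors a delegation being recalled on a conflicting write. The chunk size, paths and retry policy are invented for the sketch, and the analogy is loose: real delegations would be granted and recalled by the brick, not by the local kernel, and Linux leases can only be taken on files the caller owns (or with CAP_LEASE).

/*
 * Delegation-style migration sketch.  A Linux read lease on the source
 * file (fcntl F_SETLEASE) is broken as soon as some other process opens
 * the file for writing -- loosely mirroring "recall the delegation on a
 * conflicting write".  On a recall we finish the chunk in flight, give
 * the lease back, and reacquire it before copying the next chunk.
 */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (128 * 1024)

static volatile sig_atomic_t lease_broken;

static void on_lease_break(int sig)
{
    (void)sig;
    lease_broken = 1;               /* a writer showed up: "recall" */
}

static int acquire_read_lease(int fd)
{
    /* EAGAIN means another process has the file open for writing;
     * wait for it to finish and retry                               */
    while (fcntl(fd, F_SETLEASE, F_RDLCK) < 0) {
        if (errno != EAGAIN) {
            perror("F_SETLEASE");
            return -1;
        }
        sleep(1);
    }
    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return 1;
    }

    signal(SIGIO, on_lease_break);       /* lease-break notification */

    int src = open(argv[1], O_RDONLY);   /* a read lease needs a read-only fd */
    int dst = open(argv[2], O_WRONLY | O_CREAT, 0644);
    char *buf = malloc(CHUNK);

    if (src < 0 || dst < 0 || buf == NULL) {
        perror("setup");
        return 1;
    }

    /* 1. acquire a read "delegation" on the whole file */
    if (acquire_read_lease(src) < 0)
        return 1;

    /* 2. migrate the file chunk by chunk under the lease */
    off_t off = 0;
    for (;;) {
        ssize_t n = pread(src, buf, CHUNK, off);
        if (n <= 0)
            break;                       /* EOF or error */
        if (pwrite(dst, buf, n, off) != n) {
            perror("pwrite");
            break;
        }
        off += n;

        if (lease_broken) {
            /* recall: the chunk already copied is kept, the lease is
             * returned promptly, and we reacquire before continuing  */
            fcntl(src, F_SETLEASE, F_UNLCK);
            lease_broken = 0;
            if (acquire_read_lease(src) < 0)
                break;
        }
    }

    /* 3. give the delegation back once migration is done */
    fcntl(src, F_SETLEASE, F_UNLCK);

    free(buf);
    close(src);
    close(dst);
    return 0;
}

On a recall the loop completes the chunk it has already read before giving the lease back, which matches the "complete the current write to dst (or throw away the read from src) to maintain atomicity" behaviour described above, and it then tries to reacquire the lease before copying the next chunk.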