----- Original Message -----
> From: "Joe Julian" <joe@xxxxxxxxxxxxxxxx>
> To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> Cc: gluster-devel@xxxxxxxxxxx
> Sent: Monday, February 8, 2016 9:08:45 PM
> Subject: Re: Rebalance data migration and corruption
>
> On 02/08/2016 12:18 AM, Raghavendra Gowdappa wrote:
> >
> > ----- Original Message -----
> >> From: "Joe Julian" <joe@xxxxxxxxxxxxxxxx>
> >> To: gluster-devel@xxxxxxxxxxx
> >> Sent: Monday, February 8, 2016 12:20:27 PM
> >> Subject: Re: Rebalance data migration and corruption
> >>
> >> Is this in current release versions?
> >
> > Yes. This bug is present in currently released versions. However, it can
> > happen only if the application is writing to a file while that file is
> > being migrated, so roughly speaking the probability is low.
>
> Probability is quite high when the volume is used for VM images, which
> many are.

The primary requirement for this corruption is that the file be under migration. Given that rebalance is run only in add/remove-brick scenarios (or perhaps as routine housekeeping to make lookups faster), I said the probability is lower. However, this is not the case with tier, where files can be under constant promotion/demotion because of access patterns. Under constant migration, dht too is susceptible to this bug with similar probability.

> >> On 02/07/2016 07:43 PM, Shyam wrote:
> >>> On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> >>>>> To: "Sakshi Bansal" <sabansal@xxxxxxxxxx>, "Susant Palai" <spalai@xxxxxxxxxx>
> >>>>> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Nithya Balachandran" <nbalacha@xxxxxxxxxx>, "Shyamsundar Ranganathan" <srangana@xxxxxxxxxx>
> >>>>> Sent: Friday, February 5, 2016 4:32:40 PM
> >>>>> Subject: Re: Rebalance data migration and corruption
> >>>>>
> >>>>> +gluster-devel
> >>>>>
> >>>>>> Hi Sakshi/Susant,
> >>>>>>
> >>>>>> - There is a data corruption issue in the migration code. The
> >>>>>> rebalance process:
> >>>>>> 1. reads data from src,
> >>>>>> 2. writes it (say w1) to dst.
> >>>>>>
> >>>>>> However, 1 and 2 are not atomic, so another write (say w2) to the
> >>>>>> same region can happen between them, and the two writes can reach
> >>>>>> dst in the order (w2, w1), resulting in a subtle corruption. This
> >>>>>> issue is not fixed yet. The fix is simple and involves the rebalance
> >>>>>> process acquiring a mandatory lock to make 1 and 2 atomic.
> >>>>>
> >>>>> We can make use of the compound fop framework to make sure we don't
> >>>>> suffer a significant performance hit. The rebalance process would do
> >>>>> the following sequence of operations:
> >>>>>
> >>>>> 1. Issue a compound (mandatory lock, read) operation on src.
> >>>>> 2. Write this data to dst.
> >>>>> 3. Unlock the lock acquired in 1.
> >>>>>
> >>>>> Please co-ordinate with Anuradha for the implementation of this
> >>>>> compound fop.
> >>>>>
> >>>>> Following are the issues I see with this approach:
> >>>>> 1. features/locks provides mandatory-lock functionality only for posix
> >>>>> locks (flock and fcntl based locks). So the mandatory locks will be
> >>>>> posix locks, which will conflict with locks held by the application.
> >>>>> Hence, if an application holds an fcntl/flock lock, migration cannot
> >>>>> proceed.
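To make the per-chunk lock/read/write/unlock sequence above concrete, here is a minimal userspace sketch. It is only an illustration under assumed conditions: advisory fcntl() record locks stand in for the mandatory locks that features/locks would enforce on the brick, the chunk size and command-line paths are made up, and the real fix would of course live inside the rebalance process and use GlusterFS-internal locking rather than client-side fcntl.

/*
 * Minimal sketch of the per-chunk "lock, read src, write dst, unlock"
 * loop.  Advisory fcntl() record locks stand in for the mandatory locks
 * that features/locks would enforce on the brick; chunk size and paths
 * are illustrative only.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (128 * 1024)          /* migration unit, arbitrary here */

static int lock_region(int fd, short type, off_t off, off_t len)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type   = type;             /* F_WRLCK or F_UNLCK below */
    fl.l_whence = SEEK_SET;
    fl.l_start  = off;
    fl.l_len    = len;
    return fcntl(fd, F_SETLKW, &fl);   /* blocks until the region is free */
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return 1;
    }

    int src = open(argv[1], O_RDONLY);
    int dst = open(argv[2], O_WRONLY | O_CREAT, 0644);
    char *buf = malloc(CHUNK);

    if (src < 0 || dst < 0 || buf == NULL) {
        perror("setup");
        return 1;
    }

    off_t off = 0;

    for (;;) {
        /* 1. lock the source region, so no conflicting write can slip in
         *    between the read below and the matching write to dst       */
        if (lock_region(src, F_WRLCK, off, CHUNK) < 0) {
            perror("lock");
            break;
        }

        /* 2. read from src and write to dst while the lock is held */
        ssize_t n = pread(src, buf, CHUNK, off);
        if (n > 0 && pwrite(dst, buf, n, off) != n) {
            perror("pwrite");
            n = -1;
        }

        /* 3. release the region before moving on to the next chunk */
        lock_region(src, F_UNLCK, off, CHUNK);

        if (n <= 0)
            break;                  /* EOF or error */
        off += n;
    }

    free(buf);
    close(src);
    close(dst);
    return 0;
}

The point of the sketch is only that the read from src and the matching write to dst happen under one lock, so a concurrent application write to the same region cannot be overtaken by stale data.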
> >>>> We can implement a "special" domain for mandatory internal locks.
> >>>> These locks will behave similarly to posix mandatory locks in that
> >>>> conflicting fops (like write, read) are blocked/failed if they are
> >>>> issued while a lock is held.
> >>>>
> >>>>> 2. Data migration will be less efficient because of an extra unlock
> >>>>> (with a compound lock + read) or an extra lock and unlock (for a
> >>>>> non-compound-fop based implementation) for every read it does from src.
> >>>>
> >>>> Can we use delegations here? The rebalance process can acquire a
> >>>> mandatory write delegation (an exclusive lock which is recalled when a
> >>>> write operation happens). In that case the rebalance process can do
> >>>> something like:
> >>>>
> >>>> 1. Acquire a read delegation for the entire file.
> >>>> 2. Migrate the entire file.
> >>>> 3. Remove/unlock/give back the delegation it has acquired.
> >>>>
> >>>> If a recall is issued from the brick (when a write happens from a mount),
> >>>> it completes the current write to dst (or throws away the read from src)
> >>>> to maintain atomicity. Before doing the next (read, src) and (write, dst)
> >>>> it tries to reacquire the lock.
> >>>
> >>> With delegations this simplifies the normal path, when a file is
> >>> exclusively handled by rebalance. It also improves the case where a
> >>> client and rebalance conflict on a file, by degrading to mandatory
> >>> locks for either party.
> >>>
> >>> I would prefer we take the delegation route for such needs in the future.
> >>>
> >>>> @Soumyak, can something like this be done with delegations?
> >>>>
> >>>> @Pranith,
> >>>> Afr does transactions for writing to its subvols. Can you suggest any
> >>>> optimizations here so that the rebalance process can have a transaction
> >>>> for (read, src) and (write, dst) with minimal performance overhead?
> >>>>
> >>>> regards,
> >>>> Raghavendra.
> >>>>
> >>>>> Comments?
> >>>>>
> >>>>>> regards,
> >>>>>> Raghavendra.
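As a rough illustration of the delegation-based flow, here is a minimal sketch that uses Linux file leases (fcntl F_SETLEASE) as a stand-in for GlusterFS lease/delegation support: a read lease on the source file is broken, and SIGIO delivered, when another process opens the file for writing, which loosely mirrors a delegation being recalled on a conflicting write. The chunk size, paths and retry policy are invented for the sketch, and the analogy is loose: real delegations would be granted and recalled by the brick, not by the local kernel, and Linux leases can only be taken on files the caller owns (or with CAP_LEASE).

/*
 * Delegation-style migration sketch.  A Linux read lease on the source
 * file (fcntl F_SETLEASE) is broken as soon as some other process opens
 * the file for writing -- loosely mirroring "recall the delegation on a
 * conflicting write".  On a recall we finish the chunk in flight, give
 * the lease back, and reacquire it before copying the next chunk.
 */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (128 * 1024)

static volatile sig_atomic_t lease_broken;

static void on_lease_break(int sig)
{
    (void)sig;
    lease_broken = 1;               /* a writer showed up: "recall" */
}

static int acquire_read_lease(int fd)
{
    /* EAGAIN means another process has the file open for writing;
     * wait for it to finish and retry                               */
    while (fcntl(fd, F_SETLEASE, F_RDLCK) < 0) {
        if (errno != EAGAIN) {
            perror("F_SETLEASE");
            return -1;
        }
        sleep(1);
    }
    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return 1;
    }

    signal(SIGIO, on_lease_break);       /* lease-break notification */

    int src = open(argv[1], O_RDONLY);   /* a read lease needs a read-only fd */
    int dst = open(argv[2], O_WRONLY | O_CREAT, 0644);
    char *buf = malloc(CHUNK);

    if (src < 0 || dst < 0 || buf == NULL) {
        perror("setup");
        return 1;
    }

    /* 1. acquire a read "delegation" on the whole file */
    if (acquire_read_lease(src) < 0)
        return 1;

    /* 2. migrate the file chunk by chunk under the lease */
    off_t off = 0;
    for (;;) {
        ssize_t n = pread(src, buf, CHUNK, off);
        if (n <= 0)
            break;                       /* EOF or error */
        if (pwrite(dst, buf, n, off) != n) {
            perror("pwrite");
            break;
        }
        off += n;

        if (lease_broken) {
            /* recall: the chunk already copied is kept, the lease is
             * returned promptly, and we reacquire before continuing  */
            fcntl(src, F_SETLEASE, F_UNLCK);
            lease_broken = 0;
            if (acquire_read_lease(src) < 0)
                break;
        }
    }

    /* 3. give the delegation back once migration is done */
    fcntl(src, F_SETLEASE, F_UNLCK);

    free(buf);
    close(src);
    close(dst);
    return 0;
}

On a recall the loop completes the chunk it has already read before giving the lease back, which matches the "complete the current write to dst (or throw away the read from src) to maintain atomicity" behaviour described above, and it then tries to reacquire the lease before copying the next chunk.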