Re: Rebalance data migration and corruption

On 02/08/2016 12:18 AM, Raghavendra Gowdappa wrote:

----- Original Message -----
From: "Joe Julian" <joe@xxxxxxxxxxxxxxxx>
To: gluster-devel@xxxxxxxxxxx
Sent: Monday, February 8, 2016 12:20:27 PM
Subject: Re:  Rebalance data migration and corruption

Is this in current release versions?
Yes. This bug is present in currently released versions. However, it can occur only if the application writes to a file while that file is being migrated, so, loosely speaking, the probability is low.

Probability is quite high when the volume is used for VM images, which many are.


On 02/07/2016 07:43 PM, Shyam wrote:
On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:

----- Original Message -----
From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
To: "Sakshi Bansal" <sabansal@xxxxxxxxxx>, "Susant Palai"
<spalai@xxxxxxxxxx>
Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Nithya
Balachandran" <nbalacha@xxxxxxxxxx>, "Shyamsundar
Ranganathan" <srangana@xxxxxxxxxx>
Sent: Friday, February 5, 2016 4:32:40 PM
Subject: Re: Rebalance data migration and corruption

+gluster-devel

Hi Sakshi/Susant,

- There is a data corruption issue in the migration code. The rebalance
  process:
    1. Reads data from src
    2. Writes it (say w1) to dst

    However, 1 and 2 are not atomic, so another write (say w2) to the
    same region can happen between 1 and 2. These two writes can then
    reach dst in the order (w2, w1), so the stale data in w1 overwrites
    w2 on dst. This issue is not fixed yet and can cause subtle data
    corruption. The fix is simple: the rebalance process acquires a
    mandatory lock to make 1 and 2 atomic.
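
To make the (w2, w1) ordering concrete, here is a small illustrative Python model. It is not GlusterFS code: the byte arrays stand in for the src and dst files, `migrate_region_racy` and `app_write` are hypothetical names, and it assumes the application's write w2 is applied to both src and dst during migration, as the ordering on dst described above implies.

```python
def migrate_region_racy(src, dst, offset, length, concurrent_write=None):
    """Copy one region from src to dst with no lock around read and write."""
    data = bytes(src[offset:offset + length])    # step 1: read from src
    if concurrent_write is not None:
        concurrent_write()                       # w2 lands between steps 1 and 2
    dst[offset:offset + length] = data           # step 2: w1 reaches dst after w2


src = bytearray(b"AAAA")
dst = bytearray(4)


def app_write():
    # Assumption: the client's write w2 is applied to both src and dst.
    src[0:4] = b"BBBB"
    dst[0:4] = b"BBBB"


migrate_region_racy(src, dst, 0, 4, concurrent_write=app_write)
# src now holds b"BBBB" but dst holds the stale b"AAAA": w1 overwrote w2.
print(bytes(src), bytes(dst))
```

Once migration completes and dst becomes the only copy, the application's write is lost.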
We can make use of the compound fop framework to make sure we don't
suffer a significant performance hit. The rebalance process would
perform the following sequence of operations:

1. Issue a compound (mandatory lock, read) operation on src.
2. Write this data to dst.
3. Unlock the lock acquired in 1.
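
The lock, read, write, unlock sequence can be sketched as follows. This is only a model under stated assumptions: a `threading.Lock` plays the role of the mandatory lock, the hypothetical `MigratingFile` class stands in for the file on src and dst, and the application's write is assumed to go to both copies. It does not use the compound fop framework or any real GlusterFS API.

```python
import threading


class MigratingFile:
    """Hypothetical in-memory stand-in for a file being migrated."""

    def __init__(self, data):
        self.src = bytearray(data)
        self.dst = bytearray(len(data))
        self.lock = threading.Lock()   # models the mandatory lock on src

    def app_write(self, offset, data):
        # A mandatory lock blocks conflicting writes; modelled here by
        # making the writer take the same lock.
        with self.lock:
            self.src[offset:offset + len(data)] = data
            self.dst[offset:offset + len(data)] = data

    def migrate_region(self, offset, length, during=None):
        self.lock.acquire()                              # 1. lock + read on src
        try:
            data = bytes(self.src[offset:offset + length])
            if during is not None:
                during()                                 # hook: start a concurrent writer
            self.dst[offset:offset + length] = data      # 2. write to dst
        finally:
            self.lock.release()                          # 3. unlock


f = MigratingFile(b"AAAA")
writer = threading.Thread(target=f.app_write, args=(0, b"BBBB"))
# The writer starts between the read and the write but blocks on the lock.
f.migrate_region(0, 4, during=writer.start)
writer.join()
print(bytes(f.src), bytes(f.dst))
```

Because the writer cannot proceed until the unlock in step 3, its data always lands on dst after the migrated data, and both copies end up identical.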

Please coordinate with Anuradha on the implementation of this compound fop.

Following are the issues I see with this approach:
1. features/locks provides mandatory lock functionality only for
posix-locks (flock and fcntl based locks). So the mandatory locks will
be posix-locks, which will conflict with locks held by the application.
If an application holds an fcntl/flock lock, migration cannot proceed.
We can implement a "special" domain for mandatory internal locks.
These locks will behave similarly to posix mandatory locks in that
conflicting fops (like write, read) are blocked/failed if they are
done while a lock is held.

2. Data migration will be less efficient because of an extra unlock
(with compound lock + read) or an extra lock and unlock (for a
non-compound-fop-based implementation) for every read it does from src.
Can we use delegations here? The rebalance process can acquire a
mandatory-write-delegation (an exclusive lock with the property that
the delegation is recalled when a write operation happens). In that
case the rebalance process can do something like:

1. Acquire a read delegation for entire file.
2. Migrate the entire file.
3. Remove/unlock/give-back the delegation it has acquired.

If a recall is issued from the brick (when a write happens from a
mount), the rebalance process completes the current write to dst (or
throws away the data read from src) to maintain atomicity. Before doing
the next set of (read, src) and (write, dst), it tries to reacquire the
lock.
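
A rough single-threaded sketch of that recall flow, in Python, is below. Everything in it is hypothetical: the `Delegation` class, the `migrate` function, and the `client_writes` map (chunk index to a write that arrives right after that chunk is copied) are illustration only and do not correspond to any existing delegation API. It also assumes the client's write is applied to both src and dst once the delegation has been given back.

```python
class Delegation:
    """Hypothetical model of a delegation that the brick can recall."""

    def __init__(self):
        self.held = False
        self.recalled = False
        self.acquisitions = 0

    def acquire(self):
        self.held = True
        self.recalled = False
        self.acquisitions += 1

    def give_back(self):
        self.held = False


def migrate(src, dst, chunk, deleg, client_writes):
    deleg.acquire()                                  # 1. delegation on the entire file
    for index, off in enumerate(range(0, len(src), chunk)):
        if not deleg.held:
            deleg.acquire()                          # reacquire before the next pair
        dst[off:off + chunk] = src[off:off + chunk]  # one (read src, write dst) pair
        pending = client_writes.get(index)
        if pending is not None:
            deleg.recalled = True                    # brick recalls on a client write
            deleg.give_back()                        # current pair is done, hand it back
            w_off, data = pending
            src[w_off:w_off + len(data)] = data      # the client write now proceeds
            dst[w_off:w_off + len(data)] = data
    if deleg.held:
        deleg.give_back()                            # 3. give the delegation back


src = bytearray(b"AAAABBBBCCCC")
dst = bytearray(len(src))
deleg = Delegation()
# One client write of b"ZZ" at offset 0 arrives after the first chunk is copied.
migrate(src, dst, 4, deleg, client_writes={0: (0, b"ZZ")})
print(bytes(src), bytes(dst), deleg.acquisitions)
```

In the uncontended case the delegation is acquired once for the whole file; each recall costs one extra acquire.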
Delegations simplify the normal path, where a file is handled
exclusively by rebalance. They also improve the case where a client and
rebalance conflict on a file, since either party can then degrade to
mandatory locks.

I would prefer we take the delegation route for such needs in the future.

@Soumyak, can something like this be done with delegations?

@Pranith,
Afr does transactions for writing to its subvols. Can you suggest any
optimizations here so that rebalance process can have a transaction
for (read, src) and (write, dst) with minimal performance overhead?

regards,
Raghavendra.

Comments?

regards,
Raghavendra.
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel