On Fri, Jan 13, 2017 at 8:03 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:
Hi,
On 13/01/17 10:58, jayakrishnan mm wrote:
Hi Xavier,
I went through the source code. Some questions remain.
1. If two clients try to write to the same file, both writes should succeed,
even if they overlap (locks should ensure they are applied in sequence on the bricks).
From the source code:
lock->flock.l_type = F_WRLCK;
lock->flock.l_whence = SEEK_SET;
fop->flock.l_len += ec_adjust_offset(fop->xl->private,
&fop->flock.l_start, 1);
fop->flock.l_len = ec_adjust_size(fop->xl->private,
fop->flock.l_len, 1);
If flock.l_len is 0, the entire file is locked for writing.
In my test case with 2 clients, I always get flock.l_len as 0, but
I am still able to write to the same file from both clients at the
same time.
How can you be sure you are really writing at the same time? Do you get partial writes from any of the clients?
I am not sure whether they are happening simultaneously. I am using fio to do that.
If it is acquiring the lock chunk by chunk, why am I always getting l_len = 0?
EC doesn't acquire partial locks. The entire file is locked when a modification is needed. This makes it possible to reuse locks for future operations (eager locking).
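
For reference, an l_len of 0 with l_start of 0 is the standard POSIX way to describe a lock on the whole file, including bytes appended later. The following is a minimal sketch using plain fcntl() record locks on a local file descriptor. It does not go through Gluster's inodelk path, and the helper names whole_file_wrlck() and lock_whole_file() are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Build a write-lock description that covers the whole file.
 * In POSIX, l_len == 0 means "from l_start to the end of the file,
 * however far the file grows", so start = 0 and len = 0 cover every byte. */
struct flock whole_file_wrlck(void)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fl;
}

/* Try to take the whole-file write lock without blocking.
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
int lock_whole_file(int fd)
{
    struct flock fl = whole_file_wrlck();

    assert(fd >= 0);
    return fcntl(fd, F_SETLK, &fl);
}
```

Seeing l_len == 0 on every write is therefore what a whole-file lock looks like, not a sign that the range was lost.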
Why am I not getting the actual write size and offset (for
flock.l_len and flock.l_start respectively) for each write FOP?
(In AFR, they are set to transaction.len and transaction.start respectively,
which in turn are the write length and offset in the normal write case.)
Because an erasure code splits the data into smaller fragments for each brick, so offsets and lengths need to be adjusted.
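
As an illustration of that adjustment, here is a sketch that assumes user data is grouped into stripes of stripe_size bytes and each stripe is split evenly across a fixed number of data fragments. The struct frag_layout type and the adjust_offset() / adjust_size() helpers are hypothetical stand-ins. The real ec_adjust_offset() and ec_adjust_size() may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout description, not the real ec_t structure. */
struct frag_layout {
    uint64_t stripe_size; /* bytes of user data in one full stripe */
    uint64_t fragments;   /* number of data fragments per stripe */
};

/* Round *offset down to a stripe boundary, optionally scale it to a
 * per-brick offset, and return how many bytes it was moved back. */
uint64_t adjust_offset(const struct frag_layout *l, uint64_t *offset, int scale)
{
    assert(l->stripe_size > 0 && l->fragments > 0);

    uint64_t head = *offset % l->stripe_size;
    uint64_t aligned = *offset - head;

    *offset = scale ? aligned / l->fragments : aligned;
    return head;
}

/* Round size up to a whole number of stripes, optionally scaled to the
 * number of bytes stored on each brick. */
uint64_t adjust_size(const struct frag_layout *l, uint64_t size, int scale)
{
    assert(l->stripe_size > 0 && l->fragments > 0);

    uint64_t rem = size % l->stripe_size;

    if (rem != 0)
        size += l->stripe_size - rem;
    return scale ? size / l->fragments : size;
}
```

With a 2048-byte stripe and 4 fragments, a 100-byte range at offset 3000 follows the same two-step pattern as the quoted snippet: the offset becomes 512 on each brick, the length first grows by the 952 bytes the offset moved back, and then rounds up to one stripe, i.e. 512 bytes per brick. In this sketch a start of 0 and a length of 0 stay 0 after adjustment, so a whole-file range remains a whole-file range.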
2. As per the source code, a full file lock is also taken by the shd:
ec_heal_inodelk(heal, F_WRLCK, 1, 0, 0);
which means offset = 0 and size = 0 in the ec_heal_lock() function in ec-heal.c:
flock.l_start = offset;
flock.l_len = size;
Does this mean that a write to a file cannot happen simultaneously with
healing of that file?
Correct. The heal procedure is like an additional client. If a client and the heal process try to write at the same time, they must be serialized, like any other regular write. However, heal only takes the full lock for some critical operations. Regular self-heal of file contents is done by locking chunk by chunk.
I have a question about index heal and full heal.
As per the code, the index healer thread (ec_shd_index_healer) is created when there is a child_up event OR when there is a TRANSLATOR_OP/GF_SHD_OP_HEAL_INDEX. When does the second case arise?
The full heal thread (ec_shd_full_healer) is created only when a TRANSLATOR_OP/GF_SHD_OP_HEAL_FULL arrives. Does this happen only in the replace-brick case?
Thanks & regards
JK
Xavi
Correct me if I am wrong.
Best Regards
JK
On Wed, Dec 14, 2016 at 12:07 PM, jayakrishnan mm <jayakrishnan.mm@xxxxxxxxx> wrote:
Thanks Xavier, for making it clear.
Regards
JK
On Dec 13, 2016 3:52 PM, "Xavier Hernandez" <xhernandez@xxxxxxxxxx> wrote:
Hi JK,
On 12/13/2016 08:34 AM, jayakrishnan mm wrote:
Dear Xavi,
How do I test the locks, for example the locks for the write fop?
I have two independent clients, both trying to write to the same file.
1. According to my understanding, both can write successfully if the
offsets don't overlap. I mean, the WRITE FOP takes a chunk lock on the
file. As long as the clients don't try to write to the same chunk,
it should be OK. If no locks are present, it can lead to inconsistency.
With locks all writes will be fine as defined by posix (i.e. the
final result will be equivalent to the sequential execution of
both operations, though in an undefined order), even if they
overlap. Without locks, there are chances that some bricks
execute the operations in one order and the remaining bricks
execute the same operations in the reverse order, causing data
corruption.
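
The ordering problem can be illustrated with a small sketch. It assumes each brick is just a byte buffer that applies whole writes one after another. Real EC bricks store encoded fragments rather than the raw bytes, and the names BRICK_SIZE, struct write_op, apply_write() and apply_in_order() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BRICK_SIZE 8

/* A hypothetical write request: copy len bytes of data at offset. */
struct write_op {
    size_t offset;
    size_t len;
    const char *data;
};

/* Apply one write to a brick's buffer. */
void apply_write(char *brick, const struct write_op *op)
{
    assert(op->offset + op->len <= BRICK_SIZE);
    memcpy(brick + op->offset, op->data, op->len);
}

/* Reset the brick to '.' filler, then apply two writes in the given order. */
void apply_in_order(char *brick, const struct write_op *first,
                    const struct write_op *second)
{
    memset(brick, '.', BRICK_SIZE);
    apply_write(brick, first);
    apply_write(brick, second);
}
```

Take write A as "AAAA" at offset 0 and write B as "BBBB" at offset 2. A brick that applies A then B ends with AABBBB.., while a brick that applies B then A ends with AAAABB... Either result is acceptable on its own, but the bricks must all agree on one of them, which is what the lock guarantees.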
2. Different FOPs can always run simultaneously. (Example: WRITE and
READ FOPs, or two READ FOPs).
All fops can be executed concurrently. If there's any chance
that two operations could interfere, locks are taken in the
appropriate places. For example, reads cannot be merged with
overlapping writes. Otherwise they could return inconsistent data.
3. A WRITE and some metadata FOP (like setattr) together: with locks,
they cannot happen at the same time, even though the chances are very low.
As in 2, if there's any possible interference, the appropriate
locks will be taken.
You can look at the code to see which locks are taken for each
fop. See the corresponding ec_manager_<fop>() function, in the
EC_STATE_LOCK switch case. There you will see calls to
ec_lock_prepare_xxx() for each taken lock.
Xavi
Please clarify.
Best regards
JK
On Wed, Nov 30, 2016 at 5:49 PM, jayakrishnan mm <jayakrishnan.mm@xxxxxxxxx> wrote:
Hi Xavier,
Thank you very much for your explanation. This helped me to
understand more about locking in EC.
Best Regards
JK
On Mon, Nov 28, 2016 at 4:17 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:
Hi,
On 11/28/2016 02:59 AM, jayakrishnan mm wrote:
Hi Xavier,
I notice that the EC xlator uses blocking locks. Is there any specific
reason for this?
In a distributed filesystem like gluster a synchronization
mechanism is a must to avoid data corruption.
Do you think this will affect performance?
Of course the need for locks has a performance impact, and we
cannot avoid them if we want to guarantee data integrity. However, some
optimizations have been applied, especially eager locking,
which allows a lock to be reused without unlocking/locking again.
(In comparison, AFR first tries non-blocking locks and, if not
successful, then tries blocking locks.)
EC also tries a non-blocking lock first.
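
The "non-blocking first, blocking as a fallback" pattern can be sketched with plain POSIX fcntl() record locks on a local file descriptor. This is only an analogy under that assumption. It is not how EC or AFR issue inodelk requests to the bricks, and the helper names is_lock_conflict() and lock_try_then_block() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if the errno from a failed F_SETLK means that another
 * process holds a conflicting lock (POSIX allows EAGAIN or EACCES). */
int is_lock_conflict(int err)
{
    return err == EAGAIN || err == EACCES;
}

/* Take a whole-file write lock: try without blocking first and only
 * wait if the lock is currently held by someone else.
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
int lock_try_then_block(int fd)
{
    struct flock fl;

    assert(fd >= 0);
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    /* l_start = 0 and l_len = 0 (from memset) cover the whole file. */

    if (fcntl(fd, F_SETLK, &fl) == 0)
        return 0;                     /* fast path: nobody held the lock */
    if (!is_lock_conflict(errno))
        return -1;                    /* real error, do not retry */
    return fcntl(fd, F_SETLKW, &fl);  /* slow path: wait for the holder */
}
```

The fast path avoids waiting in the common uncontended case, and the blocking call is only paid for when there is real contention.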
Also, why are two locks needed per FOP? One for normal I/O and
another for self healing?
The only fop that currently needs two locks is 'rename', and
only when source and destination directories are different. All
other fops only take one lock at most.
Best regards,
Xavi
Best regards
JK
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel