On 02/20/2013 11:53 AM, Anand Avati wrote:
Please check http://review.gluster.org/4551.
This should fix all the known write-behind/eager-lock interaction
gaps. On top of this patch, you can now set a bit in the 'flags'
of writev fop coming out of write-behind, and look for it in AFR
to be sure that you have the 'protection layer' of write-behind
offering coverage against concurrent writes. With this you can
actually eliminate all the glusterd/volgen crud of implementing
dependencies between the two options.
Avati
Flags parameter in writev is coming from fuse/nfs xlators. Is it ok
if we use xdata instead of flags to convey that write-behind took
care of overlaps?
Pranith
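As a rough illustration of the two options discussed above (a bit in 'flags' versus a key in xdata), here is a toy Python model. It is not GlusterFS code: `WB_SERIALIZED_FLAG`, `WB_SERIALIZED_KEY`, and both function names are hypothetical, and the bit value is only assumed to be unused.

```python
# Toy model only: every name below is hypothetical, not a GlusterFS API.
WB_SERIALIZED_FLAG = 1 << 30          # assumed-unused bit in the writev flags
WB_SERIALIZED_KEY = "wb-serialized"   # assumed key name in xdata


def write_behind_writev(flags, xdata, use_xdata):
    """Mark an outgoing writev as covered by write-behind's overlap handling."""
    xdata = dict(xdata or {})
    if use_xdata:
        xdata[WB_SERIALIZED_KEY] = 1   # caller's flags stay untouched
    else:
        flags |= WB_SERIALIZED_FLAG    # marker travels inside the flags word
    return flags, xdata


def afr_may_eager_lock(flags, xdata):
    """AFR side: rely on eager-lock only if the marker arrived by either route."""
    return bool(flags & WB_SERIALIZED_FLAG) or WB_SERIALIZED_KEY in (xdata or {})
```

In this sketch the xdata route leaves the flags passed down by fuse/nfs unchanged, which is the concern raised in the question above.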
On Tue, Feb 19, 2013 at 7:20 PM, Anand Avati <anand.avati@xxxxxxxxx> wrote:
On Tue, Feb 19, 2013 at 6:11 PM, Pranith Kumar K <pkarampu@xxxxxxxxxx> wrote:
On 02/20/2013 07:03 AM, Anand Avati wrote:
On Tue, Feb 19, 2013 at 5:12 PM, Anand Avati <anand.avati@xxxxxxxxx> wrote:
On Tue, Feb 19, 2013 at 3:59 AM, Pranith Kumar K <pkarampu@xxxxxxxxxx> wrote:
On 02/19/2013 11:26 AM, Anand Avati wrote:
Thinking over this, looks like there is a problem!
Write-behind guarantees: That a second write request arriving after the acknowledgement of a first overlapping request (whether written-behind or otherwise) will be guaranteed to be fulfilled in the backend in the same order (i.e., the second overlapping request will be "serialized" behind the first one in the fulfillment process).
Eager-lock requirement: That write-behind will send no two write requests on an overlapping region at the same time.
The requirement-set and guarantee-set have a big overlap, but the requirement-set is not a subset.
This is because of O_SYNC writes. write-behind performs write-serialization at fulfillment only for written-behind requests (which get covered under the conflict detection code during liability fulfillment). However, if two threads (or apps) issue overlapping O_SYNC writes to the same region at approximately the same time, then write-behind will let both of them go by without any kind of serialization, into eager lock, violating the assumptions!
I'm wondering if it is a safer idea to implement overlap checks within eager-lock code itself rather than depend on write-behind :|
Avati
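To make the idea of overlap checks inside the eager-lock path concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not AFR code: `ranges_overlap` and `InFlightWrites` are hypothetical names, and the model simply holds back any write that overlaps a write already in progress or already waiting, regardless of whether it is O_SYNC.

```python
def ranges_overlap(off_a, len_a, off_b, len_b):
    """True if byte ranges [off, off + len) share at least one byte."""
    return off_a < off_b + len_b and off_b < off_a + len_a


class InFlightWrites:
    """Toy per-inode tracker of writes wound to the bricks (hypothetical)."""

    def __init__(self):
        self.active = []    # (offset, length) of writes in progress
        self.waiting = []   # writes held back, in arrival order

    def _conflicts(self, offset, length, others):
        return any(ranges_overlap(offset, length, o, l) for o, l in others)

    def submit(self, offset, length):
        """Return True if the write may proceed now, False if it was queued."""
        if self._conflicts(offset, length, self.active + self.waiting):
            self.waiting.append((offset, length))
            return False
        self.active.append((offset, length))
        return True

    def complete(self, offset, length):
        """Finish a write; return queued writes that may now proceed."""
        self.active.remove((offset, length))
        released, still_waiting = [], []
        for w in self.waiting:
            # A waiter stays queued while it overlaps an active write or an
            # earlier waiter, so overlapping writes keep their arrival order.
            if self._conflicts(w[0], w[1], self.active + still_waiting):
                still_waiting.append(w)
            else:
                self.active.append(w)
                released.append(w)
        self.waiting = still_waiting
        return released
```

With this model, two overlapping writes submitted back to back never reach the bricks at the same time: the second is released only when the first completes.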
On Mon, Feb 11, 2013 at 10:07 PM, Anand Avati <anand.avati@xxxxxxxxx> wrote:
On Mon, Feb 11, 2013 at 9:32 PM, Pranith Kumar K <pkarampu@xxxxxxxxxx> wrote:
Hi,
Please note that this is a theoretical case and I did not run into such a situation, but I feel it is important to address it.
A configuration with "eager-lock on" and "write-behind off" should not be allowed, as it leads to lock synchronization problems which in turn lead to data inconsistency among replicas in NFS.
Let's say bricks b1 and b2 are in replication. The Gluster NFS server uses one anonymous fd to perform all write fops. If eager-lock is enabled in AFR, the fd's address is used as the lock-owner, which will be the same for all write fops, so there will never be any inodelk contention. If write-behind is disabled, there can be writes that overlap. (Does NFS make sure that the ranges don't overlap?)
Now imagine the following scenario: let's say w1 and w2 are two write fops on the same offset and length, w1 with all '0's and w2 with all '1's. If these two write fops are executed in two different threads, the order of arrival on b1 can be w1, w2, whereas on b2 it is w2, w1, leading to data inconsistency between the two replicas. Lock contention will not happen, as both the lk-owner and the transport are the same for these two fops.
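The scenario above can be reproduced in a few lines of Python. This is only an in-memory illustration: the bricks are modelled as byte arrays and `apply_writes` is a hypothetical helper, not anything from GlusterFS.

```python
def apply_writes(size, writes):
    """Apply (offset, payload) writes in order to a zero-filled brick image."""
    brick = bytearray(size)
    for offset, payload in writes:
        brick[offset:offset + len(payload)] = payload
    return bytes(brick)


w1 = (0, b"0" * 8)   # w1: all '0's
w2 = (0, b"1" * 8)   # w2: all '1's, same offset and length

b1 = apply_writes(8, [w1, w2])   # b1 receives w1, then w2
b2 = apply_writes(8, [w2, w1])   # b2 receives w2, then w1

print(b1, b2, b1 == b2)   # the two replicas end up with different contents
```

Whichever write lands last wins on each brick, so without serialization the replicas can settle on different data.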
Write-behind has two functions: a) performing operations in the background and b) serializing overlapping operations. While the problem does exist, the specifics are different from what you describe. Since all writes coming in from NFS will always use the same anonymous FD, two near-in-time/overlapping writes will never contend with inodelk() but instead the second write will inherit the lock and changelog from the first. In either case, it is a problem.
We can add a check in glusterd on volume set to disallow such a configuration, BUT by default write-behind is off in the NFS graph and eager-lock is on. So we should either turn on write-behind for NFS or turn off eager-lock by default.
Could you please suggest how to proceed with this, if you agree that I did not miss any important detail that makes this theory invalid?
Loading the write-behind xlator in the NFS graph looks like the simpler solution. Eager-locking is crucial for replicated NFS write performance.
Avati
Shall we disable eager-lock
for files opened with O_SYNC,
for now?
Bad news: the problem is slightly worse than just this. Even with non-O_SYNC writes, there is a possibility in write-behind that a second overlapping write request arrives very close to the first one: if wb_enqueue() of the second write happens after wb_enqueue() of the first, but before any unwind() following the first wb_enqueue() (i.e., wb_inode->gen is not bumped), then the two write requests can be wound down together to eager lock.
But this has a simple fix - http://review.gluster.org/4550.
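The window described above can be pictured with a toy Python model. Everything here is an assumption made for illustration: `ToyWbInode`, `Request`, and `conflicts` are hypothetical, the rule that only a strictly older generation blocks a later request is assumed, and none of this reproduces the real write-behind code or the patch linked above.

```python
class Request:
    def __init__(self, offset, length, gen):
        self.offset, self.length, self.gen = offset, length, gen


class ToyWbInode:
    """Toy stand-in for a write-behind inode; not the real wb_inode_t."""

    def __init__(self):
        self.gen = 0
        self.queue = []

    def enqueue(self, offset, length):
        # Each request is stamped with the generation current at enqueue time.
        req = Request(offset, length, self.gen)
        self.queue.append(req)
        return req

    def unwind(self):
        # Assumption: acknowledging a request to the caller bumps the generation.
        self.gen += 1


def conflicts(earlier, later):
    """Assumed rule: only a strictly older generation can block a later request."""
    if earlier.gen >= later.gen:
        return False
    return (earlier.offset < later.offset + later.length
            and later.offset < earlier.offset + earlier.length)


inode = ToyWbInode()
first = inode.enqueue(0, 4096)
second = inode.enqueue(0, 4096)    # enqueued before any unwind(): same gen
print(conflicts(first, second))    # no conflict seen, both could be wound together
```

In this model, calling `unwind()` between the two `enqueue()` calls gives the second request a newer generation, and the overlap is then detected.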
Disabling eager-locking for O_SYNC files
is a bad idea. We absolutely want
eager-locking for O_SYNC files. Thinking
more..
Avati
Why is disabling eager-lock for O_SYNC files a bad
idea? It is acceptable to sacrifice a bit of
performance for O_SYNC isn't it?
s/bit/quite a bit/. For O_SYNC writes, eager locking
is the only saving grace in performance as write-behind
stays out of the way completely. We would need overlap
checks either in AFR or write-behind for O_SYNC writes.
Avati