Re: compound fop design first cut

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Wed, 06 Jan 2016 22:16:51 +0530

On 01/06/2016 07:50 PM, Jeff Darcy wrote:
1) fops will be compounded per inode, meaning 2 fops on different
inodes can't be compounded (not because of the design, just to reduce
the scope of the problem).

2) Each xlator that wants a compound fop packs the arguments by
itself.
Packed how?  Are we talking about XDR here, or something else?  How is
dict_t handled?  Will there be generic packing/unpacking code somewhere,
or is each translator expected to do this manually?

Packed as described in step 4 below. Common functions will be provided
that fill one array cell with the arguments given for that fop. In
addition, there will be filling functions for each of the compound
fops listed at:
https://public.pad.fsfe.org/p/glusterfs-compound-fops. The XDR should
be similar to what Soumya suggested in earlier mails, just like in NFS.
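
As a rough illustration of steps 2 and 4 (not the actual patch), the
packing could look something like the sketch below. Every type and
function name here is made up for the example, and the write+fsync pair
is chosen only for illustration; it is not taken from the pad.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fop tags; the real set would follow the pad. */
typedef enum { CFOP_WRITE, CFOP_FSYNC } cfop_type_t;

/* One argument structure per fop (fields are illustrative only). */
typedef struct { const void *buf; size_t len; int64_t offset; } cfop_write_args_t;
typedef struct { int datasync; } cfop_fsync_args_t;

/* One array cell: a tag plus a union of the per-fop structures. */
typedef struct {
    cfop_type_t type;
    union {
        cfop_write_args_t write;
        cfop_fsync_args_t fsync;
    } u;
} cfop_cell_t;

/* Common filler: packs the arguments of a write into a single cell. */
static void cfop_fill_write(cfop_cell_t *cell, const void *buf,
                            size_t len, int64_t offset)
{
    assert(cell != NULL);
    memset(cell, 0, sizeof(*cell));
    cell->type = CFOP_WRITE;
    cell->u.write.buf = buf;
    cell->u.write.len = len;
    cell->u.write.offset = offset;
}

/* Common filler: packs the arguments of an fsync into a single cell. */
static void cfop_fill_fsync(cfop_cell_t *cell, int datasync)
{
    assert(cell != NULL);
    memset(cell, 0, sizeof(*cell));
    cell->type = CFOP_FSYNC;
    cell->u.fsync.datasync = datasync;
}

/* Filler for one specific compound fop, built from the common fillers.
 * Returns the number of cells used, or -1 if the array is too small. */
static int cfop_fill_write_fsync(cfop_cell_t *cells, size_t ncells,
                                 const void *buf, size_t len, int64_t offset)
{
    if (cells == NULL || ncells < 2)
        return -1;
    cfop_fill_write(&cells[0], buf, len, offset);
    cfop_fill_fsync(&cells[1], 0);
    return 2;
}
```

An xlator would call the per-compound-fop filler and hand the resulting
array down the stack; the XDR encoding of the cells is left out here.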

3) On the server side a de-compounder placed below server xlator
unpacks the arguments and does the necessary operations.

4) Arguments for compound fops will be passed as array of union of
structures where each structure is associated with a fop.

5) Each xlator will have <xlator>_compound_fop () which receives the
fop and does additional processing that is required for itself.
What happens when (not if) some translator fails to provide this?  Is
there a default function?  Is there something at the end of the chain
that will log an error if the fop gets that far without being handled
(as with GF_FOP_IPC)?

Yes, a default function will be provided, just as for other fops, and
it is just a pass-through. Posix will log and unwind with -1, ENOTSUPP.
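
To make the pass-through idea concrete, here is a toy model of a
translator stack. It does not use the real xlator_t, STACK_WIND or
STACK_UNWIND machinery; all names are invented for the sketch, and
ENOTSUP from <errno.h> stands in for the error code named above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct toy_xlator toy_xlator_t;
typedef int (*toy_compound_fn)(toy_xlator_t *this, int *op_errno);

struct toy_xlator {
    const char *name;
    toy_xlator_t *child;      /* next translator down, NULL at the bottom */
    toy_compound_fn compound; /* NULL means "use the default" */
};

static int toy_call_compound(toy_xlator_t *xl, int *op_errno);

/* Default handler: no local processing, just hand the fop to the child. */
static int toy_default_compound(toy_xlator_t *this, int *op_errno)
{
    return toy_call_compound(this->child, op_errno);
}

/* Stand-in for the bottom of the stack: a compound fop that reaches
 * this point unhandled is failed with -1 and ENOTSUP. */
static int toy_posix_compound(toy_xlator_t *this, int *op_errno)
{
    (void)this;
    *op_errno = ENOTSUP;
    return -1;
}

/* Dispatch to the translator's own handler, or to the default one. */
static int toy_call_compound(toy_xlator_t *xl, int *op_errno)
{
    assert(xl != NULL && op_errno != NULL);
    toy_compound_fn fn = xl->compound ? xl->compound : toy_default_compound;
    return fn(xl, op_errno);
}
```

With a stack where no translator above the bottom one sets its own
handler, the call falls through every level and comes back as
-1/ENOTSUP.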

6) Response will also be an array of union of response structures
where each structure is associated with a fop's response.
What are the error semantics?  Does processing of a series always stop
at the first error, or are there some errors that allow retry/continue?
If/when processing stops, who's responsible for cleaning up state
changed by those parts that succeeded?  What happens if the connection
dies in the middle?

Yes, at the moment we are implementing stop-at-first-error semantics,
as that seems to satisfy all the compound fops we listed at
https://public.pad.fsfe.org/p/glusterfs-compound-fops. Each translator
that handles the compound fop should handle failures just as it does
for a normal fop today.
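
A minimal sketch of what stop-at-first-error could mean for the
response array from step 6. The request/response types and the two
sample operations are hypothetical and only serve to show the control
flow; EIO is an arbitrary error picked for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-fop response: just the usual ret/errno pair. */
typedef struct { int op_ret; int op_errno; } toy_rsp_t;

/* Hypothetical per-fop request: a function to run plus its argument. */
typedef int (*toy_op_fn)(void *arg, int *op_errno);
typedef struct { toy_op_fn fn; void *arg; } toy_req_t;

/* Sample operations used only to exercise the loop. */
static int toy_op_ok(void *arg, int *op_errno)
{
    (void)arg;
    (void)op_errno;
    return 0;
}

static int toy_op_fail(void *arg, int *op_errno)
{
    (void)arg;
    *op_errno = EIO;
    return -1;
}

/* Runs the requests in order and stops at the first failure.
 * Returns how many requests were executed, including the failing one;
 * only rsp[0 .. return value - 1] are filled in. */
static size_t toy_run_compound(const toy_req_t *req, toy_rsp_t *rsp, size_t n)
{
    assert(req != NULL && rsp != NULL);
    for (size_t i = 0; i < n; i++) {
        rsp[i].op_errno = 0;
        rsp[i].op_ret = req[i].fn(req[i].arg, &rsp[i].op_errno);
        if (rsp[i].op_ret < 0)
            return i + 1; /* later fops are never attempted */
    }
    return n;
}
```

For a series ok, fail, ok the loop executes two requests and leaves the
third untouched. The only thing carried from one fop to the next is
whether the previous one succeeded.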

How are values returned from one operation in a series propagated as
arguments for the next?

They are not. In the first cut, the only dependency between two fops
is whether the previous one succeeded. That much seems to work fine
for the fops we are targeting for now:
https://public.pad.fsfe.org/p/glusterfs-compound-fops. We may have to
enhance it later, depending on what requirements come up.

What are the implications for buffer and message sizes?  What are the
limits on how large these can get, and/or how many operations can be
compounded?

It depends on the limits imposed by the rpc layer. If the request
can't be sent, the fop will fail. If the request can be sent but the
response is too big to send back, I think the fop will fail with a
frame timeout while waiting for the response. Either way it will be a
failure. At the moment, for the fops listed at:
https://public.pad.fsfe.org/p/glusterfs-compound-fops this doesn't seem
to be a problem.

How is synchronization handled?  Is the inode locked for the duration of
the compound operation, to prevent other operations from changing the
context in which later parts of the compound operation execute?  Are
there possibilities for deadlock here?  Alternatively, if no locking is
done, are we going to document the fact that compound operations are not
atomic/linearizable?

Since we are limiting the scope to single-inode fops, locking should
suffice. EC doesn't have any problem, as it has just one lock that
covers data/entry and metadata. In afr we need to come up with a
locking order for the metadata and data domains, something similar to
what we do in rename, where we need to take multiple locks.
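
The usual way to avoid deadlock when several locks are needed is to
always take them in one fixed order. A small sketch of that idea, with
made-up names; the particular order used here (metadata before data) is
an arbitrary choice for the example, not a decision for afr.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock domains; the enum value doubles as the lock rank. */
typedef enum {
    TOY_DOM_METADATA = 0,
    TOY_DOM_DATA = 1,
    TOY_DOM_COUNT
} toy_domain_t;

/* Given a bitmask of wanted domains (bit d set means domain d is
 * needed), writes them to `order` in ascending rank and returns how
 * many there are. Every caller locking in this order, and unlocking in
 * reverse, cannot deadlock against another caller doing the same. */
static size_t toy_lock_order(unsigned wanted, toy_domain_t order[TOY_DOM_COUNT])
{
    assert(order != NULL);
    size_t n = 0;
    for (int d = 0; d < TOY_DOM_COUNT; d++) {
        if (wanted & (1u << d))
            order[n++] = (toy_domain_t)d;
    }
    return n;
}
```

A compound fop that touches both data and metadata would ask for both
domains and then take the locks in the returned order, regardless of
the order in which its individual fops appear.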

Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
