On 12/09/2015 11:48 PM, Pranith Kumar Karampuri wrote:
On 12/09/2015 08:11 PM, Shyam wrote:
On 12/09/2015 02:37 AM, Soumya Koduri wrote:
On 12/09/2015 11:44 AM, Pranith Kumar Karampuri wrote:
On 12/09/2015 06:37 AM, Vijay Bellur wrote:
On 12/08/2015 03:45 PM, Jeff Darcy wrote:
On December 8, 2015 at 12:53:04 PM, Ira Cooper (ira@xxxxxxxxxx) wrote:
Raghavendra Gowdappa writes:
I propose that we define a "compound op" that contains ops.
Within each op, there are fields that can be "inherited" from the
previous op, via use of a sentinel value.
Sentinel is -1, for all of these examples.
So:
LOOKUP(1, "foo")     (Sets the gfid value to be picked up by compounding; 1 is the root directory, as a gfid, by convention.)
OPEN(-1, O_RDWR)     (Uses the gfid value, sets the glfd compound value.)
WRITE(-1, "foo", 3)  (Uses the glfd compound value.)
CLOSE(-1)            (Uses the glfd compound value.)
So, basically, what the programming-language types would call futures
and promises. It's a good and well-studied concept, which is necessary
to solve the second-order problem of how to specify an argument in
sub-operation N+1 that's not known until sub-operation N completes.
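To make that concrete, here is a minimal illustrative sketch of the sentinel idea (the type, enum and field names below are invented for illustration; they are not existing GlusterFS code):

#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>

#define COMPOUND_INHERIT (-1)   /* "fill this in from the previous op's result" */

enum { C_LOOKUP, C_OPEN, C_WRITE, C_CLOSE };

typedef struct {
        int         op;      /* which fop this sub-op is */
        int64_t     handle;  /* gfid/glfd slot, or COMPOUND_INHERIT */
        const char *name;    /* LOOKUP name, if any */
        int         flags;   /* OPEN flags, if any */
        const void *buf;     /* WRITE payload, if any */
        uint32_t    len;     /* WRITE length, if any */
} sub_op_t;

/* Ira's example expressed as data: each -1 is resolved by the executor
 * from the result (gfid or glfd) produced by the op before it. */
static const sub_op_t example[] = {
        { C_LOOKUP, 1,                "foo", 0,      NULL,  0 },
        { C_OPEN,   COMPOUND_INHERIT, NULL,  O_RDWR, NULL,  0 },
        { C_WRITE,  COMPOUND_INHERIT, NULL,  0,      "foo", 3 },
        { C_CLOSE,  COMPOUND_INHERIT, NULL,  0,      NULL,  0 },
};

(Real gfids are 16-byte UUIDs; a single integer slot is used here only to keep the sketch small.)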
To be honest, some of the highly general approaches suggested here
scare me too. Wrapping up the arguments for one sub-operation in xdata
for another would get pretty hairy if we ever try to go beyond two
sub-operations and have to nest sub-operation #3's args within
sub-operation #2's xdata, which is itself encoded within
sub-operation #1's xdata. There's also not much clarity about how to
handle errors in that model. Encoding N sub-operations' arguments in a
linear structure as Shyam proposes seems a bit cleaner that way. If I
were to continue down that route I'd suggest just having start_compound
and end_compound fops, plus an extra field (or by-convention xdata key)
that either the client-side or server-side translator could use to
build whatever structure it wants and schedule sub-operations however
it wants.
However, I'd be even more comfortable with an even simpler approach
that avoids the need to solve what the database folks (who have dealt
with complex transactions for years) would tell us is a really hard
problem. Instead of designing for every case we can imagine, let's
design for the cases that we know would be useful for improving
performance. Open plus read/write plus close is an obvious one.
Raghavendra mentions create+inodelk as well. For each of those, we can
easily define a structure that contains the necessary fields, we don't
need a client-side translator, and the server-side translator can take
care of "forwarding" results from one sub-operation to the next. We
could even use GF_FOP_IPC to prototype this. If we later find that the
number of "one-off" compound requests is growing too large, then at
least we'll have some experience to guide our design of a more general
alternative. Right now, I think we're trying to look further ahead than
we can see clearly.
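To make that concrete, a minimal sketch of one such purpose-built compound (the struct and field names are invented here for illustration, not existing GlusterFS types):

#include <stdint.h>

/* A dedicated argument structure for one known-useful combination, rather
 * than a generic container: only the fields open+write+close needs.
 * The write payload itself would ride in the RPC's normal payload vector. */
struct compound_open_write_close_args {
        unsigned char gfid[16];   /* file to operate on */
        int32_t       flags;      /* open flags, e.g. O_RDWR */
        uint64_t      offset;     /* write offset */
        uint32_t      size;       /* write length */
};

A create+inodelk compound would get its own, similarly small, structure; the server-side translator forwards the fd/inode from one sub-operation to the next internally, so nothing sentinel-like needs to appear on the wire.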
Yes, agree. This makes the implementation on the client side simpler as
well, so it is welcome.
Just updating the solution:
1) New RPCs are going to be implemented.
2) The client stack will use these new fops.
3) On the server side, the server xlator implements these new fops to
decode the RPC request, followed by resolve_resume and a
compound-op-receiver (a better name for this is welcome) which sends
one op after the other and sends back the compound fop response.
@Pranith, I assume you would expand on this at a later date
(something along the lines of what Soumya has done below), right?
I will talk to her tomorrow to know more about this. I'm not saying this
is what I will be implementing (there doesn't seem to be any consensus
yet), but I would love to know how it is implemented.
Soumya and I had a discussion about this, and the NFS way of stuffing the
args seems to work out at a high level. Even the sentinel-value-based
approach may also be possible. What I will do now is take a closer look
at the structure and work out how all the fops mentioned in this thread
can be implemented. I will update you about my findings in a couple of
days.
Pranith
Pranith
List of compound fops identified so far:
Swift/S3:
PUT: creat(), write()s, setxattr(), fsync(), close(), rename()
Dht:
mkdir + inodelk
Afr:
xattrop+writev, xattrop+unlock to begin with.
Could everyone who needs compound fops add to this list?
I see that Niels is back on 14th. Does anyone else know the list of
compound fops he has in mind?
From the discussions we had with Niels regarding Kerberos support in
GlusterFS, I think the below is the set of compound fops which are
required:
set_uid +
set_gid +
set_lkowner (or kerberos principal name) +
actual_fop
Also, gfapi does a lookup (the first time, or to refresh the inode)
before performing the actual fop most of the time. It may really help
if we can club such fops:
@Soumya +5 (just a random number :) )
This came to my mind as well, and is a good candidate for compounding.
LOOKUP + FOP (OPEN etc)
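A minimal sketch of what such a clubbed LOOKUP + OPEN request might carry (struct and field names invented for illustration, not existing GlusterFS types):

#include <stdint.h>

/* Resolve a name and open the resulting inode in a single round trip,
 * instead of gfapi's usual LOOKUP followed by a separate OPEN. */
struct compound_lookup_open_args {
        unsigned char parent_gfid[16];  /* directory to resolve in */
        char          name[256];        /* entry name to look up */
        int32_t       flags;            /* open flags for the resolved inode */
};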
Coming to the design proposed, I agree with Shyam's, Ira's and Jeff's
thoughts. Defining a different compound fop for each specific set of
operations and wrapping up those arguments in xdata seems rather complex
and difficult to maintain going forward. Having worked with NFS, may I
suggest that we follow (or do something along the lines of) the approach
taken by the NFS protocol to define and implement compound procedures.
The basic structure of the NFS COMPOUND procedure is:
+-----+--------------+--------+-----------+-----------+-----------+--
| tag | minorversion | numops | op + args | op + args | op + args |
+-----+--------------+--------+-----------+-----------+-----------+--
and the reply's structure is:
+------------+-----+--------+-----------------------+--
|last status | tag | numres | status + op + results |
+------------+-----+--------+-----------------------+--
Each compound procedure contains the number of operations followed by
the list of 'op_code + arguments_for_that_op'.
So, on similar lines, we just need to define a new RPC structure for
COMPOUND fops (something like below) and XDR encode/decode each of the
ops based on the op number.
ARGUMENTS

union argop switch (uint32_t op_num) {
        case <OPCODE>: <arguments for that op>;
        ...
};

struct COMPOUNDargs {
        uint32_t  version;
        uint32_t  numops;
        argop     argarray<>;
};

RESULT

union resop switch (uint32_t op_num) {
        case <OPCODE>: <result for that op>;
        ...
};

struct COMPOUNDres {
        uint32_t  status;
        resop     resarray<>;
};
The xlator which would like to club fops can define this new COMPOUND
fop with the list of operations. For example, AFR can construct the
compound fop as:

compound_fop (struct COMPOUNDargs c_args);

c_args.version = 1;
c_args.numops = 2;
c_args.argarray[0].op_num = fxattr_op_num;
c_args.argarray[0].op_args = fxattr_op_args;
c_args.argarray[1].op_num = writev_op_num;
c_args.argarray[1].op_args = writev_op_args;

On the server side, the new compound xlator, on receiving this compound
fop, can split the fops and execute them one by one as you already
mentioned.
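A rough sketch of that server-side split-and-execute loop (not existing GlusterFS code: the C types below are hand-written stand-ins for the rpcgen-generated COMPOUND types above, and do_fxattrop/do_writev are hypothetical placeholders for the real fop calls):

#include <errno.h>
#include <stdint.h>

/* Hand-written stand-ins for the rpcgen-generated COMPOUND types above. */
enum compound_op { OP_FXATTROP = 1, OP_WRITEV = 2 };

typedef struct {
        uint32_t op_num;
        void    *op_args;      /* decoded per-op arguments */
} argop_t;

typedef struct {
        uint32_t version;
        uint32_t numops;
        argop_t *argarray;
} compound_args_t;

typedef struct {
        int32_t  status;
        void   **resarray;     /* per-op results, filled in order */
} compound_res_t;

/* Hypothetical per-op executors standing in for the real fop calls. */
extern int do_fxattrop (void *args, void **result);
extern int do_writev (void *args, void **result);

/* Split the compound request and run each sub-op in order, stopping at
 * the first failure (the same model NFS COMPOUND uses); results gathered
 * so far are still returned to the client. */
static int
compound_execute (compound_args_t *c_args, compound_res_t *res)
{
        int ret = 0;
        uint32_t i;

        for (i = 0; i < c_args->numops; i++) {
                argop_t *op = &c_args->argarray[i];

                switch (op->op_num) {
                case OP_FXATTROP:
                        ret = do_fxattrop (op->op_args, &res->resarray[i]);
                        break;
                case OP_WRITEV:
                        ret = do_writev (op->op_args, &res->resarray[i]);
                        break;
                default:
                        ret = -ENOTSUP;
                        break;
                }

                if (ret < 0)
                        break;
        }

        res->status = ret;
        return ret;
}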
Any thoughts?
Thanks,
Soumya
Pranith.
Starting with a well-defined set of operations for compounding has its
advantages. It would be easier to understand and maintain correctness
across the stack. Some of our translators perform transactions and
create/update internal metadata for certain fops. It would be easier
for such translators if the compound operations are well defined and do
not entail deep introspection of a generic representation to ensure
that the right behavior gets reflected at the end of a compound
operation.
-Vijay
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel