Re: User-serviceable snapshots design

Anand Subramanian <ansubram@xxxxxxxxxx> · Thu, 08 May 2014 15:32:46 +0530

Inline.

On 05/07/2014 10:59 PM, Ira Cooper wrote:
Anand, I also have a concern regarding the user-serviceable snapshot feature.

You rightly call out the lack of scaling caused by maintaining the gfid -> gfid mapping tables, and correctly point out that, on the client side, this will limit the use cases to which this feature applies.

If in fact gluster generates its gfids randomly, and has always done so, I propose that we can change the algorithm used to determine the mapping, to eliminate the lack of scaling of our solution.

We can create a fixed constant per-snapshot.  (Can be in just the client's memory, or stored on disk, that is an implementation detail here.)  We will call this constant "n".

I propose we just add the constant to the gfid to determine the new gfid.  It turns out that this new gfid has the same chance of collision as any random gfid.  (It will take a moment for you to convince yourself of this, but the argument is fairly intuitive.)  If we do this, I'd suggest we do it on the first 32 bits of the gfid, because we can use simple unsigned math, and let it just overflow.  (If we get up to 2^32 snapshots, we can revisit this aspect of the design, but we'll have other issues at that number.)

By using addition this way, we also allow for subtraction to be used for a later purpose.
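
The addition scheme described above can be sketched roughly as follows. This is only an illustration under assumptions, not code from the gluster tree: it treats a gfid as a 128-bit UUID, and the names shift_gfid and unshift_gfid, as well as the sample values, are made up for the example.

```python
import uuid

MASK32 = 0xFFFFFFFF          # keeps a value within 32 unsigned bits
LOW96 = (1 << 96) - 1        # selects the remaining 96 bits of the gfid


def shift_gfid(gfid: uuid.UUID, n: int) -> uuid.UUID:
    """Add the per-snapshot constant n to the first 32 bits of gfid.

    The sum wraps modulo 2**32, mimicking unsigned 32-bit overflow.
    The other 96 bits are left untouched.
    """
    high = gfid.int >> 96                 # first 32 bits
    rest = gfid.int & LOW96               # everything after them
    new_high = (high + n) & MASK32        # unsigned add with wraparound
    return uuid.UUID(int=(new_high << 96) | rest)


def unshift_gfid(gfid: uuid.UUID, n: int) -> uuid.UUID:
    """Undo shift_gfid by subtracting n, again modulo 2**32."""
    return shift_gfid(gfid, -n)


# Hypothetical values: a gfid whose first 32 bits are all ones, and n = 1.
original = uuid.UUID("ffffffff-0000-4000-8000-000000000001")
mapped = shift_gfid(original, 1)          # first 32 bits wrap to zero
restored = unshift_gfid(mapped, 1)        # subtraction recovers the original
```

Because the addition wraps modulo 2^32, it is a one-to-one mapping on the first 32 bits. That is why a shifted random gfid is no more likely to collide than a fresh random one, and why subtraction gets back the original without any lookup table.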

Note: This design relies on our random gfid generator not turning out a linear range of numbers.  If it has in the past, or will in the future, this design is clearly flawed.  But I know of no such plans.  As long as the randomness is sufficient (i.e., the generator doesn't turn out linear results), there should be no issue.

I don't claim to understand your question completely, but I have a
feeling you are going off track here. So bear with me, as my
explanation could be off the mark as well.

The scalability factor I mentioned simply had to do with the core
infrastructure, which depends on very basic mechanisms such as the
epoll wait thread and the entire end-to-end flow of a single fop (a
lookup(), say). Even though this was contained to an extent by the
introduction of the io-threads xlator in snapd, it is still a complex
path that was not designed for high performance. That wasn't the goal
to begin with.

I am not sure what a linear range versus a non-linear one has to do
with the design. Maybe you are seeing something that I miss. A random
gfid is generated in the snapview-server xlator on lookups. The
snapview-client is a basic redirector that detects when a reference is
made to a "virtual" inode (based on stored context) and simply
redirects to the snapd daemon. It stores the info returned from
snapview-server, capturing the essential inode info in the inode
context (note this is the client-side inode we are talking about).

In the daemon there is another level of translation, which needs to
associate this gfid with an inode in the context of the protocol-server
xlator. In the next step, this inode needs to be translated to the
actual gfid on disk, which is the only on-disk gfid and exists in one
of the snapshotted gluster volumes. To that end, the snapview-server
xlator needs to know which glfs_t structure to access so it can get to
the right gfapi graph. Once it knows that, it can access any object in
that gfapi graph using the glfs_object (which has the real inode info
from the gfapi world and the actual on-disk gfid).
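
To make the chain above a little more concrete, here is a toy model of the bookkeeping it implies. It is not the snapview-server code: the class and method names are hypothetical, a plain string stands in for the glfs_t handle, and a small record stands in for the glfs_object.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapObject:
    """Stand-in for what a glfs_object would give us (assumed shape)."""
    snapshot: str          # placeholder for the glfs_t of one snapshot volume
    disk_gfid: uuid.UUID   # the real gfid as stored in that snapshot


class SnapviewServerSketch:
    """Toy model: hand out random virtual gfids and remember what they mean."""

    def __init__(self) -> None:
        # virtual gfid -> (snapshot handle, on-disk gfid)
        self._by_virtual: dict[uuid.UUID, SnapObject] = {}

    def lookup(self, snapshot: str, disk_gfid: uuid.UUID) -> uuid.UUID:
        """Generate a random virtual gfid for an object, as on a lookup."""
        virtual = uuid.uuid4()
        self._by_virtual[virtual] = SnapObject(snapshot, disk_gfid)
        return virtual

    def resolve(self, virtual: uuid.UUID) -> SnapObject:
        """Translate a virtual gfid back to the snapshot and on-disk gfid."""
        return self._by_virtual[virtual]


server = SnapviewServerSketch()
on_disk = uuid.uuid4()                      # pretend this gfid lives in "snap1"
virtual = server.lookup("snap1", on_disk)   # what the client would be handed
target = server.resolve(virtual)            # which graph and object to access
```

The dictionary in this sketch is the per-object state that grows with every lookup, which is the kind of table the arithmetic mapping proposed earlier in the thread is trying to avoid.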

Anand

Thanks,

-Ira / ira@(redhat.com|samba.org)

PS: +1 to Jeff here.  He's spotting major issues that should be looked at ahead of the issue I raise above.

----- Original Message -----
Attached is a basic write-up of the user-serviceable snapshot feature
design (Avati's). Please take a look and let us know if you have
questions of any sort...
A few.

The design creates a new type of daemon: snapview-server.

* Where is it started?  One server (selected how) or all?

* How do clients find it?  Are we dynamically changing the client
   side graph to add new protocol/client instances pointing to new
   snapview-servers, or is snapview-client using RPC directly?  Are
   the snapview-server ports managed through the glusterd portmapper
   interface, or patched in some other way?

* Since a snap volume will refer to multiple bricks, we'll need
   more brick daemons as well.  How are *those* managed?

* How does snapview-server manage user credentials for connecting
   to snap bricks?  What if multiple users try to use the same
   snapshot at the same time?  How does any of this interact with
   on-wire or on-disk encryption?

I'm sure I'll come up with more later.  Also, next time it might
be nice to use the upstream feature proposal template *as it was
designed* to make sure that questions like these get addressed
where the whole community can participate in a timely fashion.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
