Re: [Patch 5/7] tabled: Add replication daemon

Jeff Garzik <jeff@xxxxxxxxxx> · Sun, 15 Nov 2009 05:47:15 -0500

On 11/14/2009 11:53 PM, Pete Zaitcev wrote:
On Sat, 14 Nov 2009 18:44:26 -0500, Jeff Garzik<jeff@xxxxxxxxxx>  wrote:

1) What is the point of db->del() in rep_add_nid() ?  You are in the
middle of a transaction, and you immediately overwrite that record in
the same transaction.  That is clearly unnecessary work, when db->put()
will simply overwrite an existing record, if requested.

That didn't work last I tested, but I forgot what the issue was.
The code is copied from object.c. I'll review it.

The only code matching that pattern is object_put_end(), which also 
performs unnecessary work of needlessly calling db->del().

This should be recognized as nothing more than programmer laziness, 
enabled by the sharing of __object_del() between DEL and PUT, and the 
necessity of deleting associated records (ACLs) in conjunction with the 
main [deletion | update].

That does not apply to rep_add_nid(), which has neither object data nor 
related records to delete.

(patches accepted to remove ->del from PUT path, too, if you wish)

2) rep_scan():  I would rather not make the entire daemon non-responsive
for multiple seconds.

The database is already in multi-threaded mode, and db4 is
free-threaded, so it would seem to make a lot more sense to simply
g_thread_create() a thread to do this work.

True, but multi-threading in tabled is a much bigger undertaking than
just "simply" creating a thread. There's a lot of common state.
I think the 2s problem is less urgent than processing massive
replication more efficiently (from a Chunk down).

I only see one global variable reference in the whole nicely-designed, 
nicely self-contained replica.c file.  "a lot of common state" seems 
like quite an exaggeration.

This patch resuscitates the worst application model from the late 
1980's, cooperative multi-tasking.  I do not want to walk down that 
road:  such programs have weird pauses at weird times, and behave in a 
very non-deterministic manner.

Furthermore, I think you will find that threading gives you a lot more 
freedom to work in the background.

Also, there should be no need to scan a chunkd's keys at all, as long as
the chunkd instance is still communicating with us.

No, there's a need, although there's no code for it: the scan should
probe keys by requesting their metadata, which forturously includes
checksums. This is how the results of Chunk's self-test will be
communicated to tabled.

What is the need?  This description provides "what" and "how" but does 
not answer "why?"

	Jeff

--
To unsubscribe from this list: send the line "unsubscribe hail-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html