Re: [announce] Accord, a coordination service for write-intensive workload

> Ahm, very interesting. I have one upcoming project where I definitely
> need something like accord (if I understand its features correctly).

Thank you for your interest. How would you like to use Accord?

> Would you please answer some questions to make things clearer in my mind?
>
> * Do you have any estimations, how many key-value pairs may it keep (in
> memory)? What data structures do you use for that mode (lists, trees)?
> How fast may it locate specific key-value pair (O(N) or faster?)?

The data structure depends entirely on the back-end key-value store.
Currently, Accord supports only BDB's B-tree mode, so lookups are O(log n).
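To illustrate why a B-tree-backed store gives O(log n) lookups, here is a toy binary search over sorted keys in Python. This is only an illustration of the asymptotic behaviour, not Accord or BDB code:

```python
from bisect import bisect_left

# Toy illustration of O(log n) lookup over sorted keys -- the same
# asymptotic cost a B-tree store such as BDB provides.
def lookup(sorted_keys, values, key):
    i = bisect_left(sorted_keys, key)  # binary search: O(log n)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return values[i]
    return None

keys = ["alpha", "beta", "gamma"]
vals = [1, 2, 3]
print(lookup(keys, vals, "beta"))   # -> 2
print(lookup(keys, vals, "delta"))  # -> None
```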

> * Is it suitable to store large amounts of data (billions of key-value
> pairs, terabytes of data)? I mean in-memory mode.

No.
If your machine has enough memory to hold all key-value pairs, it is suitable; otherwise, performance degrades badly because of swapping. If you'd like to store terabytes of data, you should instead use Accord as replicated storage for the metadata and the transaction log, which improves availability, and write the bulk data to external storage (e.g. memcached). You can then build strongly consistent storage on top of the totally ordered transaction log and the notification feature.
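A minimal sketch of that pattern in plain Python (this is not the Accord API; the log contents and names here are hypothetical): every replica that replays the same totally ordered log of metadata operations reaches the same state, while bulk values are only referenced, not stored:

```python
# Hypothetical sketch: Accord would deliver these entries in total
# order; the bulk values themselves live in external storage.
ordered_log = [
    ("put", "file1", "blob-location-A"),   # metadata points at bulk store
    ("put", "file2", "blob-location-B"),
    ("delete", "file1", None),
]

def apply_log(log):
    """Replay a totally ordered log. Any replica that replays the
    same log deterministically reaches the same metadata state."""
    state = {}
    for op, key, value in log:
        if op == "put":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state

replica_a = apply_log(ordered_log)
replica_b = apply_log(ordered_log)
print(replica_a == replica_b)  # -> True: replicas converge
```

This is the classic state-machine-replication argument: total ordering plus deterministic replay gives consistency.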

> * What does "distributed" exactly mean? May it spread data over cluster
> nodes in a controlled fashion? I mean something like "sharding".
> * What does "fully-replicated" exactly mean? Does it store replicas of
> data? May one control how exactly replicas are placed over the cluster?

Accord doesn't support sharding. All Accord servers hold the same fully replicated data set; data synchronization is done by the Corosync cluster engine.

> * Do you plan to implement (or already did it) data checkpoint for
> in-memory mode? I mean, is it possible to flush in-memory data to disk
> at some points? And, of course, load that data at cluster startup.

Yes, I'll support it. I'm planning to make the checkpoint interval configurable.
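The checkpoint/restore cycle described above can be sketched in a few lines of Python (a toy illustration with a hypothetical JSON format, not Accord's actual on-disk format): flush the in-memory map to disk, then load it back as if at cluster startup.

```python
import json
import os
import tempfile

# Toy sketch of checkpointing: dump the in-memory store to disk,
# then restore it as would happen at cluster startup.
def checkpoint(store, path):
    with open(path, "w") as f:
        json.dump(store, f)

def restore(path):
    with open(path) as f:
        return json.load(f)

store = {"key1": "value1", "key2": "value2"}
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
checkpoint(store, path)
print(restore(path) == store)  # -> True: state survives a restart
```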

> * What do you consider as a possible alternative to BDB for disk mode?

Not yet, but I'm going to support pluggable storage engines.
Do you have a concrete product in mind that you would like to use?
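As a rough sketch of what a pluggable storage-engine interface might look like (the class and method names here are hypothetical, not a committed design): each backend implements the same get/put contract and is selected at configuration time.

```python
# Hypothetical sketch of a pluggable storage-engine contract.
class StorageEngine:
    def put(self, key, value):
        raise NotImplementedError
    def get(self, key):
        raise NotImplementedError

class MemoryEngine(StorageEngine):
    """Trivial in-memory backend; a BDB backend would implement the
    same contract by wrapping B-tree get/put calls."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

engine = MemoryEngine()
engine.put("k", "v")
print(engine.get("k"))  # -> v
```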


(2011/10/05 16:15), Vladislav Bogdanov wrote:
Hi Ozawa-san,

04.10.2011 06:52, OZAWA Tsuyoshi wrote:
[snip]
As mentioned above, Accord is specific to write-intensive workload. It
extends the application scope of Coordination service.
Assumed applications are as follows, for example :
- Distributed Lock Manager whose lock operations occur at a high
frequency from thousands of clients.
- Metadata management service for large-scale distributed storage,
including Sheepdog, HDFS, etc.
- Replicated Message Queue or logger (For instance, replicated RabbitMQ).
and so on.

Ahm, very interesting. I have one upcoming project where I definitely
need something like accord (if I understand its features correctly).

Would you please answer some questions to make things clearer in my mind?

* Do you have any estimations, how many key-value pairs may it keep (in
memory)? What data structures do you use for that mode (lists, trees)?
How fast may it locate specific key-value pair (O(N) or faster?)?
* Is it suitable to store large amounts of data (billions of key-value
pairs, terabytes of data)? I mean in-memory mode.
* What does "distributed" exactly mean? May it spread data over cluster
nodes in a controlled fashion? I mean something like "sharding".
* What does "fully-replicated" exactly mean? Does it store replicas of
data? May one control how exactly replicas are placed over the cluster?
* Do you plan to implement (or already did it) data checkpointing for
in-memory mode? I mean, is it possible to flush in-memory data to disk
at some points? And, of course, load that data at cluster startup.
* What do you consider as a possible alternative to BDB for disk mode?

Thank you and best regards,
Vladislav
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss





--
Tsuyoshi Ozawa
NTT Cyber Space Laboratories
OSS Computing Project
Distributed Virtual Computing Technology Group
TEL 046-859-2351
FAX 046-855-1152
Email ozawa.tsuyoshi@xxxxxxxxxxxxx


