basic questions about pool


 



Hi Pragya

Let me try to answer these.

1#  The decision is based on your use case (performance, reliability). If you need high performance from your cluster, the deployer creates a pool on SSDs and assigns that pool to the applications that require higher I/O. For example, if you integrate OpenStack with Ceph, you can tell OpenStack, through its configuration files, to write data to a specific Ceph pool (http://ceph.com/docs/master/rbd/rbd-openstack/#configuring-glance). Similarly, you can tell CephFS and RadosGW which pool to use for data storage.
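
To make this concrete, here is a minimal Python sketch of the commands a deployer might run to create a pool and set its replica count. The pool name "fast-ssd", the placement-group count of 128, and the helper name pool_commands are hypothetical. The sketch only builds and prints the command lines; it does not run them. Pinning a pool to SSDs also needs a CRUSH rule that selects the SSD-backed OSDs, which is not shown here.

```python
def pool_commands(pool, pg_num, replicas):
    """Build (but do not run) the ceph CLI calls for a new replicated pool.

    The pool name, pg_num and replica count are illustrative values that
    the caller must choose for their own cluster.
    """
    return [
        # Create the pool with the given number of placement groups.
        ["ceph", "osd", "pool", "create", pool, str(pg_num)],
        # Set how many copies of each object the pool keeps.
        ["ceph", "osd", "pool", "set", pool, "size", str(replicas)],
    ]


if __name__ == "__main__":
    for cmd in pool_commands("fast-ssd", 128, 3):
        print(" ".join(cmd))
```

An application is then pointed at the pool by name, as in the Glance configuration linked above.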

2#  Usually the end user (a client of the Ceph cluster) does not need to care where the data is stored, which pool it uses, or what the physical location of the data is. The end user asks for specific performance, reliability and availability. It is the Ceph admin's job to fulfil those storage requirements using Ceph features such as SSDs, erasure coding, replication level etc.


Block Device :- The end user tells the application (Qemu / KVM, OpenStack etc.) which pool it should use for data storage. rbd is the default pool for block devices.
CephFS :- The end user mounts CephFS as a filesystem and uses it from there. The default pools are data and metadata.
RadosGW :- The end user stores objects using the S3 or Swift API.
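
For the block device case, the pool is simply part of the image name handed to QEMU, in the form rbd:pool/image. A small Python sketch of that naming, assuming a hypothetical image called "vm-disk-1" and a hypothetical helper name qemu_rbd_spec:

```python
def qemu_rbd_spec(pool, image):
    """Return the QEMU-style RBD image spec 'rbd:<pool>/<image>'.

    Both names are supplied by the caller; nothing is checked against
    a real cluster.
    """
    if not pool or not image:
        raise ValueError("pool and image must be non-empty")
    if "/" in pool:
        raise ValueError("pool name must not contain '/'")
    return f"rbd:{pool}/{image}"


if __name__ == "__main__":
    # Image in the default 'rbd' pool.
    print(qemu_rbd_spec("rbd", "vm-disk-1"))
    # Same image name, but placed in a hypothetical SSD-backed pool.
    print(qemu_rbd_spec("fast-ssd", "vm-disk-1"))
```

Changing the pool part of the spec is all the application needs in order to place an image in a different pool.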



- Karan Singh -

On 15 Jul 2014, at 07:42, pragya jain <prag_2648 at yahoo.co.in> wrote:

> Thank you very much, Craig, for your clear answers to my questions.
> 
> Now I am very clear about the concept of pools in ceph.
> 
> But I have two small questions:
> 1. How does the deployer decide that a particular type of information will be stored in a particular pool? Are there any settings the deployer should make when creating a pool to control which type of data is stored in which pool?
> 
> 2. How does an end user specify which pool his/her data will be stored in? How can an end user find out which pools are stored on SSDs or on HDDs, and what the properties of a particular pool are?
> 
> Thanks again. Please help me clear up these points as well.
> 
> Regards
> Pragya Jain
> 
> 
> On Sunday, 13 July 2014 5:04 AM, Craig Lewis <clewis at centraldesktop.com> wrote:
> 
> 
> I'll answer out of order.
> 
> #2: rbd is used for RBD images.  data and metadata are used by CephFS.  RadosGW's default pools will be created the first time radosgw starts up.  If you aren't using RBD or CephFS, you can ignore those pools.
> 
> #1: RadosGW will use several pools to segregate its data.  There are a couple of pools for storing user/subuser information, as well as pools for storing the actual data.  I'm using federation, and I have a total of 18 pools that RadosGW is using in some form.  Pools are a way to logically separate your data, and pools can also have different replication/storage settings.  For example, I could say that the .rgw.buckets.index pool needs 4x replication and is only stored on SSDs, while .rgw.buckets is 3x replication on HDDs.
> 
> #3: In addition to #1, you can set up different pools to actually store user data in RadosGW.  For example, an end user may have some very important data that you want replicated 4 times, and some other data that needs to be stored on SSDs for low latency.  Using CRUSH, you would create some RADOS pools with those specs.  Then you'd set up some placement targets in RadosGW that use those pools.  A user that cares will specify a placement target when they create a bucket.  That way they can decide what the storage requirements are.  If they don't care, then they can just use the default.
> 
> Does that help?
> 
> 
> 
> On Thu, Jul 10, 2014 at 11:34 PM, pragya jain <prag_2648 at yahoo.co.in> wrote:
> hi all,
> 
> I have some very basic questions about pools in ceph.
> 
> According to the ceph documentation, when we deploy a ceph cluster with a radosgw instance on top of it, ceph creates pools by default to store the data, or the deployer can create pools according to the requirements.
> 
> Now, my question is:
> 1. what is the relevance of multiple pools in a cluster?
> i.e. why should a deployer create multiple pools in a cluster? what should be the benefits of creating multiple pools?
> 
> 2. according to the docs, the default pools are data, metadata, and rbd.
> what is the difference among these three types of pools?
> 
> 3. when a system deployer has deployed a ceph cluster with the radosgw interface and starts providing services to end users (for example, end users can create an account on the ceph cluster and store/retrieve their data to/from the cluster), does the end user need to be concerned about the pools created in the cluster?
> 
> Please somebody help me to clear these confusions.
> 
> regards
> Pragya Jain
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
