Re: Using Ceph central backup storage - Best practice creating pools

cmonty14  writes:
> Due to performance issues, RGW is not an option. This statement may be
> wrong, but there's the following aspect to consider.

> If I write a backup, which is typically a large file, it normally goes
> out as a single I/O stream.
> This causes massive performance issues on Ceph because that single I/O
> stream is written sequentially in small pieces to the OSDs.
> To overcome this issue, multiple I/O streams should be used when
> writing large files, which means the application writing the backup
> must support multiple I/O streams.

RGW (and the S3 protocol in general) supports multi-stream uploads
nicely, via the "multipart upload" feature: You split your file into
many pieces, which can be uploaded in parallel.

RGW with multipart uploads seems like a good fit for your application.
It could solve your naming and permission issues, has low overhead, and
could give you good performance as long as you use multipart uploads
with parallel threads.  You just need to make sure that your RGW
gateways have enough throughput, but this capacity is relatively easy
and inexpensive to provide.
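
To make that concrete, here is a minimal sketch of a parallel multipart
upload against an RGW endpoint using boto3; the endpoint URL,
credentials, bucket and object names, and the part sizes are
placeholders you would replace with your own:

    import boto3
    from boto3.s3.transfer import TransferConfig

    # RGW speaks the S3 protocol, so a plain S3 client pointed at the
    # gateway is enough; endpoint, credentials and names are assumptions.
    s3 = boto3.client(
        's3',
        endpoint_url='http://rgw.example.com:8080',
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
    )

    # Split anything larger than 64 MB into 64 MB parts and upload up to
    # 8 parts in parallel (this is where the multiple I/O streams come from).
    cfg = TransferConfig(
        multipart_threshold=64 * 1024 * 1024,
        multipart_chunksize=64 * 1024 * 1024,
        max_concurrency=8,
    )

    s3.upload_file('db-backup.dump', 'backups', 'client-a/db-backup.dump',
                   Config=cfg)

The transfer manager takes care of splitting the file, uploading the
parts in parallel and retrying individual parts, so the backup client
itself stays simple.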

> Considering this, the following question comes up: if I write a backup
> to an RBD (which could be thought of as a network share), will Ceph
> use a single I/O stream or multiple I/O streams on the storage side?

Ceph should be able to handle multiple parallel streams of I/O to an RBD
device (in general, writes will go to different "chunks" of the RBD, and
those chunk objects will be on different OSDs).  But it's another
question whether your RBD client will be able to issue parallel streams
of requests.  Usually you have some kind of file system and kernel block
I/O layer on the client side, and it's possible that those will
serialize I/O, which will make it hard to get high throughput.
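
If you want to see what the cluster itself can do, without a filesystem
or the kernel block layer in the way, a rough sketch with the librbd
Python bindings (pool name, image name and chunk size are assumptions
for illustration) can queue several asynchronous writes at once:

    import rados
    import rbd

    CHUNK = 4 * 1024 * 1024  # write in 4 MB pieces, one aio request each

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')          # pool name is an assumption
    image = rbd.Image(ioctx, 'backup-image')   # image name is an assumption

    data = b'\0' * CHUNK
    try:
        # Queue 16 writes to different offsets without waiting in between;
        # they land on different backing objects and therefore different OSDs.
        for i in range(16):
            image.aio_write(data, i * CHUNK, lambda completion: None)
        image.flush()  # block until all queued writes are on stable storage
    finally:
        image.close()
        ioctx.close()
        cluster.shutdown()

Whether you get the same parallelism through a filesystem mounted on top
of the RBD is exactly the caveat above.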
-- 
Simon.

> THX

> On Tue, 22 Jan 2019 at 23:20, Christian Wuerdig
> <christian.wuerdig@xxxxxxxxx> wrote:
>> 
>> If you use librados directly, it's up to you to ensure you can
>> identify your objects. RADOS stores objects, not files, so when you
>> provide your object IDs you need to come up with a naming convention
>> that lets you identify them correctly later. If you need to provide
>> metadata (e.g. a list of all existing backups, when they were taken,
>> etc.), then again you need to manage that yourself (probably in
>> dedicated metadata objects). Using RADOS namespaces (like one per
>> database) is probably a good idea.
>> Also keep in mind that, for example, Bluestore has a maximum object
>> size of 4GB, so mapping files 1:1 to objects is probably not a wise
>> approach; you should break your files up into smaller chunks when
>> storing them. There is libradosstriper, which handles the striping
>> of large objects transparently, but I'm not sure whether it supports
>> RADOS namespaces.
>> 
>> Using RGW instead might be an easier route to go down
>> 
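To make the naming and chunking point quoted above concrete, here is a
rough sketch with the librados Python bindings; the pool name,
namespace, object-ID convention and chunk size are just assumptions for
illustration:

    import rados

    CHUNK = 64 * 1024 * 1024  # stay well below Bluestore's 4 GB object limit

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('backup')   # one shared pool, name assumed
    ioctx.set_namespace('database-a')      # one namespace per database

    # Store the backup file as numbered chunk objects plus a small metadata
    # object, so another client can later find and reassemble it by name.
    backup_id = 'db-a/2019-01-22T2300'
    with open('db-backup.dump', 'rb') as f:
        n = 0
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            ioctx.write_full('%s.%06d' % (backup_id, n), chunk)
            n += 1
    ioctx.write_full(backup_id + '.meta', str(n).encode())  # chunk count

    ioctx.close()
    cluster.shutdown()

Restoring on another client is then a matter of opening the same pool
and namespace and reading the chunks back by the agreed prefix. RGW
gives you buckets, keys, metadata and per-user permissions out of the
box, which is why it is the easier route.
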
>> On Wed, 23 Jan 2019 at 10:10, cmonty14 <74cmonty@xxxxxxxxx> wrote:
>>> 
>>> My backup client is using librados.
>>> I understand that defining one pool per application is recommended.
>>>
>>> However, this does not answer my other questions:
>>> How can I identify a backup created by client A that I want to restore
>>> on another client Z?
>>> I mean typically client A would write a backup file identified by the
>>> filename.
>>> Would it be possible on client Z to identify this backup file by
>>> filename? If yes, how?
>>> 
>>> On Tue, 22 Jan 2019 at 15:07, <ceph@xxxxxxxxxxxxxx> wrote:
>>> >
>>> > Hi,
>>> >
>>> > Ceph's pools are meant to let you define specific engineering rules
>>> > and/or applications (rbd, cephfs, rgw).
>>> > They are not designed to be created in a massive fashion (see PGs etc.),
>>> > so create a pool for each engineering ruleset and store your data in them.
>>> > For what is left of your project, I believe you have to implement it
>>> > on top of Ceph.
>>> >
>>> > For instance, let's say you simply create a pool with an RBD volume in it.
>>> > You then create a filesystem on that volume and map it on some server.
>>> > Finally, you can push your files to that mountpoint using regular
>>> > Linux users, ACLs or whatever: beyond that point, nothing is
>>> > Ceph-specific any more, it is "just" a mounted filesystem.
>>> >
>>> > Regards,
>>> >
>>> > On 01/22/2019 02:16 PM, cmonty14 wrote:
>>> > > Hi,
>>> > >
>>> > > my use case for Ceph is providing central backup storage.
>>> > > This means I will back up multiple databases to the Ceph storage cluster.
>>> > >
>>> > > This is my question:
>>> > > What is the best practice for creating pools & images?
>>> > > Should I create multiple pools, meaning one pool per database?
>>> > > Or should I create a single pool "backup" and use namespaces when
>>> > > writing data to the pool?
>>> > >
>>> > > This is the security requirement that should be considered:
>>> > > DB owner A can only modify the files that belong to A; other files
>>> > > (owned by B, C or D) are accessible to A.
>>> > >
>>> > > And there's another issue:
>>> > > How can I identify a backup created by client A that I want to restore
>>> > > on another client Z?
>>> > > I mean typically client A would write a backup file identified by the
>>> > > filename.
>>> > > Would it be possible on client Z to identify this backup file by
>>> > > filename? If yes, how?
>>> > >
>>> > >
>>> > > THX
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


