Re: question about block sizes, rados objects and file striping (and maybe more)

Alejandro Comisario <alejandro@xxxxxxxxxxx> · Mon, 20 Mar 2017 19:49:42 -0300

Jason, thanks for the reply, you really got my question right.
I have a few more doubts, which might show that I lack some general knowledge.

When I read that someone is testing a Ceph cluster with sequential 4K
block writes, could that be happening inside a VM whose OS disk is
backed by RBD?
In that case, should the VM's filesystem be formatted with a 4K block
size so that the VM's block layer writes 4K down to the hypervisor?

In that case, assume that I have a 9K MTU between the compute node
and the Ceph cluster.
What is the default RADOS object size into which the data is divided?
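A back-of-the-envelope sketch of how the MTU affects the number of packets per write. It assumes IPv4 and TCP headers of 20 bytes each with no options (so each packet carries at most MTU minus 40 bytes of payload), and it ignores Ceph's own messenger framing and any OSD-to-OSD replication traffic. `tcp_segments` is a hypothetical helper for illustration only.

```python
import math

# Assumption: IPv4 header (20 bytes) + TCP header (20 bytes), no options.
# Ethernet framing sits outside the MTU and is ignored here.
IPV4_TCP_HEADERS = 40

def tcp_segments(payload_bytes: int, mtu: int) -> int:
    """Estimate how many packets are needed to carry payload_bytes."""
    mss = mtu - IPV4_TCP_HEADERS  # max TCP payload per packet
    return math.ceil(payload_bytes / mss)

# Compare an 8K write and a 4 MiB write at standard and jumbo MTUs.
for mtu in (1500, 9000):
    print(mtu, tcp_segments(8192, mtu), tcp_segments(4 * 1024 * 1024, mtu))
```

Under these assumptions an 8K write needs 6 packets at MTU 1500 and a single packet at MTU 9000.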

On Mon, Mar 20, 2017 at 7:06 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> It's a very broad question -- are you trying to determine something
> more specific?
>
> Notionally, your DB engine will safely journal the changes to disk,
> commit the changes to the backing table structures, and prune the
> journal. Your mileage may vary depending on the specific DB engine and
> its configuration settings.
>
> The VM's OS will send write requests addressed by block offset and
> block count (e.g. in 512-byte blocks) through the block device hardware
> (either a slower emulated block device or a faster paravirtualized
> block device like virtio-blk/virtio-scsi). Within the internals of
> QEMU, these block-addressed write requests will be delivered to librbd
> in byte-addressed format (the blocks are converted to absolute byte
> ranges).
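As a rough illustration of that conversion, here is a minimal sketch. It assumes 512-byte logical sectors (the sector size is configurable, so treat this as an assumption), and `sectors_to_byte_range` is a hypothetical name, not QEMU's API.

```python
from typing import Tuple

# Assumption: 512-byte logical sectors; the guest/QEMU sector size is configurable.
SECTOR_SIZE = 512

def sectors_to_byte_range(start_sector: int, sector_count: int,
                          sector_size: int = SECTOR_SIZE) -> Tuple[int, int]:
    """Convert a sector-addressed request into (byte_offset, byte_length)."""
    return start_sector * sector_size, sector_count * sector_size

# An 8K database page is 16 sectors of 512 bytes.
offset, length = sectors_to_byte_range(start_sector=2048,
                                       sector_count=8192 // SECTOR_SIZE)
print(offset, length)
```

With these assumptions an 8K page written at sector 2048 reaches librbd as byte offset 1048576 with length 8192.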
>
> librbd will take the provided byte offset and length and quickly
> calculate which backing RADOS objects are associated with the provided
> range [1]. If the extent intersects multiple backing objects, the
> sub-operation is sent to each affected object in parallel. These
> operations will be sent to the OSDs responsible for handling the
> object (as per the CRUSH map) -- by default via TCP/IP. The MTU is the
> maximum size of each IP packet -- larger MTUs allow you to send more
> data within a single packet [2].
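A minimal sketch of the offset-to-object calculation, following the striping scheme described in [1] (stripe unit, stripe count, object size). `map_extent` is a hypothetical helper, not librbd's API, and it does not merge adjacent pieces that land in the same object. The first example assumes RBD's default layout (4 MiB objects, stripe unit equal to the object size, stripe count 1); the second uses the exercise's stripe unit of 2048 kB and stripe count of 4, assuming the default 4 MiB object size and treating the 20 MB commit as 20 MiB written at image offset 0.

```python
from typing import List, Tuple

MiB = 1024 * 1024

# One piece of a request: (object_number, offset_within_object, length)
Extent = Tuple[int, int, int]

def map_extent(offset: int, length: int, stripe_unit: int,
               stripe_count: int, object_size: int) -> List[Extent]:
    """Map an image byte range onto backing objects (hypothetical helper)."""
    if object_size % stripe_unit != 0:
        raise ValueError("object_size must be a multiple of stripe_unit")
    stripes_per_object = object_size // stripe_unit
    extents: List[Extent] = []
    pos, end = offset, offset + length
    while pos < end:
        block_no = pos // stripe_unit           # stripe unit index within the image
        stripe_no = block_no // stripe_count    # stripe (row) across the object set
        stripe_pos = block_no % stripe_count    # object within the object set
        object_set = stripe_no // stripes_per_object
        object_no = object_set * stripe_count + stripe_pos
        block_off = pos % stripe_unit           # offset inside the stripe unit
        obj_off = (stripe_no % stripes_per_object) * stripe_unit + block_off
        chunk = min(stripe_unit - block_off, end - pos)
        extents.append((object_no, obj_off, chunk))
        pos += chunk
    return extents

# Assumed RBD default layout: 4 MiB objects, stripe unit = object size, count 1.
print(map_extent(0, 8192, 4 * MiB, 1, 4 * MiB))

# Exercise layout: stripe unit 2 MiB, stripe count 4, assumed 4 MiB objects.
for obj, off, ln in map_extent(0, 20 * MiB, 2 * MiB, 4, 4 * MiB):
    print(f"object {obj}: offset {off // MiB} MiB, length {ln // MiB} MiB")
```

Under these assumptions the 20 MiB write splits into ten 2 MiB pieces spread over objects 0 through 5, which can be sent to their OSDs in parallel, while a single 8K write under the default layout touches one object unless it straddles an object boundary.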
>
> [1] http://docs.ceph.com/docs/master/architecture/#data-striping
> [2] https://en.wikipedia.org/wiki/Maximum_transmission_unit
>
>
>
> On Mon, Mar 20, 2017 at 5:24 PM, Alejandro Comisario
> <alejandro@xxxxxxxxxxx> wrote:
>> Anyone?
>>
>> On Fri, Mar 17, 2017 at 5:40 PM, Alejandro Comisario
>> <alejandro@xxxxxxxxxxx> wrote:
>>> Hi, I've been using Ceph for a while now, and I'm still a little
>>> ashamed that when certain situations happen, I don't have the knowledge
>>> to explain or plan things.
>>>
>>> To show what I don't know, I will walk through an exercise.
>>>
>>> EXERCISE:
>>> A virtual machine running on KVM has an extra block device where the
>>> data files of a database live (this block device is exposed to the VM
>>> using libvirt).
>>>
>>> Facts:
>>> * the DB writes to disk in 8K blocks
>>> * the connection between the physical compute node and Ceph has an MTU of 1500
>>> * the QEMU RBD driver uses a stripe unit of 2048 kB and a stripe count of 4
>>> * everything else is default
>>>
>>> So conceptually, could someone explain to me what happens from the
>>> moment the DB inside the VM commits a 20 MB query to disk: what
>>> happens on the compute node, what happens with the client's file
>>> striping, what happens on the network (regarding packets, if anything
>>> other than creating 1500-byte packets), and what happens with RADOS
>>> objects, block sizes, etc.?
>>>
>>> I would love to read this from the best, mainly because, as I said, I
>>> don't understand the whole workflow of blocks, objects, etc.
>>>
>>> Thanks, everyone!
>>>
>>> --
>>> Alejandrito
>>
>>
>>
>> --
>> Alejandro Comisario
>> CTO | NUBELIU
>> E-mail: alejandro@nubeliu.com | Cell: +54 9 11 3770 1857
>> www.nubeliu.com
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Jason
