On Mon, Mar 20, 2017 at 6:49 PM, Alejandro Comisario
<alejandro@xxxxxxxxxxx> wrote:
> Jason, thanks for the reply, you really got my question right.
> So, some doubts that might show that I lack some general knowledge.
>
> When I read that someone is testing a Ceph cluster with sequential 4K
> block writes, could that happen inside a VM that is using an
> RBD-backed OS?

You can run some benchmarks directly against librbd (e.g. see fio's rbd
engine), some within a VM against an RBD-backed block device, and some
within a VM against a filesystem backed by an RBD-backed block device.

> In that case, should the VM's FS be formatted to allow 4K writes
> so that the block level of the VM writes 4K down to the hypervisor?
>
> In that case, assuming that I have a 9K MTU between the compute node
> and the Ceph cluster:
> What is the default RADOS block size into which the objects are divided
> against the amount of information?

MTU size (network maximum packet size) and the RBD block object size
are not interrelated.

>
> On Mon, Mar 20, 2017 at 7:06 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>> It's a very broad question -- are you trying to determine something
>> more specific?
>>
>> Notionally, your DB engine will safely journal the changes to disk,
>> commit the changes to the backing table structures, and prune the
>> journal. Your mileage may vary depending on the specific DB engine and
>> its configuration settings.
>>
>> The VM's OS will send write requests addressed by block offset and
>> block counts (e.g. 512 blocks) through the block device hardware
>> (either a slower emulated block device or a faster paravirtualized
>> block device like virtio-blk/virtio-scsi). Within the internals of
>> QEMU, these block-addressed write requests will be delivered to librbd
>> in byte-addressed format (the blocks are converted to absolute byte
>> ranges).
>>
>> librbd will take the provided byte offset and length and quickly
>> calculate which backing RADOS objects are associated with the provided
>> range [1]. If the extent intersects multiple backing objects, the
>> sub-operation is sent to each affected object in parallel. These
>> operations will be sent to the OSDs responsible for handling the
>> object (as per the CRUSH map) -- by default via TCP/IP. The MTU is the
>> maximum size of each IP packet -- larger MTUs allow you to send more
>> data within a single packet [2].
>>
>> [1] http://docs.ceph.com/docs/master/architecture/#data-striping
>> [2] https://en.wikipedia.org/wiki/Maximum_transmission_unit
>>
>>
>> On Mon, Mar 20, 2017 at 5:24 PM, Alejandro Comisario
>> <alejandro@xxxxxxxxxxx> wrote:
>>> Anyone?
>>>
>>> On Fri, Mar 17, 2017 at 5:40 PM, Alejandro Comisario
>>> <alejandro@xxxxxxxxxxx> wrote:
>>>> Hi, I've been using Ceph for a while, and I'm still a little
>>>> ashamed that when certain situations happen, I don't have the
>>>> knowledge to explain or plan things.
>>>>
>>>> Basically, what I don't know is the following, and I will pose it
>>>> as an exercise.
>>>>
>>>> EXERCISE:
>>>> A virtual machine running on KVM has an extra block device where the
>>>> datafiles of a database live (this block device is exposed to the VM
>>>> using libvirt).
>>>>
>>>> Facts:
>>>> * the DB writes to disk in 8K blocks
>>>> * the connection between the physical compute node and Ceph has an MTU of 1500
>>>> * the QEMU RBD driver uses a stripe unit of 2048 kB and a stripe count of 4
>>>> * everything else is default
>>>>
>>>> So conceptually, can someone explain to me what happens from the
>>>> moment the DB contained in the VM commits to disk a query of
>>>> 20 MBytes: what happens on the compute node, what happens with the
>>>> client's file striping, what happens on the network (regarding
>>>> packets, other than creating 1500-byte packets), what happens
>>>> with RADOS objects, block sizes, etc.
>>>>
>>>> I would love to read this from the best, mainly because, as I said,
>>>> I don't understand the whole workflow of blocks, objects, etc.
>>>>
>>>> Thanks to everyone!
>>>>
>>>> --
>>>> Alejandrito
>>>
>>>
>>> --
>>> Alejandro Comisario
>>> CTO | NUBELIU
>>> E-mail: alejandro@nubeliu.com
>>> Cell: +54 9 11 3770 1857
>>> _
>>> www.nubeliu.com
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> --
>> Jason

--
Jason
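
As a rough illustration of the offset-to-object calculation mentioned in
the thread (librbd mapping a byte extent onto backing RADOS objects, see
[1]), here is a minimal Python sketch of the striping arithmetic. It is
not librbd code. The names map_extent and ObjectExtent are made up for
this example. The stripe unit (2048 kB) and stripe count (4) come from
the exercise; the 4 MiB object size is an assumption ("everything else is
default"), as is treating the 20 MByte commit as one write starting at
offset 0. In practice the guest would issue many smaller writes.

```python
from collections import namedtuple

MiB = 1024 * 1024

# Hypothetical helper type: which object, where inside it, and how many bytes.
ObjectExtent = namedtuple("ObjectExtent", "object_no offset length")


def map_extent(offset, length, stripe_unit=2 * MiB, stripe_count=4,
               object_size=4 * MiB):
    """Split an image byte range into per-object extents.

    Follows the striping scheme in the Ceph architecture docs: stripe
    units are laid round-robin across `stripe_count` objects, and once
    those objects are full the next object set begins.
    object_size=4 MiB is an assumed default, not taken from the thread.
    """
    if object_size % stripe_unit != 0:
        raise ValueError("object_size must be a multiple of stripe_unit")
    stripes_per_object = object_size // stripe_unit

    extents = []
    cur, remaining = offset, length
    while remaining > 0:
        block_no = cur // stripe_unit            # index of the stripe unit
        stripe_no = block_no // stripe_count     # row across the object set
        stripe_pos = block_no % stripe_count     # column = object within set
        object_set_no = stripe_no // stripes_per_object
        object_no = object_set_no * stripe_count + stripe_pos

        block_off = cur % stripe_unit            # offset inside the stripe unit
        obj_off = (stripe_no % stripes_per_object) * stripe_unit + block_off
        chunk = min(remaining, stripe_unit - block_off)

        extents.append(ObjectExtent(object_no, obj_off, chunk))
        cur += chunk
        remaining -= chunk
    return extents


if __name__ == "__main__":
    # One 20 MiB write at offset 0 (assumed alignment).
    for ext in map_extent(0, 20 * MiB):
        print(ext)
    # An 8K DB block that happens to straddle a stripe-unit boundary.
    print(map_extent(2 * MiB - 4096, 8192))
```

Under these assumptions the 20 MiB range is cut into ten 2 MiB stripe
units spread over six objects (0 through 5), and an 8K write that
straddles a stripe-unit boundary becomes two 4K sub-operations on two
different objects. None of this depends on the MTU: the network stack
splits each sub-operation into packets independently of the object layout.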