Jason, thanks for the reply, you really got my question right. So, here are some doubts that probably show I lack some general knowledge.

When I read that someone is testing a Ceph cluster with sequential 4K block writes, could that be happening inside a VM that is using an RBD-backed OS? In that case, should the VM's filesystem be formatted to allow 4K writes, so that the block layer of the VM writes 4K down to the hypervisor?

Also, assuming that I have a 9K MTU between the compute node and the Ceph cluster: what is the default RADOS object size into which the image's data is divided?
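To make sure I am picturing it right, here is a rough Python sketch of how I understand librbd splits a byte extent into per-object writes, assuming what I believe are the defaults (4 MB objects, stripe unit equal to the object size, stripe count 1; the 2048 kB / count 4 striping from my exercise would interleave things further). The names and numbers here are just my own illustration, not librbd code, so please correct me if this is wrong:

# Rough mental model only -- NOT librbd code. Assumes the default
# 4 MB object size, stripe_unit == object_size and stripe_count == 1.
OBJECT_SIZE = 4 * 1024 * 1024

def map_extent(offset, length, object_size=OBJECT_SIZE):
    """Yield (object_number, offset_in_object, bytes) for a byte extent."""
    end = offset + length
    while offset < end:
        obj_no = offset // object_size
        obj_off = offset % object_size
        chunk = min(object_size - obj_off, end - offset)
        yield obj_no, obj_off, chunk
        offset += chunk

# Example: a 20 MB flush starting 1 GiB into the image touches five
# consecutive 4 MB objects, which librbd can write in parallel.
for extent in map_extent(1 << 30, 20 * 1024 * 1024):
    print(extent)

If that is roughly right, then a 20 MB commit fans out to five object writes, each going to whatever OSDs CRUSH picks for that object, and the MTU only decides how many TCP/IP packets each of those writes is chopped into on the wire.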
On Mon, Mar 20, 2017 at 7:06 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> It's a very broad question -- are you trying to determine something
> more specific?
>
> Notionally, your DB engine will safely journal the changes to disk,
> commit the changes to the backing table structures, and prune the
> journal. Your mileage may vary depending on the specific DB engine and
> its configuration settings.
>
> The VM's OS will send write requests addressed by block offset and
> block counts (e.g. 512 blocks) through the block device hardware
> (either a slower emulated block device or a faster paravirtualized
> block device like virtio-blk/virtio-scsi). Within the internals of
> QEMU, these block-addressed write requests will be delivered to librbd
> in byte-addressed format (the blocks are converted to absolute byte
> ranges).
>
> librbd will take the provided byte offset and length and quickly
> calculate which backing RADOS objects are associated with the provided
> range [1]. If the extent intersects multiple backing objects, the
> sub-operation is sent to each affected object in parallel. These
> operations will be sent to the OSDs responsible for handling the
> object (as per the CRUSH map) -- by default via TCP/IP. The MTU is the
> maximum size of each IP packet -- larger MTUs allow you to send more
> data within a single packet [2].
>
> [1] http://docs.ceph.com/docs/master/architecture/#data-striping
> [2] https://en.wikipedia.org/wiki/Maximum_transmission_unit
>
> On Mon, Mar 20, 2017 at 5:24 PM, Alejandro Comisario
> <alejandro@xxxxxxxxxxx> wrote:
>> anyone ?
>>
>> On Fri, Mar 17, 2017 at 5:40 PM, Alejandro Comisario
>> <alejandro@xxxxxxxxxxx> wrote:
>>> Hi, I have been using Ceph for a while now, and I am still a little
>>> ashamed that when certain situations happen, I don't have the
>>> knowledge to explain or plan things.
>>>
>>> Basically, here is what I don't know, posed as an exercise.
>>>
>>> EXERCISE:
>>> A virtual machine running on KVM has an extra block device where the
>>> datafiles of a database live (this block device is exposed to the VM
>>> using libvirt).
>>>
>>> Facts:
>>> * the DB writes to disk in 8K blocks
>>> * the connection between the physical compute node and Ceph has an
>>>   MTU of 1500
>>> * the QEMU RBD driver uses a stripe unit of 2048 kB and a stripe
>>>   count of 4
>>> * everything else is default
>>>
>>> So conceptually, can someone explain what happens from the moment the
>>> DB contained in the VM commits a 20 MByte query to disk: what happens
>>> on the compute node, what happens with the client's file striping,
>>> what happens on the network (regarding packets, beyond creating
>>> 1500-byte packets), and what happens with RADOS objects, block sizes,
>>> etc.?
>>>
>>> I would love to read this from the best, mainly because, as I said, I
>>> don't understand the whole workflow of blocks, objects, etc.
>>>
>>> Thanks to everyone!
>>>
>>> --
>>> Alejandrito
>
> --
> Jason

--
Alejandro Comisario
CTO | NUBELIU
E-mail: alejandro@nubeliu.com  Cell: +54 9 11 3770 1857
www.nubeliu.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com