Hi Haomai,
Do you use the filestore_fiemap=true parameter on CentOS 7 with Hammer/Infernalis in any production Ceph environment for RBD storage? Is it safe to use in production?
Thanks
Özhan
On Wed, Nov 18, 2015 at 8:12 AM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
Yes, it's an expected case. Actually, if you use Hammer, you can enable
filestore_fiemap to use sparse copy, which is especially useful for rbd
snapshot copy. But keep in mind that some old kernels are *broken* in
fiemap. CentOS 7 is the only distro I have verified as working with this feature.
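
For anyone who wants to check this on their own cluster first, here is a minimal sketch (not from the original thread) that asks a running OSD for its current filestore_fiemap value through the admin socket; it assumes the ceph CLI is installed on the OSD host, the OSD id is known, and the caller can read the admin socket:

#!/usr/bin/env python
# Hedged sketch: query a running OSD for its live filestore_fiemap setting via
# the admin socket ("ceph daemon ..."). Assumes this runs on the OSD host with
# the ceph CLI installed and permission to read /var/run/ceph/*.asok.
import json
import subprocess


def osd_config_get(osd_id, option):
    """Return the live value of a config option from a local OSD daemon."""
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "config", "get", option])
    return json.loads(out.decode("utf-8"))[option]


if __name__ == "__main__":
    # To enable the feature, set "filestore fiemap = true" in the [osd]
    # section of ceph.conf and restart the OSDs; only do this on a
    # kernel/filesystem combination known to have working FIEMAP.
    print("osd.0 filestore_fiemap = %s" % osd_config_get(0, "filestore_fiemap"))
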
On Wed, Nov 18, 2015 at 12:25 PM, Will Bryant <will.bryant@xxxxxxxxx> wrote:
> Hi,
>
> We’ve been running an all-SSD Ceph cluster for a few months now and generally are very happy with it.
>
> However, we’ve noticed that if we create a snapshot of an RBD device, then writing to the RBD goes massively slower than before we took the snapshot. Similarly, we get poor performance if we make a clone of that snapshot and write to it.
>
> For example, using fio to run a 2-worker 4kb synchronous random write benchmark, we normally get about 5000 IOPS to RBD on our test-sized cluster (Intel 3710, 10G networking, Ubuntu 14.04). But as soon as I take a snapshot, this goes down to about 100 IOPS, and with high variability - at times 0 IOPS, 60 IOPS, or 300 IOPS.
>
> I realise that after a snapshot, any write will trigger a copy of the block, which by default would be 4 MB of data; to minimize this effect I’ve reduced the RBD order to 18, i.e. 256 KB blocks.
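
For context, a rough python-rbd sketch of this setup: create an image with order=18, time some 4 KB writes, take a snapshot, then time the same writes again. The pool and image names are placeholders, and the write loop only approximates the fio job described above; it is not the original benchmark.

#!/usr/bin/env python
# Hedged sketch (not the original fio benchmark): create an RBD image with
# 256 KB objects (order=18), time 4 KB writes, take a snapshot, then time the
# same writes again. Pool "rbd" and image "snaptest" are placeholder names.
import time
import rados
import rbd

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("rbd")

SIZE = 10 * 1024 ** 3                                 # 10 GiB test image
rbd.RBD().create(ioctx, "snaptest", SIZE, order=18)   # 2**18 = 256 KB objects

image = rbd.Image(ioctx, "snaptest")
data = b"\0" * 4096


def write_rate(n=2000):
    """Issue n 4 KB writes at spread-out offsets and return writes/second."""
    start = time.time()
    for i in range(n):
        image.write(data, (i * 1024 * 1024) % SIZE)   # hop 1 MiB per write
    image.flush()
    return n / (time.time() - start)


print("before snapshot: %.0f writes/s" % write_rate())
image.create_snap("snap1")    # after this, the first write to each object copies it up
print("after snapshot:  %.0f writes/s" % write_rate())

image.close()
ioctx.close()
cluster.shutdown()

Because the offsets advance by 1 MiB per write, each post-snapshot write lands in a distinct 256 KB object and so triggers a copy-up, which should make the before/after difference easy to see.
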
>
> But shouldn’t that effect only degrade it to the same performance as we get on a completely new RBD image that has no snapshots and no data? For us that is more like 1000-1500 IOPS, i.e. still at least 10x better than the performance we get after a snapshot is taken.
>
> Is there something particularly inefficient about the copy-on-write block implementation that makes it much worse than writing to fresh blocks? Note that we get this performance drop even if the other data on the blocks are cached in memory, and since we’re using fast SSDs, the time to read in the rest of the 256 KB should be negligible.
>
> We’re currently using Hammer but we also tested with Infernalis and it didn’t seem any better.
>
> Cheers,
> Will
--
Best Regards,
Wheat
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com