Re: RBD snapshots cause disproportionate performance degradation

Yes, this is expected. If you are on Hammer, you can enable
filestore_fiemap so the filestore uses sparse copy, which is especially
useful for RBD snapshot copies. But keep in mind that fiemap is *broken*
on some old kernels; CentOS 7 is the only distro I have verified to work
fine with this feature.
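
For reference, filestore_fiemap is a filestore option on the OSD side; a
minimal ceph.conf sketch (apply it to your OSDs and restart them, and
verify fiemap behaves correctly on your kernel first):

    [osd]
    # defaults to false; when enabled the filestore uses FIEMAP to detect
    # holes and perform sparse object copies (e.g. for snapshot copies)
    filestore fiemap = true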


On Wed, Nov 18, 2015 at 12:25 PM, Will Bryant <will.bryant@xxxxxxxxx> wrote:
> Hi,
>
> We’ve been running an all-SSD Ceph cluster for a few months now and generally are very happy with it.
>
> However, we’ve noticed that if we create a snapshot of an RBD device, then writing to the RBD goes massively slower than before we took the snapshot.  Similarly, we get poor performance if we make a clone of that snapshot and write to it.
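
For anyone following along, the snapshot/clone sequence being described
looks roughly like this; the pool and image names are placeholders, and
cloning assumes a format-2 image:

    rbd snap create rbd/testimg@snap1
    # a snapshot must be protected before it can be cloned
    rbd snap protect rbd/testimg@snap1
    rbd clone rbd/testimg@snap1 rbd/testclone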
>
> For example, using fio to run a 2-worker 4kb synchronous random write benchmark, we normally get about 5000 IOPS to RBD on our test-sized cluster (Intel 3710, 10G networking, Ubuntu 14.04).  But as soon as I take a snapshot, this goes down to about 100 IOPS, and with high variability - at times 0 IOPS, 60 IOPS, or 300 IOPS.
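
A rough sketch of that kind of fio job, assuming the RBD is mapped via
krbd as /dev/rbd0 (the device path and runtime are placeholders; the
original test may have used the rbd ioengine or run inside a VM instead):

    # 2 workers, 4 KB synchronous random writes against the mapped device
    fio --name=randwrite --filename=/dev/rbd0 --rw=randwrite --bs=4k \
        --numjobs=2 --direct=1 --sync=1 --time_based --runtime=60 \
        --group_reporting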
>
> I realise that after a snapshot, any write will trigger a copy of the block, which by default would be 4 MB of data - to minimize this effect I’ve reduced the RBD order to 18 ie. 256 KB blocks.
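
As a sketch, an image with 256 KB objects can be created like this (size,
pool and image names are placeholders):

    # order 18 => 2^18 = 256 KB objects instead of the default 4 MB (order 22)
    rbd create rbd/testimg --size 10240 --image-format 2 --order 18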
>
> But shouldn’t that effect only degrade it to the same performance as we get on a completely new RBD image that has no snapshots and no data?  For us that is more like 1000-1500 IOPS ie. still at least 10x better than the performance we get after a snapshot is taken.
>
> Is there something particularly inefficient about the copy-on-write block implementation that makes it much worse than writing to fresh blocks?  Note that we get this performance drop even if the other data on the blocks are cached in memory, and since we’re using fast SSDs, the time to read in the rest of the 256 KB should be negligible.
>
> We’re currently using Hammer but we also tested with Infernalis and it didn’t seem any better.
>
> Cheers,
> Will
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Best Regards,

Wheat
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



