> I think this is where I see slow performance. If you are doing large IO,
> then copying 4MB objects (assuming defaults) is maybe only 2x the original
> IO to the disk. However, if you are doing smaller IO, from what I can see a
> single 4KB write would lead to a 4MB object being copied to the snapshot;
> with 3x replication, this could be amplification in the thousands. Is my
> understanding correct here? It's certainly what I see.

The first write (4K or otherwise) to a recently snapshotted object will
result in CoW to a new clone of the snapshotted object. Subsequent writes to
the same object will not have the same penalty. In the parent/child image
case, the first write to the child would also result in a full-object CoW
from the parent to the child.

> > With RBD layering, you do whole-object copy-on-write from the client.
> > Doing it from the client does let you put "child" images inside of a
> > faster pool, yes. But creating new objects doesn't make the *old* ones
> > slow, so why do you think there's still the same problem? (Other than
> > "the pool is faster" being perhaps too optimistic about the improvement
> > you'd get under this workload.)
>
> From reading the RBD layering docs it looked like you could also specify a
> different object size for the target. If there was some way that the
> snapshot could have a different object size, or some sort of dirty bitmap,
> then this would reduce the amount of data that would have to be copied on
> each write.

Have you tried using a different object size for your RBD image? I think
your proposal is effectively the same as just reducing the object size (with
the added overhead of an OSD<->client round-trip for CoW instead of handling
it within the OSD directly). The default 4MB object size was an attempt to
strike a balance between the CoW cost and the number of objects the OSDs
would have to manage.

> What I meant about it slowing down the pool is that, due to the extra 4MB
> copy writes, the maximum small IO you can do is dramatically reduced, as
> each small IO is now a 4MB IO. By shifting the CoW to a different pool you
> could reduce the load on the primary pool and the effect on primary
> workloads. You are effectively shifting this snapshot "tax" onto an
> isolated set of disks/SSDs.

Except eventually all your IO will be against the new "fast" pool, as enough
snapshotted objects will have been CoW'd over to the new pool?

> To give it some context, here is the background on what I am trying to
> achieve. We are currently migrating our OLB service from LVM thinpools to
> Ceph. As part of the service we offer, we take regular archive backups to
> tape and also offer DR tests. Both of these require snapshots, to allow the
> normal OLB backups to continue uninterrupted, and these snapshots may
> potentially be left open for several days at a time. As it's OLB, as you
> can imagine, there is a lot of write IO.
>
> Currently with LVM, although there is a slight performance hit, the block
> size in LVM roughly matches the average IO size (128-512KB) and so the CoW
> process doesn't cause much overhead. When I did some quick fio tests with
> Ceph it seemed to have a much greater knock-on effect when using 4MB
> object RBDs.
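If you want to gauge how much of that knock-on effect is just the 4MB object
size, one quick experiment is to create a test image with a smaller object
order, snapshot it, and re-run your fio workload against it. A rough,
untested sketch follows; the pool name "rbd", the image name "cow-test", and
the /dev/rbd0 device are only placeholders (check the output of "rbd map"
for the real device), and mapping a format 2 image needs a reasonably recent
kernel client:

  # 1MB objects (order 20) instead of the default 4MB (order 22);
  # format 2 is needed for layering/cloning.
  rbd create --pool rbd --image-format 2 --order 20 --size 10240 cow-test

  # Snapshot it so the first write to each object takes the CoW path.
  rbd snap create rbd/cow-test@base

  # Map it and drive small random writes at it, then compare against an
  # identical run on a default-order image.
  rbd map rbd/cow-test
  fio --name=cow-test --filename=/dev/rbd0 --ioengine=libaio --direct=1 \
      --rw=randwrite --bs=4k --iodepth=16 --runtime=60 --time_based

That won't change the fundamental whole-object CoW behaviour, but it makes
each first-write copy 4x smaller, which should give you a feel for where the
sweet spot is for your workload before committing to a non-default order in
production.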
> We can probably work around this by having a cluster with more disks, or
> reducing the RBD object size, but I thought it was worth asking in case
> there was any other way round it,
>
> Nick
>
> > There's definitely nothing integrated into the Ceph codebase about
> > internal layering, or a way to redirect snapshots outside of the OSD,
> > though you could always experiment with flashcache et al.
> > -Greg

--
Jason Dillaman

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com