On Tue, Jul 19, 2011 at 10:02 AM, Brian Chrisman <brchrisman@xxxxxxxxx> wrote:
> On Tue, Jul 19, 2011 at 9:11 AM, Gregory Farnum
> <gregory.farnum@xxxxxxxxxxxxx> wrote:
>> RBD presently does absolutely no client-side caching of writes. If a
>> write is issued, it goes out to the network and doesn't return from
>> the write until it's been committed on-disk to the OSDs.
>> This is the simplest and safest implementation, but it's not quite
>> what many things expect from their disk devices: they think a write
>> should be fast (and land in the disk's cache) and expect that a sync
>> might be slow. If you run an actual OS on RBD you should find it
>> performing reasonably well since the VFS and page cache will hide all
>> this latency from you, but if you're accessing it directly your
>> application is likely to issue a bunch of serialized 4KB writes with a
>> per-write latency of ~10 milliseconds, which obviously severely limits
>> total write bandwidth.
>> Implementing client-side caching is in the future for RBD but not
>> something we're working on right now.
>>
>> So most (all, I think) reports of slow RBD behavior have been because
>> of this (certainly the one you reference). But with the commands
>> you're running, you're doing dd in file mode rather than raw disk, so
>> the VFS ought to be coalescing those writes and giving you much higher
>> bandwidth. Are you using any special mount options? What result do you
>> get doing a dd directly to the device?
>> -Greg
>
>
> So what would be the throughput-optimal block size for writing to RBD
> with the default configuration? 4MB?
>

Well in general it's not a question of optimal block size so much as
of not letting the latency impact your write pattern. Larger blocks
like 4MB certainly help with that, as does doing multiple writes at a
time and async IO. But running a filesystem on top of the disk will
generally do all those things for you, so I'm not sure exactly what's
going on here...
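
For a rough sense of scale, the latency argument above can be put into
numbers with a back-of-the-envelope model. This is only a sketch, not
a measurement of RBD: the function name, the in_flight parameter and
the optional link-speed cap are hypothetical and exist purely for
illustration. The only figures taken from the thread are the 4KB write
size, the ~10 ms per-write latency and the 4MB block size. It assumes
every write pays the full commit latency, plus transfer time when a
link speed is given.

```python
def estimated_write_bandwidth(block_bytes, latency_s, in_flight=1,
                              link_bytes_per_s=None):
    """Rough upper bound on write bandwidth, in bytes per second.

    Simplified model (an assumption, not RBD's actual behaviour):
    each write takes latency_s to commit, plus block_bytes /
    link_bytes_per_s of transfer time if a link speed is supplied,
    and in_flight writes overlap perfectly.
    """
    per_write_s = latency_s
    if link_bytes_per_s is not None:
        per_write_s += block_bytes / link_bytes_per_s
    bandwidth = in_flight * block_bytes / per_write_s
    if link_bytes_per_s is not None:
        # Overlapping writes cannot exceed the link itself.
        bandwidth = min(bandwidth, link_bytes_per_s)
    return bandwidth


KB = 1024
MB = 1024 * KB
LATENCY_S = 0.010        # ~10 ms per write, as quoted in the thread
LINK = 125_000_000       # hypothetical 1 Gbit/s link, for illustration

if __name__ == "__main__":
    cases = [
        ("4KB, serialized", 4 * KB, 1, None),
        ("4KB, 8 in flight", 4 * KB, 8, None),
        ("4MB, serialized, capped link", 4 * MB, 1, LINK),
        ("4MB, 8 in flight, capped link", 4 * MB, 8, LINK),
    ]
    for label, block, depth, link in cases:
        bw = estimated_write_bandwidth(block, LATENCY_S, depth, link)
        print(f"{label}: {bw / MB:.2f} MB/s")
```

Under these assumptions a single stream of serialized 4KB writes tops
out at about 400 KB/s no matter how fast the OSDs or the network are,
which is why larger blocks and multiple writes in flight matter far
more than finding one "optimal" block size.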