On 02/15/2013 12:00 PM, Gregory Farnum wrote:
> That's a lot more of a slowdown than I'd expect to see, but there
> isn't much hint about where the slow-down is actually happening. I
> don't recall precisely what's happening in the kernel client when
> you do a bunch of mmaps -- Sage? Does it require a network
> round-trip when you do that or will it cache and pre-read
> appropriately?
>
> But more generally you'll need to describe your workload pattern a
> bit more, and do some benchmarks at lower layers of the stack to see
> what kind of bandwidth is available to begin with. Look at the rados
> bench stuff to get some data on disk and then do a bunch of
> simultaneous read benchmarks to see how fast your OSDs can serve data
> up under a fairly reasonable streaming workload; check out
> smalliobenchrados to do some IO that more closely mimics your
> application, etc.

This *was* the benchmark. Each host has the data on a local hard
drive; that is the kind of bandwidth available to begin with. All
other things being equal:
- the baseline test mmaps from /dev/sdb1 mounted as ext4,
- the ceph test mmaps from /dev/sdb1 mounted as cephfs -> (hopefully
  local) OSD.
The slowdown is an order of magnitude and change.

As for the workload pattern: last I looked at mmap (it's been a
while), it places data in shared memory and COWs it if needed. Since
the application is not writing, that's never needed -- so cephfs is
mounted read-only. There should not be any caches to invalidate or
resync. The application is (or was, last I looked) able to search only
2GB at a time, so the search space is split into 2GB files. Each
worker host has 4GB RAM per core, so each running instance should be
able to get through at least one 2GB chunk without thrashing (i.e. a
sequential read of a 2GB file and no disk I/O until it's done with
that chunk and needs to read the next one). (In fact, the fastest way
to fly is to throw enough RAM at the host to keep all of the search
data in RAM all the time.)
I/O-wise, the worst part is the start of the batch, when all instances
start reading the first file at the first byte. After that it spreads
out as they work through the search space at different rates (due to
differences in their search targets). The good news, if you can call
it that, is that ceph didn't keel over during that initial spike. (But
that's only 16 parallel jobs; our very small cluster can run only 62
at the moment.) The bad news is that 3 jobs/hour sounds like what I
could probably get by placing the search data on NFS and having all 16
jobs hit a single NFS server.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com