On Fri, Feb 15, 2013 at 6:18 PM, Dimitri Maziuk <dmaziuk@xxxxxxxxxxxxx> wrote:
> On 02/15/2013 06:09 PM, Sam Lang wrote:
>
>> Your best bet is probably to add some location awareness to your job
>> scheduling so that a job runs on an osd where the file data is
>> located. You can access the location of a file with:
>>
>> cephfs <file> map
>>
>> If you want an entire 2GB file to end up on the same osd (sounds like
>> you do), you can set the layout of your files (or set the layout of a
>> parent directory and create files in it) with:
>>
>> cephfs <file> set_layout -s $[1024*1024*1024*2]
>>
>> Before doing that though, you might want to test out the performance
>> of a job on a ceph setup with only one osd (or all osds on the same
>> node). That will potentially tell you whether your network is a
>> significant bottleneck.
>
> Will do, thanks, however,
>
> I have 30-odd GB (and growing) of search data split into 2GB files, and
> each job reads through all of them. So what I do now with rsync, and want
> to replicate with ceph, is a full mirror of everything on every host
> (osd). Can I get ceph to do that?

You can, but ceph always performs reads from the primary osd, so you will
still need to use the set_layout and map commands mentioned above to run
your jobs on the right nodes.

> (I was trying to get there with pool size = # osds, min_size = # osds,
> and a crush map with the uniform algorithm & an equal weight of 1 for
> each host.)

pool size sets the number of replicas desired, and should be thought of as
the number of replicas that will exist in a clean, steady state (without
constant failures causing degraded pgs). min_size sets the number of
replicas required for I/O to succeed.

-sam

> Thanks again
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
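For anyone following along, here is a minimal sketch of the commands
discussed above, against a hypothetical 4-osd cluster. The pool name
"data" (the default cephfs data pool at the time), the /mnt/ceph/searchdb
path, and the part-00 file name are placeholders, not taken from the
thread, and the set_layout invocation simply mirrors Sam's example:

    # keep a copy of every object on each of the 4 osds, and
    # refuse I/O unless at least 2 copies are available
    ceph osd pool set data size 4
    ceph osd pool set data min_size 2

    # use 2GB objects for new files in this directory, so each
    # 2GB search file lands on a single osd
    cephfs /mnt/ceph/searchdb set_layout -s $[1024*1024*1024*2]

    # show which osds hold a given file, for location-aware scheduling
    cephfs /mnt/ceph/searchdb/part-00 map

Note that even with size equal to the number of osds, reads still go to
the primary osd for each object, so the map output is what a job
scheduler would key on to run jobs on the right nodes.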