Hi Wei Dong,

In the source code I found xlators/cluster/map/. Is this what you are
looking for?

On Fri, Aug 21, 2009 at 12:15 AM, Wei Dong <wdong.pku at gmail.com> wrote:
> Hi All,
>
> We are using glusterfs on our lab cluster as shared storage for a large
> number of image files, about 30 million at the moment. We use Hadoop for
> distributed computing, but we are reluctant to store small files in
> Hadoop because of its low throughput on small files and its non-standard
> filesystem interface (e.g. we won't be able to run convert on each image
> to produce a thumbnail if the files are stored in Hadoop). What we do now
> is store a list of paths to all images in Hadoop, and use Hadoop
> streaming to pipe the paths to a script, which then reads the images from
> the glusterfs filesystem and does the processing. This has been working
> for a while, as long as glusterfs doesn't hang, but the problem is that
> we basically lose all data locality. We have 66 nodes, so the chance that
> a needed file is on the local disk is only 1/66, and 65/66 of file I/O
> has to go through the network, which makes me very uncomfortable. I'm
> wondering if there's a better way of making glusterfs and Hadoop work
> together to take advantage of data locality.
>
> I know that there's a nufa translator which gives high preference to the
> local drive. This is good enough if the assignment of files to nodes is
> fixed. But if we want to assign files to nodes according to the location
> of the file, what interface should we use to get the physical location of
> the file?
>
> I appreciate all your suggestions.
>
> - Wei Dong
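
For reference, the workflow described in the quoted message (Hadoop streaming pipes image paths to a script, which reads each image from the glusterfs mount and runs convert on it) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the script actually in use: the function names, the thumbnail size, the "_thumb.jpg" naming, and the tab-separated status output are all made up for the example. The only external call is ImageMagick's convert with its -thumbnail option.

```python
import subprocess
from pathlib import Path


def thumbnail_command(src, out_dir, size="128x128"):
    """Build the ImageMagick command line for one image path.

    The output naming scheme and default size are assumptions for this sketch.
    """
    dst = Path(out_dir) / (Path(src).stem + "_thumb.jpg")
    return ["convert", src, "-thumbnail", size, str(dst)]


def run_mapper(lines, out_dir, runner=subprocess.call):
    """Consume one image path per line and yield 'path<TAB>status' records.

    `runner` takes an argument list and returns an exit code; it defaults to
    subprocess.call and can be swapped out so the logic is testable without
    ImageMagick installed.
    """
    for line in lines:
        src = line.strip()
        if not src:
            continue  # skip blank input lines
        status = runner(thumbnail_command(src, out_dir))
        yield f"{src}\t{'ok' if status == 0 else 'failed'}"
```

To use it as a streaming mapper, the script would iterate over run_mapper(sys.stdin, out_dir) and print each record, with out_dir pointing at a directory on the glusterfs mount. Note that nothing here addresses locality: each mapper still reads whatever paths Hadoop hands it, wherever the file physically lives.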