On Wed, Dec 26, 2012 at 6:16 PM, lollipop <lollipop_jin@xxxxxxx> wrote:
> Lately I have been wondering about doing offline deduplication in Ceph.
> My idea is:
> First, in the ceph-client, I try to get the locations of the chunks in one file.
> The information includes how many chunks the file has and which OSD each
> chunk (object group) is stored on.
> Then the ceph-client communicates with that exact OSD and asks it to
> return the chunk hash.
> After that, we compare the returned hash with the already stored hash table.
> If the chunk is duplicated, we change the file metadata.
> Can it work?
> Can you give some ideas? Thank you

Any off-line deduplication support in Ceph is going to have the important
parts in the OSD code, not in the clients. :)
We've discussed dedup a little bit internally and on the mailing list;
you can do an archive search if you're interested in what the current
thoughts are. :)
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
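
For readers following the thread, the hash-comparison step in the quoted proposal can be sketched roughly as below. This is only an in-memory illustration of chunk-hash duplicate detection and is not Ceph code: the fixed chunk size, the choice of SHA-256, and the names chunk_hashes and find_duplicates are all assumptions made for the example. It does not talk to any OSD and does not touch file metadata.

```python
import hashlib

# Hypothetical, deliberately tiny chunk size so the example is easy to follow.
CHUNK_SIZE = 4


def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """Yield (chunk_index, sha256_hex_digest) for each fixed-size chunk."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield offset // chunk_size, hashlib.sha256(chunk).hexdigest()


def find_duplicates(files, chunk_size=CHUNK_SIZE):
    """Scan files (name -> bytes) and report duplicate chunks.

    Returns (hash_table, duplicates):
      hash_table maps digest -> (name, index) of the first chunk seen,
      duplicates lists ((name, index), (first_name, first_index)) pairs.
    """
    hash_table = {}
    duplicates = []
    for name, data in files.items():
        for index, digest in chunk_hashes(data, chunk_size):
            location = (name, index)
            if digest in hash_table:
                # Same digest already stored: record where the original lives.
                duplicates.append((location, hash_table[digest]))
            else:
                hash_table[digest] = location
    return hash_table, duplicates


# Stand-in data for two files; real input would be object contents.
files = {"a": b"AAAABBBBCCCC", "b": b"BBBBDDDDAAAA"}
table, dups = find_duplicates(files)
```

In this toy run, two chunks of file "b" match chunks already seen in file "a". A real design would still have to decide where the hash table lives and who rewrites the references, which is the part the reply above places in the OSD code.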