On Wed, Aug 28, 2013 at 9:56 PM, Kevin Frey <kevin.frey@xxxxxxxxxxxxxxxx> wrote:
> Hello All,
>
> This is my first post to the list, and my question is very general to
> encourage discussion (perhaps derision).
>
> I am the team-leader involved with the development of an application of
> which one common capability is a “document management” like facility that
> permits a user to “attach” (in a logical sense) files to given data records
> in the database. The kinds of files being attached would be Microsoft Word
> documents, spreadsheets, PDF files, and so on. For various reasons we don’t
> store these files in the SQL database but instead in an associated file
> store.
>
> This file store is not very large at present (a few hundred Gb perhaps) and
> the storage methodology is fairly naïve in the sense it stores onto a single
> volume using RAID as the “durability” component.
>
> Our file store effectively does two things for us however: act as a basic
> store for the files; provides the feed for a full-text indexing system so
> that the files can be searched on.
>
> Ceph is a product I’ve been following for a while now and I have no question
> it can handle the storage aspect, but my question relates to how I would
> achieve my second requirement?
>
> From what I’ve read, CephFS is not “production ready” but one obvious
> strategy to achieve a searchable database would be to just expose the whole
> directory namespace (or a single pool?) via CephFS and point a standard file
> indexing product at it. Would this work? Would it be a good or a bad idea
> (ignoring the status of CephFS itself).
>
> Or would you suggest something that is perhaps more tightly integrated with
> the RADOS object store? I need to be wary of not putting myself in the
> position of having to write an entire file indexing suite also in terms of
> weighing up design possibilities.

Hmm. This is a difficult question since you're basically asking how to
stick an index on top of an object store.
:) Ignoring its readiness, CephFS is a pretty normal filesystem so it
should work fine under your indexing product. Depending on your
pipeline it might also be appropriate to write from your app directly
into the RADOS object store via librados, and when you do that send a
copy of the object to some indexer in order to not have to do scans
(or just queue up the object for indexing, or whatever).

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
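
For illustration only, here is a rough sketch of the "write to the object store, then queue the object for indexing" idea from the reply above. Nothing in it comes from the thread itself: the names attach_file, drain_index_queue and InMemoryStore are hypothetical, the key layout "record_id/filename" is an assumption, and the word-splitting "indexer" is a toy that stands in for a real full-text product (Word and PDF files would need text extraction first). The in-memory store only mimics the write_full/read shape of a librados I/O context so that the sketch runs without a cluster.

```python
import queue


class InMemoryStore:
    """Hypothetical stand-in for a RADOS pool, so the sketch runs anywhere.

    Assumption: in a real deployment this would be an I/O context from the
    Python rados binding, which offers write_full(key, data) and read(key).
    Check the read length defaults of the real binding before relying on it.
    """

    def __init__(self):
        self.objects = {}

    def write_full(self, key, data):
        self.objects[key] = data

    def read(self, key):
        return self.objects[key]


def attach_file(store, index_queue, record_id, filename, data):
    """Store the file as one object, then queue its key for indexing."""
    key = f"{record_id}/{filename}"  # assumed key layout, not from the thread
    store.write_full(key, data)
    index_queue.put(key)  # the indexer never has to scan the whole store
    return key


def drain_index_queue(store, index_queue, index):
    """Toy indexer: map each lowercase word to the set of object keys."""
    processed = 0
    while True:
        try:
            key = index_queue.get_nowait()
        except queue.Empty:
            return processed
        text = store.read(key).decode("utf-8", errors="ignore")
        for word in text.lower().split():
            index.setdefault(word, set()).add(key)
        processed += 1


if __name__ == "__main__":
    store, pending, index = InMemoryStore(), queue.Queue(), {}
    attach_file(store, pending, "rec-17", "notes.txt", b"Quarterly budget notes")
    drain_index_queue(store, pending, index)
    print(sorted(index))
```

The point of the queue is that the application already knows which object changed at write time, so the indexer only touches new or updated objects. A real setup would presumably want a durable queue rather than an in-process one, so that pending index work survives a restart.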