On 21/07/2009 00:28, Drew Morris wrote:
Hi All... We are developing a custom translator to log modifications to files (including creation, update and deletion)
mtime attribute?
into a database.
Have you looked into SeznamFS?
*Our Current Approach:* By reviewing the Gluster and FUSE source code and documentation, we concluded that the following FOPs should be monitored for this purpose: open, create, mknod, truncate, ftruncate, writev, flush, release, unlink and rename.
You should really look into SeznamFS.
We would like to insert one record per file modification, hence we need a mechanism to aggregate multiple operations such as open, writev and flush over one file descriptor into a single update. For the sake of performance and to prevent dirty reads, we would like to do the database row insertion in the callback of the very last action that is performed. In other words, during write we just set a modified flag in the file descriptor context and perform the insert in the very last action. The major issue is that (as most of the docs and FAQ indicate) there is no reliable mechanism to decide which FOP is the last one.
If I'm following what you are saying, that's not sensibly doable because you never know if there will be another operation. You have to treat each op as the last one, because you don't know what happens next. So you'll have to log all of them, and if you only ever want one of them, key them by file path hash in your DB so that each op overwrites the previous log. But if you're doing that, you might as well just do a recursive scan for mtime to see what's changed and take it from there.
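To make the "each op overwrites the previous log" idea concrete, here is a minimal sketch in Python with SQLite. It only illustrates the keying scheme and is not GlusterFS translator code; the table name `file_log`, its columns, and the helpers `open_log` and `log_op` are all made up for the example, and SHA-1 is just one possible choice of path hash.

```python
import hashlib
import sqlite3
import time


def open_log(db_path=":memory:"):
    """Open (or create) the log database. Table and column names are hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_log ("
        " path_hash TEXT PRIMARY KEY,"   # one row per file path
        " path TEXT NOT NULL,"
        " last_op TEXT NOT NULL,"
        " logged_at REAL NOT NULL)"
    )
    return conn


def log_op(conn, path, op, now=None):
    """Record an operation, treating it as if it were the last one on this path.

    Because path_hash is the primary key, INSERT OR REPLACE overwrites any
    earlier row for the same path, so only the most recent op survives.
    """
    path_hash = hashlib.sha1(path.encode("utf-8")).hexdigest()
    logged_at = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO file_log (path_hash, path, last_op, logged_at)"
        " VALUES (?, ?, ?, ?)",
        (path_hash, path, op, logged_at),
    )
    conn.commit()


if __name__ == "__main__":
    conn = open_log()
    for op in ("open", "writev", "flush"):
        log_op(conn, "/data/report.txt", op)
    log_op(conn, "/data/old.txt", "unlink")
    for row in conn.execute("SELECT path, last_op FROM file_log ORDER BY path"):
        print(row)
```

However many ops arrive for a given path, the table holds a single row for it, so there is no need to know which op is the last. The cost is one database write per op, which is the overhead the mtime scan would avoid.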
We monitored file system interaction via the trace module and noticed that flush is called several times and that, in many cases, release is never invoked.
Bug?
This issue forced us to log on the very first flush, which is problematic for a number of reasons: we can never be sure the operation has finished before triggering any of our asynchronous operations, and we slow down the initial write because we wait for the log action to complete.
Have you tried it using a dummy FS, rather than piggybacking on GlusterFS? If so, did you observe the same flush/release behaviour?
*Question:* Does anyone have a better solution for this issue? Perhaps there should be a mechanism to notify us of the closing of a file, otherwise an open file descriptor will remain forever. We would really love to find any other reliable method that allows us to track these operations at a higher level. We would greatly appreciate any new approach that can overcome these deficiencies.
Other than SeznamFS which I mentioned above, perhaps CopyFS might give you a better base to work on? The sort of thing you are describing doesn't strike me as a major use-case for GlusterFS.
Gordan