[snip]

> 1. Can the bitd be one per node like self-heal-daemon and other "global"
> services? I worry about creating 2 * N processes for N bricks in a node.
> Maybe we can consider having one thread per volume/brick etc. in a single
> bitd process to make it perform better.

Absolutely. There would be one bitrot daemon per node, per volume.

> 2. It would be good to consider throttling for filesystem scan and update
> of checksums. That way we can avoid overwhelming the system after enabling
> bitrot on pre-created data.

Makes sense. The xtime-based filesystem scan is planned to be integrated
into libgfchangelog and exposed via an API. Throttling would be one of the
tunables controlling scan speed.

> 3. I think the algorithm for checksum computation can vary within the
> volume. I see a reference to "Hashtype is persisted along side the checksum
> and can be tuned per file type." Is this correct? If so:
>
> a) How will the policy be exposed to the user?

The bitrot daemon would have a configuration file that can be configured
via the Gluster CLI. Hash types could be tuned per file type or file name
pattern (regexes) [which is a bit tricky, as bitrot works on GFIDs rather
than filenames, but this can be solved by a level of indirection].

> b) It would be nice to have the algorithm for computing checksums be
> pluggable. Are there any thoughts on pluggability?

Do you mean that the default hash algorithm should be configurable? If yes,
then that's planned.

> c) What are the steps involved in changing the hashtype/algorithm for a
> file?

Policy changes for file {types, patterns} are lazy, i.e., they take effect
during the next recompute. For objects that are never modified (after the
initial checksum compute), scrubbing can recompute the checksum using the
new hash _after_ verifying the integrity of the file with the old hash.

> 4. Is the fop on which change detection gets triggered configurable?

As of now, all data-modification fops trigger checksum calculation.

> 5. It would be good to have the store & retrieval of checksums modular so
> that we can choose an alternate backend in the future (apart from extended
> attributes) if necessary.

Yes. That too would be pluggable, with an xattr-based store as the default.
The store/retrieve APIs would be generic enough for pluggability.

> 6. Any thoughts on integrating the bitrot repair framework with self-heal?

There are some thoughts on integration with the self-heal daemon and EC.
I'm coming up with a doc which covers those [the reason for the delay in
replying to your questions ;)]. Expect the doc on gluster-devel@ soon.

> 7. How does detection figure out that lazy updation is still pending and
> not raise a false positive?

That's one of the things that Rachana and I discussed yesterday. Should
scrubbing *wait* while checksum updating is still in progress, or is it
expected that scrubbing happens when there are no active I/O operations on
the volume (both of which imply that the bitrot daemon needs to know when
it has done its job)? If scrubbing and checksum updating run in parallel,
then there needs to be a way to synchronize the two operations. Maybe
compute the checksum on priority when the scrub process provides a hint
(that leaves a small window for rot, though)? Any thoughts?

> Regards,
> Vijay
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
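To make the pluggability in (5) concrete, here is a minimal Python sketch of what a generic store/retrieve API could look like, with a dict standing in for the default xattr backend. All names here (`ChecksumStore`, `InMemoryStore`, `scrub_verify`) are illustrative assumptions, not the actual bitrot API:

```python
import hashlib
from abc import ABC, abstractmethod

class ChecksumStore(ABC):
    """Hypothetical pluggable backend for persisting per-object checksums.

    The default in the proposal would be xattr-backed; an alternate
    backend only needs to implement these two calls.
    """

    @abstractmethod
    def store(self, gfid, hashtype, checksum):
        """Persist a (hashtype, checksum) pair, keyed by GFID."""

    @abstractmethod
    def retrieve(self, gfid):
        """Return the persisted (hashtype, checksum) pair for a GFID."""

class InMemoryStore(ChecksumStore):
    """Stand-in backend for illustration (a dict instead of xattrs)."""

    def __init__(self):
        self._db = {}

    def store(self, gfid, hashtype, checksum):
        # The hashtype is persisted alongside the checksum, so a policy
        # change can lazily move an object to a new hash while the
        # scrubber still verifies with the old one.
        self._db[gfid] = (hashtype, checksum)

    def retrieve(self, gfid):
        return self._db[gfid]

def scrub_verify(store, gfid, data):
    """Scrub-style check: recompute using the persisted hashtype."""
    hashtype, expected = store.retrieve(gfid)
    return hashlib.new(hashtype, data).digest() == expected
```

Swapping in an xattr-backed implementation would then just mean mapping `store`/`retrieve` onto `setxattr`/`getxattr` calls, without touching the scrubber.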
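On (7), one way to avoid false positives is for the daemon to track objects whose lazy checksum update is still pending and have the scrubber defer them (or recompute them on priority). A self-contained toy model of that idea; again, every name here (`ScrubModel`, `on_write`, the dirty set) is an illustrative assumption, not the actual design:

```python
import hashlib

class ScrubModel:
    """Toy model of the scrub-vs-lazy-update race in question 7.

    Objects modified since their last checksum compute sit in a dirty
    set; the scrubber defers them instead of flagging a stale checksum
    as rot.
    """

    def __init__(self):
        self.checksums = {}  # gfid -> sha256 digest (stand-in for xattrs)
        self.dirty = set()   # gfids with a lazy checksum update pending

    def on_write(self, gfid):
        # A data-modification fop only marks the object dirty; the
        # checksum itself is recomputed lazily by the daemon.
        self.dirty.add(gfid)

    def update_checksum(self, gfid, data):
        self.checksums[gfid] = hashlib.sha256(data).digest()
        self.dirty.discard(gfid)

    def scrub(self, gfid, data):
        # Deferring dirty objects (or recomputing them on priority, as
        # hinted at in the mail) closes the false-positive window.
        if gfid in self.dirty:
            return "deferred"
        ok = hashlib.sha256(data).digest() == self.checksums[gfid]
        return "ok" if ok else "corrupt"
```

The remaining trade-off is exactly the one raised above: "deferred" objects leave a small window where rot on the unscanned data goes unnoticed until the next pass.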