On Tue, Dec 9, 2014 at 1:41 PM, Deepak Shetty <dpkshetty@xxxxxxxxx> wrote:
> We can use bitrot to provide a 'health' status for gluster volumes.
> Hence I would like to propose (from an upstream/community perspective) the
> notion of 'health' status (as part of gluster volume info) which can derive
> its value from:
>
> 1) Bitrot
> If any files are corrupted and bitrot is yet to repair them, and/or it's a
> signal to the admin to do some manual operation to repair the corrupted
> files (for cases where we only detect, not correct)
>
> 2) Brick status
> Depending on whether bricks are offline/online
>
> 3) AFR status
> Whether we have all copies in sync or not

This makes sense. Having a notion of "volume health" helps in choosing
intelligently from a list of volumes.

> This, I believe, is on similar lines to what Ceph does today (health
> status: OK, WARN, ERROR)

Yes, Ceph derives those notions from PGs. Gluster can do it for replicas
and/or files marked by the bitrot scrubber.

> The health status derivation can be pluggable, so that in future more
> components can be added to query for the composite health status of the
> gluster volume.
>
> In all of the above cases, as long as data can be served by the gluster
> volume reliably, gluster volume status will be Started/Available, but
> health status can be 'degraded' or 'warn'

WARN may be too strict, but something lenient enough yet descriptive
should be chosen. Ceph does it pretty well:
http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/

> This has many uses:
>
> 1) It helps provide an indication to the admin that something is amiss,
> and he can check based on:
> bitrot / scrub status
> brick status
> AFR status
> and take necessary action
>
> 2) It helps management applications (OpenStack, for example) make an
> intelligent decision based on the health status (whether or not to pick
> this gluster volume for a create volume operation), so it helps act as a
> coarse-level filter
>
> 3) In general it gives the user an idea of the health of the volume
> (which is different from the availability status, i.e. whether or not the
> volume can serve data).
> For example: if we have a pure DHT volume and bitrot detects silent file
> corruption (and we are not auto-correcting), having the gluster volume
> status as available/started isn't entirely correct!

+1

> thanx,
> deepak
>
> On Fri, Dec 5, 2014 at 11:31 PM, Venky Shankar <yknev.shankar@xxxxxxxxx>
> wrote:
>>
>> On Fri, Nov 28, 2014 at 10:00 PM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
>> > On 11/28/2014 08:30 AM, Venky Shankar wrote:
>> >>
>> >> [snip]
>> >>>
>> >>> 1. Can the bitd be one per node like self-heal-daemon and other
>> >>> "global" services? I worry about creating 2 * N processes for N
>> >>> bricks in a node. Maybe we can consider having one thread per
>> >>> volume/brick etc. in a single bitd process to make it perform better.
>> >>
>> >> Absolutely.
>> >> There would be one bitrot daemon per node, per volume.
>> >
>> > Do you foresee any problems in having one daemon per node for all
>> > volumes?
>>
>> Not technically :). Probably that's a nice thing to do.
>>
>> >>> 3. I think the algorithm for checksum computation can vary within the
>> >>> volume. I see a reference to "Hashtype is persisted alongside the
>> >>> checksum and can be tuned per file type." Is this correct? If so:
>> >>>
>> >>> a) How will the policy be exposed to the user?
>> >>
>> >> Bitrot daemon would have a configuration file that can be configured
>> >> via the Gluster CLI. Tuning hash types could be based on file types or
>> >> file name patterns (regexes) [which is a bit tricky as bitrot would
>> >> work on GFIDs rather than filenames, but this can be solved by a level
>> >> of indirection].
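To make that concrete, here is a throwaway sketch (plain Python; the
gfid-to-path lookup and the policy table are made up, not actual bitd code
or options) of how a pattern-based policy could be resolved for an object
that is only known by its GFID:

import hashlib
import re

# Hypothetical policy table: (filename regex, hash type); first match wins.
# In a real setup this would come from bitd's config file / CLI, not be
# hard-coded here.
HASH_POLICY = [
    (re.compile(r'.*\.iso$'), 'sha512'),
    (re.compile(r'.*\.tmp$'), 'sha1'),
]
DEFAULT_HASH = 'sha256'

def gfid_to_path(gfid, gfid_index):
    # The "level of indirection": bitd sees GFIDs, so some lookup (a plain
    # dict here; on a brick it could be parent-GFID xattrs or .glusterfs/
    # backpointers) maps the GFID back to a name the regexes apply to.
    return gfid_index[gfid]

def hash_type_for(gfid, gfid_index):
    path = gfid_to_path(gfid, gfid_index)
    for pattern, algo in HASH_POLICY:
        if pattern.match(path):
            return algo
    return DEFAULT_HASH

def compute_checksum(gfid, data, gfid_index):
    # The chosen hash type is returned with the digest so it can be
    # persisted alongside the checksum (e.g. in an xattr) and looked up
    # again at verification time.
    algo = hash_type_for(gfid, gfid_index)
    h = hashlib.new(algo)
    h.update(data)
    return algo, h.hexdigest()

# Toy usage: a GFID that resolves to an .iso file gets the sha512 policy.
index = {'3f2b6c2a-0000-4000-8000-000000000001': '/vmstore/fedora21.iso'}
print(compute_checksum('3f2b6c2a-0000-4000-8000-000000000001',
                       b'file contents', index))

Persisting the hash type next to the checksum also makes the lazy policy
change mentioned further down straightforward: verify with the stored
type, then re-sign with the new one.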
>> >>> b) It would be nice to have the algorithm for computing checksums be
>> >>> pluggable. Are there any thoughts on pluggability?
>> >>
>> >> Do you mean the default hash algorithm should be configurable? If yes,
>> >> then that's planned.
>> >
>> > Sounds good.
>> >
>> >>> c) What are the steps involved in changing the hashtype/algorithm for
>> >>> a file?
>> >>
>> >> Policy changes for file {types, patterns} are lazy, i.e., taken into
>> >> effect during the next recompute. For objects that are never modified
>> >> (after the initial checksum compute), scrubbing can recompute the
>> >> checksum using the new hash _after_ verifying the integrity of the file
>> >> with the old hash.
>> >
>> >>> 4. Is the fop on which change detection gets triggered configurable?
>> >>
>> >> As of now all data modification fops trigger checksum calculation.
>> >
>> > Wish I was more clear on this in my OP. Is the fop on which checksum
>> > verification/bitrot detection happens configurable? The feature page
>> > talks about "open" being a trigger point for this. Users might want to
>> > trigger detection on a "read" operation and not on open. It would be
>> > good to provide this flexibility.
>>
>> Ah! OK. As of now it's mostly open() and read(). Inline verification
>> would be "off" by default for obvious reasons.
>>
>> >>> 6. Any thoughts on integrating the bitrot repair framework with
>> >>> self-heal?
>> >>
>> >> There are some thoughts on integration with the self-heal daemon and
>> >> EC. I'm coming up with a doc which covers those [the reason for the
>> >> delay in replying to your questions ;)]. Expect the doc on
>> >> gluster-devel@ soon.
>> >
>> > Will look forward to this.
>> >
>> >>> 7. How does detection figure out that lazy updation is still pending
>> >>> and not raise a false positive?
>> >>
>> >> That's one of the things that Rachana and I discussed yesterday. Should
>> >> scrubbing *wait* while checksum updating is still in progress, or is it
>> >> expected that scrubbing happens when there is no active I/O on the
>> >> volume (both of which imply that the bitrot daemon needs to know when
>> >> it's done its job)?
>> >>
>> >> If both scrub and checksum updating go in parallel, then there needs to
>> >> be a way to synchronize those operations. Maybe compute the checksum on
>> >> priority, with the priority provided by the scrub process as a hint
>> >> (that leaves a small window for rot, though)?
>> >>
>> >> Any thoughts?
>> >
>> > Waiting for no active I/O in the volume might be a difficult condition
>> > to reach in some deployments.
>> >
>> > Some form of waiting is necessary to prevent false positives. One
>> > possibility might be to mark an object as dirty till checksum updation
>> > is complete. Verification/scrub can then be skipped for dirty objects.
>>
>> Makes sense. Thanks!
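That dirty-flag ordering is simple enough to sketch. Below is a minimal
illustration (plain Python over user.* xattrs; the xattr names are
invented and stand in for whatever bitd would actually persist): set the
flag before recomputing, clear it only after the new checksum is stored,
and have the scrubber skip anything still flagged.

import errno
import hashlib
import os

# Invented xattr names, purely to illustrate the ordering. The os.*xattr
# calls are Linux-only; 'user.' is used instead of 'trusted.' so the toy
# runs without root.
CHECKSUM_XATTR = 'user.bitrot.sha256'
DIRTY_XATTR = 'user.bitrot.dirty'

def update_checksum(path):
    # Signer side: flag the object as dirty *before* recomputing and clear
    # the flag only once the new checksum has been persisted.
    os.setxattr(path, DIRTY_XATTR, b'1')
    with open(path, 'rb') as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    os.setxattr(path, CHECKSUM_XATTR, digest.encode())
    os.removexattr(path, DIRTY_XATTR)

def scrub_one(path):
    # Scrubber side: anything still flagged dirty is skipped rather than
    # verified, so an in-flight checksum update is never reported as rot.
    if DIRTY_XATTR in os.listxattr(path):
        return 'SKIPPED (checksum update in progress)'
    try:
        expected = os.getxattr(path, CHECKSUM_XATTR).decode()
    except OSError as e:
        if e.errno == errno.ENODATA:
            return 'SKIPPED (never signed)'
        raise
    with open(path, 'rb') as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    return 'OK' if actual == expected else 'CORRUPTED'

The leftover failure mode is a crash between setting and clearing the
flag: the object would just keep getting skipped until the next signing
pass rather than being reported as corrupted, which seems like the right
way to fail.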
>>
>> > -Vijay
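Coming back to the health-status proposal at the top of the thread: the
pluggable derivation could be as simple as a list of per-subsystem checks,
with the volume health being the worst result any of them reports. A rough
sketch (plain Python; the field names are made up and not something
glusterd exposes today):

from enum import IntEnum

class Health(IntEnum):
    OK = 0
    WARN = 1
    ERROR = 2

# Pluggable checkers: each inspects one subsystem and returns a Health.
# These are stand-ins; real ones would query bitd/scrub status, brick
# processes and AFR pending heals.
def bitrot_health(vol):
    return Health.WARN if vol.get('corrupted_files', 0) else Health.OK

def brick_health(vol):
    offline = vol['bricks_total'] - vol['bricks_online']
    if offline == 0:
        return Health.OK
    # Data may still be served (volume stays Started), but redundancy is
    # reduced.
    return Health.WARN if vol['bricks_online'] else Health.ERROR

def afr_health(vol):
    return Health.WARN if vol.get('pending_heals', 0) else Health.OK

HEALTH_CHECKS = [bitrot_health, brick_health, afr_health]  # extensible

def volume_health(vol):
    # Composite health = worst result reported by any registered check.
    return max(check(vol) for check in HEALTH_CHECKS)

# Toy example: one brick down and a couple of pending heals -> WARN.
vol = {'bricks_total': 4, 'bricks_online': 3, 'pending_heals': 2}
print(volume_health(vol).name)

Adding another component to the composite status later (quota, geo-rep or
whatever else) would then just mean registering one more check.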