Looking for feedback on, or corrections to, my logic for preemptively scanning GlusterFS storage for inconsistent file conditions that may prevent access from GlusterFS clients. This is targeted only at the 3.1.x software releases, though it may be applicable to earlier and later versions.

Steps to troubleshoot client access problems (to be reordered once a reasonable process has been nailed down)

Check client for:
  - is the gluster-client service running?
  - are the GlusterFS mount points present and accessible?
  - can other files in the same directory be accessed?
  - do the file permissions as presented to the client prohibit the client from performing whatever access is being attempted?
  - does lsof show the file, or the directory path to the file, in use?

Check backend storage servers for:
  - the file is present on one pair of mirrors, and if it also exists on the other pair of mirrors, it is a GlusterFS symlink (perm 0000) there
  - file permissions are consistent across all bricks
  - file attributes are consistent for all occurrences of the file (unless the file is a GlusterFS symlink)

Logic for storage server check:
1) find all files with permissions of 0000 (GlusterFS symlink)
2) check the extended attributes of each file
3) if the attribute is trusted.glusterfs.dht.linkto, look up the bricks and servers contained in the replica set it names (e.g. pfs-ro1-replicate-11\000)
4) check that the actual (normal) file exists on both of those bricks (e.g. pfs-ro1-client-22 and pfs-ro1-client-23)
5) if the file does NOT exist, log it to an error file

Possible auto correction steps:
1) if the error file exists, process it by removing the Gluster LINK files, then
2) copy the error file to a Gluster client node
3) on that client node, copy the missing files from source to the native gluster mount

This is the type of tool that would not only be helpful to administrators, but would also increase confidence in the state of the GlusterFS storage system.
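To make the storage server check concrete, here is a rough Python sketch of steps 1 through 5. It is only an illustration under several assumptions, not a tested tool: the `replica_bricks` mapping (replica set name to brick directories reachable from the host running the check) is hypothetical and would have to be built from the volume configuration, the function names are my own, and the permission test masks out the sticky bit so that it matches a file whose rwx bits are all zero whether or not that bit is set.

```python
import os
import stat

LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def parse_linkto(raw):
    """Strip the trailing NUL from the xattr value, e.g. b'pfs-ro1-replicate-11\\x00'."""
    return raw.rstrip(b"\x00").decode("ascii")


def is_normal_file(path):
    """True if path is a regular file with at least one rwx permission bit set."""
    try:
        st = os.lstat(path)
    except OSError:
        return False
    return stat.S_ISREG(st.st_mode) and (st.st_mode & 0o777) != 0


def find_link_files(brick_root, read_xattr=None):
    """Steps 1-3: yield (relative_path, replica_set) for each link file on a brick."""
    if read_xattr is None:
        read_xattr = os.getxattr  # Linux only; trusted.* attributes need root
    for dirpath, _dirs, files in os.walk(brick_root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Step 1: only regular files with no rwx permission bits are candidates.
            if not stat.S_ISREG(st.st_mode) or (st.st_mode & 0o777) != 0:
                continue
            # Steps 2-3: a candidate counts only if it carries the linkto attribute.
            try:
                raw = read_xattr(path, LINKTO_XATTR)
            except OSError:
                continue
            yield os.path.relpath(path, brick_root), parse_linkto(raw)


def check_brick(brick_root, replica_bricks, read_xattr=None):
    """Steps 4-5: return one tab-separated error line per missing target file.

    replica_bricks is a hypothetical mapping such as
    {"pfs-ro1-replicate-11": ["/path/to/brick22", "/path/to/brick23"]}.
    """
    errors = []
    for relpath, replica_set in find_link_files(brick_root, read_xattr):
        targets = replica_bricks.get(replica_set)
        if targets is None:
            errors.append(f"{relpath}\t{replica_set}\t<unknown replica set>")
            continue
        for target in targets:
            if not is_normal_file(os.path.join(target, relpath)):
                errors.append(f"{relpath}\t{replica_set}\t{target}")
    return errors
```

The lines returned by `check_brick` could be written to the error file from step 5. Since the bricks of a replica set normally live on other servers, a real version would need some way to check the remote bricks (for example, running the existence test over ssh) instead of joining local paths as this sketch does.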
I believe that the development team is working to incorporate some of this type of functionality in newer releases of the GlusterFS software, but some of us are stuck running what we are currently running until we can make compelling arguments for the stability of newer releases.

James Burnash
Unix Engineer
Knight Capital Group