Re: Troubleshooting and Diagnostic tools for Gluster


 




regards
Aravinda

On 10/23/2015 11:50 PM, Shyam wrote:


On 10/23/2015 06:46 AM, Aravinda wrote:
Hi Gluster developers,

In this mail I am proposing troubleshooting documentation and
Gluster Tools infrastructure.

Tool to search in documentation
===============================
We recently added message IDs to each error message in Gluster. Some
of the error messages are self-explanatory, but some require manual
intervention to fix the issue. How about identifying the error
messages which require more explanation and creating documentation
for them? Even though information about some errors is available in
the documentation, it is very difficult to search for it and relate
it to the error message. It would be very useful if we created a
tool which looks up the documentation and tells us exactly what to do.

For example (illustrative purpose only),
glusterdoc --explain GEOREP0003

     SSH configuration issue. This error is seen when pem keys from all
     master nodes are not distributed properly to slave
     nodes. Use the Geo-replication create command with the force option
     to redistribute the keys. If the issue still persists, look for any
     errors while running hook scripts in the Glusterd log file.


Note: Inspired by the rustc --explain command
https://twitter.com/jaredforsyth/status/626960244707606528

If we don't know the message ID, we can still search the
available documentation, like

     glusterdoc --search <SEARCH_KEY_WORD>

These commands can be programmatically consumed, for example
`--json` will return the output in JSON format. This enables UI
developers to automatically show help messages when they display
errors.
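
A rough sketch of how such a command could behave, purely as an
illustration. glusterdoc does not exist yet; the hard-coded CATALOG
and the function names explain, search and run are assumptions made
for this example, and a real tool would load its catalog from
generated documentation rather than from a dict in the source.

```python
import argparse
import json

# Hypothetical catalog; a real tool would generate this from the
# message-ID documentation instead of hard-coding it.
CATALOG = {
    "GEOREP0003": (
        "SSH configuration issue. This error is seen when pem keys from "
        "all master nodes are not distributed properly to slave nodes. "
        "Use the Geo-replication create command with the force option to "
        "redistribute the keys."
    ),
}


def explain(msg_id):
    """Return the explanation for a message ID, or None if unknown."""
    return CATALOG.get(msg_id.upper())


def search(keyword):
    """Return {msg_id: text} for entries containing keyword (case-insensitive)."""
    needle = keyword.lower()
    return {k: v for k, v in CATALOG.items() if needle in v.lower()}


def run(argv):
    """Parse the arguments and return the text the tool would print."""
    parser = argparse.ArgumentParser(prog="glusterdoc")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--explain", metavar="MSGID")
    group.add_argument("--search", metavar="SEARCH_KEY_WORD")
    parser.add_argument("--json", action="store_true")
    args = parser.parse_args(argv)

    if args.explain:
        text = explain(args.explain)
        result = {args.explain.upper(): text} if text else {}
    else:
        result = search(args.search)

    if args.json:
        # Machine-readable form for UI developers.
        return json.dumps(result, sort_keys=True)
    if not result:
        return "No documentation found."
    return "\n\n".join(f"{k}\n    {v}" for k, v in result.items())
```

With this sketch, print(run(["--explain", "GEOREP0003"])) prints the
stored explanation, and adding "--json" returns the same entry as a
JSON object keyed by message ID.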

The message ID based logging was created for this exact purpose (maybe not so elegant a purpose, but leaning towards this :) ). So I am all for it.

(suggestion) The intention of documenting messages with text in Doxygen format was to be able to extract this information from the headers and create a catalog that can then be searched, etc. This catalog can be processed and shipped as part of the gluster RPMs, which the tool above can use.
I am also in favor of documentation staying with the code. It makes it easy to change the documentation whenever the code/algorithm changes. As you mentioned, we can parse the documentation from the header files. The Geo-replication Python code is yet to adopt the new MSGID changes.



Gluster Tools infrastructure
============================
Are our Gluster log files sufficient for root-causing issues? Is
an error caused by a misconfiguration? Geo-replication status is
showing Faulty; where do we find the reason for Faulty?

Sac (surs AT redhat.com) mentioned that he is working on gdeploy, and
many developers are using their own tools. How about providing a common
infrastructure (say gtool/glustertool) to host all these tools?

From my toolkit, the following tools are available. I am planning to
create more such tools for Geo-replication and Gluster.

     volinfo [<VOLNAME>] - Enhanced version of the Gluster Volume info
     command (http://aravindavk.in/blog/glusterfs-tools/)

     df - df for Gluster Volumes
     (http://aravindavk.in/blog/glusterdf-df-for-gluster-volumes/)

     georepsetup - A tool to create a Geo-replication session easily
     (http://aravindavk.in/blog/introducing-georepsetup/)

     gdash - A simple dashboard for Gluster
     (http://aravindavk.in/blog/introducing-gdash/)

     gfid <PATH> - To get the GFID of a file, works both in Mount and
     Backend (https://github.com/aravindavk/gluster_georep_scripts)

     clparser <PATH> - Parse the backend Changelog and print it in human
     readable format (https://github.com/aravindavk/gluster_georep_scripts)

     xtime <PATH> - To get the XTIME xattr from the given path
     (https://github.com/aravindavk/gluster_georep_scripts)

     stime <PATH> - To get the STIME xattr from the given path (used by
     Geo-replication, https://github.com/aravindavk/gluster_georep_scripts)

     volmark <VOLNAME> - To get the Volume Mark of the given Volume (used by
     Geo-replication, https://github.com/aravindavk/gluster_georep_scripts)


Geo-replication developers are already using some tools like Changelog
parser, `arequal-checksum` etc.

Initial idea for Tools Framework:
---------------------------------
A Shell/Python script which looks for the tool in the plugins
subdirectory and, if it exists, calls that script with all the arguments.

`glustertool help` triggers a Python script, plugins/help.py, which reads
the plugins.yml file to get the list of tools and the help messages
associated with them.

There are no restrictions on the choice of programming language used to
create a tool. It can be bash, Python, Go, Rust, awk, sed etc.
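
The dispatch idea could look roughly like the sketch below. It is only
an illustration under some assumptions that are not part of the
proposal: a plugin is any executable file in the plugins directory
whose name, minus its extension, equals the tool name, and the function
names find_tool and dispatch are made up for this example. Reading
plugins.yml for help text is left out.

```python
import os
import subprocess
import sys


def find_tool(name, plugins_dir):
    """Return the path of the executable plugin for `name`, or None."""
    if not os.path.isdir(plugins_dir):
        return None
    for entry in sorted(os.listdir(plugins_dir)):
        stem, _ext = os.path.splitext(entry)
        path = os.path.join(plugins_dir, entry)
        # Assumption: the plugin file name without extension is the tool name.
        if stem == name and os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


def dispatch(argv, plugins_dir):
    """Run `glustertool <tool> [args...]` and return the tool's exit code."""
    if not argv:
        print("usage: glustertool <tool> [args...]", file=sys.stderr)
        return 2
    tool = find_tool(argv[0], plugins_dir)
    if tool is None:
        print(f"glustertool: unknown tool '{argv[0]}'", file=sys.stderr)
        return 1
    # Pass every remaining argument through to the plugin unchanged.
    return subprocess.call([tool] + argv[1:])
```

A wrapper script would then end with something like
sys.exit(dispatch(sys.argv[1:], "plugins")). Because the plugin is
started as a separate process, it can be written in any language, as
long as the file is executable.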

Challenges:
- Each plugin may have different dependencies, so installing all tools
may pull in all of those dependencies.
- Multiple programming languages may be difficult to maintain/build.
- Maintenance of third-party tools.
- Creating a plugins registry to discover tools created by other developers.

Tool Ideas:
-----------
If you are interested in working on tools for Gluster, I am listing a
few ideas to start with, feel free to add your ideas to the list.

- A tool to analyze the log file and identify issues. For example,
glustertool georep_log_analize <LOG FILE PATH> --after-date <TIMESTAMP>

   Example output: (Illustrative purpose only)

   Number of workers in this node: 2
   Number of restarts: 5
   Errors: 10
   Python Tracebacks: 5
   Last state: Active
   Files Skipped: 0
   Setup issue: No
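
A small part of such an analyzer (counting errors and Python
tracebacks after a given time) might be sketched as below. The log
line layout "[YYYY-MM-DD HH:MM:SS.ffffff] <LEVEL> ..." is an
assumption for this example, as is the function name summarize; the
other fields in the example output would need knowledge of the actual
Geo-replication log messages.

```python
import re
from datetime import datetime

# Assumed line layout: "[YYYY-MM-DD HH:MM:SS.ffffff] <LEVEL> ..."
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.\d+)?\] ([A-Z]) "
)


def summarize(lines, after=None):
    """Count errors and Python tracebacks in log lines newer than `after`."""
    summary = {"errors": 0, "tracebacks": 0}
    include = after is None
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            include = after is None or stamp > after
            if include and match.group(2) == "E":
                summary["errors"] += 1
        elif include and line.startswith("Traceback (most recent call last):"):
            # Continuation lines carry no timestamp; they inherit the
            # include/exclude decision of the last timestamped line.
            summary["tracebacks"] += 1
    return summary
```

The function takes any iterable of lines, so an open log file can be
passed directly, with `after` given as a datetime parsed from the
--after-date argument.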

- Extract skipped GFIDs from Geo-replication logs and re-trigger sync.

   For example,
   glustertool georep_extract_skipped <LOG_FILE> --after-date <TIMESTAMP>

   This command will
   1. Extract the list of skipped GFIDs
   2. Mount the Master Volume
   3. Convert each GFID to a path
   4. Set a virtual xattr to re-trigger the sync

- A tool to detect Split brain

- A tool to convert GFID to Path
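
For the last idea, a minimal backend-only sketch is given below. It
relies on the brick keeping a handle for each GFID under
.glusterfs/<first two hex chars>/<next two hex chars>/<gfid>, which is
a hard link for regular files. The function names are made up for this
example, it handles regular files only (directories are not covered),
and it walks the whole brick, so it is slow on large bricks.

```python
import os
import uuid


def gfid_handle_path(brick_root, gfid):
    """Return the .glusterfs handle path for a GFID on a brick."""
    canonical = str(uuid.UUID(gfid))  # validates and normalizes the GFID
    return os.path.join(
        brick_root, ".glusterfs", canonical[0:2], canonical[2:4], canonical
    )


def gfid_to_path(brick_root, gfid):
    """Find a regular file on the brick that shares the handle's inode.

    Returns the path relative to the brick root, or None if not found.
    """
    handle = os.stat(gfid_handle_path(brick_root, gfid))
    for dirpath, dirnames, filenames in os.walk(brick_root):
        if dirpath == brick_root and ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")  # skip the handle tree itself
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            st = os.lstat(candidate)
            if (st.st_dev, st.st_ino) == (handle.st_dev, handle.st_ino):
                return os.path.relpath(candidate, brick_root)
    return None
```

If the GFID has no handle on the brick, os.stat raises
FileNotFoundError, which a real tool would want to catch and report.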

I created an etherpad to record the available tools and ideas:
https://public.pad.fsfe.org/p/gluster-tools
I will update once I make some progress in creating the infrastructure.

Comments and Suggestions welcome.


_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel


