On a Friday in 2021, Tim Wiederhake wrote:
This is a wrapper for codespell [1], a spell checker for source code. Codespell does not compare words to a dictionary, but rather works by checking words against a list of common typos, making it produce fewer false positives than other solutions. The script in this patch works around the lack of per-directory ignore lists and some oddities regarding capitalization in ignore lists. [1] (https://github.com/codespell-project/codespell/)
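To illustrate the difference from a dictionary-based checker, here is a minimal sketch of the typo-list approach. It is not codespell's implementation: `COMMON_TYPOS`, `WORD` and `find_typos` are hypothetical names, and the three entries are only illustrative stand-ins for codespell's much larger list.

```python
import re

# Hypothetical, tiny typo list; codespell ships its own, much larger one.
COMMON_TYPOS = {
    "progess": "progress",
    "capablities": "capabilities",
    "encyption": "encryption",
}

# Split text into runs of ASCII letters; real tools tokenize more carefully.
WORD = re.compile(r"[A-Za-z]+")


def find_typos(text):
    """Yield (line number, typo, suggestion) for each known typo in text."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in WORD.findall(line):
            suggestion = COMMON_TYPOS.get(word.lower())
            if suggestion is not None:
                yield lineno, word, suggestion


print(list(find_typos("report progess\nvirFooBar is fine")))
```

An identifier such as `virFooBar` is not reported because it is simply absent from the typo list, whereas a dictionary-based checker would flag it as unknown.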
RFC: Is there interest in having something like this in CI?
Adding it as a job with 'allow_failure: true' would let us see how many false positives there are / how annoying it is.
Examples of spelling mistakes that were found using codespell: 4ad3c95f4bef5c7c9657de470fb74a4d14c8a331, 785a11cec8693de7df024aae68975dd1799b646a, 1452317b5c727eb17178942012f57f0c37631ae4.
Please drop the RFC part from the commit message.
Signed-off-by: Tim Wiederhake <twiederh@xxxxxxxxxx>
---
 scripts/check-spelling.py | 115 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100755 scripts/check-spelling.py

diff --git a/scripts/check-spelling.py b/scripts/check-spelling.py
new file mode 100755
index 0000000000..01371c0d1e
--- /dev/null
+++ b/scripts/check-spelling.py
@@ -0,0 +1,115 @@
+#!/usr/bin/env python3
+
+import argparse
+import re
+import subprocess
+import os
+
+
+IGNORE_LIST = [
+    # ignore all translation files
+    ("/po/", []),
+
+    # ignore this script
+    ("/scripts/check-spelling.py", []),
+
+    # 3rd-party: keycodemapdb
+    ("/src/keycodemapdb/", []),
+
+    # 3rd-party: VirtualBox SDK
+    ("/src/vbox/vbox_CAPI", [
+        "aAdd",
+        "aCount",
+        "aLocation",
+        "aNumber",
+        "aParent",
+        "progess"]),
+
+    # 3rd-party: qemu
+    ("/tests/qemucapabilitiesdata/caps_", "encyption"),
You can completely skip checking the files we got from the 3rd party.

I'm also getting:

("/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.replies", "hace"), # line 17966, "have"?
("/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.replies", "hace"), # line 18659, "have"?
("/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.replies", "hace"), # line 20619, "have"?
("/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.replies", "hace"), # line 20871, "have"?
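As a rough sketch of what skipping whole prefixes could look like: the patch's own `ignore()` function is elided in this mail, so `should_ignore` below is a hypothetical stand-in, not the author's code. It assumes, based on the comments in `IGNORE_LIST`, that an empty word list means "ignore everything under this prefix", and that file names are given relative to the source directory with a leading slash.

```python
# Hypothetical ignore list in the same shape as the patch's IGNORE_LIST:
# (path prefix, words to ignore); an empty list skips the prefix entirely.
IGNORE_LIST = [
    ("/.git/", []),
    ("/src/keycodemapdb/", []),
    ("/tests/qemucapabilitiesdata/caps_", []),
    ("/src/vbox/vbox_CAPI", ["aAdd", "progess"]),
]


def should_ignore(filename, typo, ignore_list=IGNORE_LIST):
    """Return True if a finding for `typo` in `filename` should be suppressed."""
    for prefix, words in ignore_list:
        if not filename.startswith(prefix):
            continue
        # Empty list: everything below this prefix is ignored.
        # Otherwise only the listed words are ignored.
        if not words or typo in words:
            return True
    return False


print(should_ignore("/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.replies", "hace"))
print(should_ignore("/src/qemu/qemu_process.c", "wee"))
```

With entries like these, findings in third-party data never reach the report, while typos in libvirt's own sources are still listed.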
+
[..]
+def main():
+    parser = argparse.ArgumentParser(description="Check spelling")
+    parser.add_argument(
+        "dir",
+        help="Path to source directory",
+        type=os.path.realpath)
+    args = parser.parse_args()
+
+    findings = [f for f in check_spelling(args.dir) if not ignore(*f)]
+    if findings:
+        template = "(\"{0}\", \"{2}\"),\t# line {1}, \"{3}\"?"
+        for finding in findings:
+            print(template.format(*finding))
+        exit("error: %s spelling errors" % len(findings))
+
+
+if __name__ == "__main__":
+    main()
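For reference, a small sketch of what the report template in `main()` produces. The template string is copied from the patch; the field order (file, line, typo, suggestion) is an assumption inferred from the placeholders, since `check_spelling()` is elided here.

```python
# Same template string as in the patch's main().
template = "(\"{0}\", \"{2}\"),\t# line {1}, \"{3}\"?"

# Assumed field order: (file, line, typo, suggestion).
finding = ("/src/qemu/qemu_process.c", 1225, "wee", "we")

line = template.format(*finding)
print(line)
```

Each printed line has the shape of an `IGNORE_LIST` tuple followed by a comment, so a false positive can presumably be pasted into the list as is.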
I'm also getting:

("/src/qemu/qemu_process.c", "wee"), # line 1225, "we"?
("/src/qemu/qemu_process.c", "wee"), # line 2369, "we"?
("/.git/logs/HEAD", "capablities"), # line 459, "capabilities"?

.git should be ignored completely too.

Reviewed-by: Ján Tomko <jtomko@xxxxxxxxxx>

Jano