Re: Migration of git-scm.com to a static web site: ready for review/testing

Hi Todd,

On Fri, 17 Nov 2023, Todd Zullinger wrote:

> Johannes Schindelin wrote:
> >> For checking links, a tool like linkchecker[1] is very handy.
> >> This is run against the local docs in the Fedora package
> >> builds to catch broken links.
> >
> > Hmm, `linkchecker` is really slow for me, even locally.
>
> Yeah, it took an hour and a half to run for me, both on an
> old laptop and a fast server with plenty of threads,
> bandwidth, and memory.
>
> Checking the git HTML documentation takes under 30 seconds,
> which is largely the only place I've used it.  It has been
> very helpful in catching broken links in the docs during the
> build and the time is short enough that I never minded.

I found https://lychee.cli.rs/#/ in the meantime and figured out how to
use it in a local setup:

First, I run:

	HUGO_TIMEOUT=777 HUGO_BASEURL= HUGO_UGLYURLS=false time hugo

The first `HUGO_*` setting raises Hugo's timeout so that the build does
not fail even when all of my laptop's CPU cores are busy. The other two
override settings from `hugo.yml` so that `lychee` can handle the output:
unlike GitHub Pages, `lychee` will not auto-append `.html`, and without
`HUGO_UGLYURLS=false` it would therefore mis-detect tons of broken links.

In my setup, this command typically runs for about half a minute, but
sometimes takes as long as a minute. (I noticed that it is much slower
when I open the directory in VS Code, because I'm running this in WSL and
the filesystem watcher kind of eats all resources.)

After that, I run:

	time lychee --offline --exclude-mail \
		--exclude file:///path/to/repo.git/ \
		--exclude file:///caminho/para/o/reposit%C3%B3rio.git/ \
		--exclude file:///ruta/a/repositorio.git/ \
		--exclude file:///sl%C3%B3%C3%B0/til/hirsla.git/ \
		--exclude file:///Pfad/zum/Repo.git/ \
		--exclude file:///chemin/du/d%C3%A9p%C3%B4t.git/ \
		--exclude file:///srv/git/project.git \
		--exclude "file://$PWD/public/pagefind/pagefind-ui.css" \
		--format markdown -o lychee-local.md public/

Without `--offline`, there would be a couple of broken links (the
http://git.or.cz/gitwiki/InterfacesFrontendsAndTools link leads to
"Forbidden"; it needs to be changed to https://).

The `file:///` URLs are all examples that are not expected to be valid.
And we do not want to check the emails (tons of `xyz@xxxxxxxxxxx` would be
"broken").
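To keep that exclude list manageable, the arguments could be assembled in
one place. The following is only a minimal sketch, not part of the setup
described above: `lychee_args` and `example_urls` are hypothetical names,
and the only `lychee` options used are the ones from the command above.
The function just prints the arguments, one per line, so it can be
inspected without running `lychee`.

```shell
#!/bin/sh
# Placeholder URLs that appear in the documentation as examples and are
# not expected to resolve (copied from the command above).
example_urls='
file:///path/to/repo.git/
file:///caminho/para/o/reposit%C3%B3rio.git/
file:///ruta/a/repositorio.git/
file:///sl%C3%B3%C3%B0/til/hirsla.git/
file:///Pfad/zum/Repo.git/
file:///chemin/du/d%C3%A9p%C3%B4t.git/
file:///srv/git/project.git
'

# Hypothetical helper: print the lychee arguments, one per line.
lychee_args() {
	set -- --offline --exclude-mail
	# The URLs contain no whitespace or glob characters, so plain
	# word splitting is safe here.
	for url in $example_urls; do
		set -- "$@" --exclude "$url"
	done
	set -- "$@" --exclude "file://$PWD/public/pagefind/pagefind-ui.css"
	set -- "$@" --format markdown -o lychee-local.md public/
	printf '%s\n' "$@"
}
```

Assuming `$PWD` contains no whitespace, `time lychee $(lychee_args)` would
then run the same check, and adding another example URL becomes a one-line
change.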

This command typically takes another half minute, sometimes a bit longer.

Given those times and the configurability (and the lure of a GitHub
Action that could be easily integrated into a GitHub workflow:
https://github.com/marketplace/actions/lychee-broken-link-checker), I have
given up on linkchecker and focused exclusively on lychee.

Now, when I started working on this on Friday, lychee reported about
12,000 broken links.

There were a couple of legitimate mistakes I made (when feeding paths to
Hugo's `relURL` function, the path must not have a leading slash or it
will remain unchanged, for example). These are fixed.

But there were also many other issues, such as incomplete manual page
translations that link to pages which do not exist yet. In those cases, I
changed the code to generate redirects to the English version. For example,
https://git.github.io/git-scm.com/docs/git-clone/fr#_git has a link to
`git[1]` that _should_ lead to the French version of the `git` manual
page. However, that does not exist. So both the Rails App and the
static website redirect to the English variant of that page.
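To illustrate the general idea of such a fallback on a static host, here
is a rough sketch. It is not the code used for the site; the helper name
`write_fallback_redirect` and its two parameters are made up for this
example, and a meta refresh page is merely one common way to express a
redirect when there is no server-side logic.

```shell
#!/bin/sh
# Hypothetical helper: if no translated page exists at $1, write a small
# HTML stub there that sends the browser to the English page at $2.
write_fallback_redirect() {
	target=$1
	english_url=$2
	# Leave real translations alone.
	[ -e "$target" ] && return 0
	mkdir -p "$(dirname "$target")"
	cat >"$target" <<EOF
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="refresh" content="0; url=$english_url">
<link rel="canonical" href="$english_url">
</head>
</html>
EOF
}
```

With stubs like this in place, a link checker running in offline mode finds
a file at the translated path instead of reporting a broken link.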

My most recent lychee run results in 0 broken links.

As a bonus, some of the links that are currently broken on
https://git-scm.com/ are fixed in https://git.github.io/git-scm.com/.
For example, following the `Pull Request Referləri` link at the top of
https://git-scm.com/book/az/v2/Appendix-C:-Git-%C6%8Fmrl%C9%99ri-Plumbing-%C6%8Fmrl%C9%99ri/
leads to a 404. But following it in
https://git.github.io/git-scm.com/book/az/v2/Appendix-C:-Git-%C6%8Fmrl%C9%99ri-Plumbing-%C6%8Fmrl%C9%99ri/
directs the browser to the correct URL:
https://git.github.io/git-scm.com/book/az/v2/GitHub-Bir-Layih%C9%99nin-Saxlan%C4%B1lmas%C4%B1/#_pr_refs

Also broken on https://git-scm.com/ are the footnotes in the Czech
translation of the ProGit book. These were broken in the Hugo version,
too, but now they are fixed. See e.g.
https://dscho.github.io/git-scm.com/book/cs/v2/Z%C3%A1klady-pr%C3%A1ce-se-syst%C3%A9mem-Git-Zobrazen%C3%AD-historie-reviz%C3%AD/#_footnotedef_7
and note that the Rails App redirects to
https://git-scm.com/book/cs/v2/Z%C3%A1klady-pr%C3%A1ce-se-syst%C3%A9mem-Git-Zobrazen%C3%AD-historie-reviz%C3%AD/ch00/_footnotedef_7
when clicking on the `[7]`, which 404s.

Could you double-check the links in the current version?

Thank you,
Johannes
