Hi Johannes,

Johannes Schindelin wrote:

>> For checking links, a tool like linkchecker[1] is very handy.
>> This is run against the local docs in the Fedora package
>> builds to catch broken links.
>
> Hmm, `linkchecker` is really slow for me, even locally.

Yeah, it took an hour and a half to run for me, both on an old laptop
and on a fast server with plenty of threads, bandwidth, and memory.

Checking the local git HTML documentation, which is largely the only
place I've used it, takes under 30 seconds.  It has been very helpful
in catching broken links in the docs during the build, and the time is
short enough that I never minded.

> Granted, the added cross-references now increase the number of hyperlinks
> to check, but after I let the program run for a bit over an hour to look
> at https://git-scm.com/ (for comparison), it is now running on the local
> build (i.e. the `public/` folder generated by Hugo, not even an HTTP
> server) for over 45 minutes and still not done:
>
> -- snip --
> [...]
> 10 threads active, 112977 links queued, 206443 links in 100001 URLs checked, runtime 48 minutes, 46 seconds
> 10 threads active, 113455 links queued, 206689 links in 100001 URLs checked, runtime 48 minutes, 52 seconds
> 10 threads active, 113829 links queued, 206874 links in 100001 URLs checked, runtime 48 minutes, 57 seconds
> 10 threads active, 114230 links queued, 207136 links in 100001 URLs checked, runtime 49 minutes, 3 seconds
> 10 threads active, 114731 links queued, 207498 links in 100001 URLs checked, runtime 49 minutes, 9 seconds
> -- snap --

I would have thought that bumping up the number of threads a lot would
really help, but I ran it on a dual Xeon system with 40 threads and it
took about the same time.  Perhaps I should have increased the thread
count to double the system's processor count, or more.

> Maybe something is going utterly wrong because the number
> of links seems to be dramatically larger than what the
> https://git-scm.com/ reported; Maybe linkchecker broke out
> of the `public/` directory and now indexes my entire
> harddrive ;-)

Heh, hopefully not. :)

I wondered if there were circular links that it was picking up and not
de-duplicating.  I may try to run it with the --verbose option, which
logs all checked URLs.  Maybe that will turn up something.  It sure
seems like there are a _lot_ of links here.

There is a --recursion-level option which might be helpful.  The
--ignore-url and/or --no-follow-url options may also be useful; I've
sketched a possible invocation at the end of this mail.

Though even if it's (very) slow, it might be worth running to flush out
some initial issues before making the site live.  Letting it run in the
background for a few hours is probably less effort than fielding a
number of bug reports about broken URLs here and there. :)

Of course, it would be even better if it were fast enough to run as
part of the site build process to catch broken links before each
deployment, but that would need to be measured in some relatively small
number of seconds instead of the hours it seems to take now. :/

--
Todd
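
P.S. Something like this is roughly what I had in mind; it's untested
against the Hugo output, and the two regex patterns are only
placeholders for whatever the site would actually want to skip or not
descend into:

    linkchecker --threads=40 --recursion-level=5 \
        --ignore-url='pattern-to-skip-entirely' \
        --no-follow-url='pattern-to-check-but-not-recurse-into' \
        public/index.html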