Samuel Sieb writes:
 > On 2021-07-03 8:02 p.m., dwoody5654@xxxxxxxxx wrote:

 > > the url I am trying to download does not have an extension ie. no
 > > '.htm' such as:
 > > https://my.acbl.org/club-results/details/338288

The extension doesn't matter to any of the utilities mentioned, as far
as I know.  I'm pretty sure they get the MIME type from the HTTP
Content-Type header.

 > > wget does not download the correct web page.
 >
 > I tried it and it worked, sort of.  The problem is that you want to
 > download everything to view it offline, but the site my.acbl.org has a
 > robots.txt that says "no robots allowed".  So wget respects that and
 > will not download any required files from that site other than the
 > initial page.  curl probably has the same issue.

1.  The page does not have content represented in HTML AFAICT: it's a
    blob which is parsed and formatted by a battery of (Java)scripts,
    some of which are resources on the Internet, and some of which are
    inline.  In other words, the HTML in that file is used as a
    container format to transport the scripts to the browser.  Neither
    wget nor curl supports JavaScript at all, as far as I know.

2.  96% of the page is in two blobs; AFAICT there were no IMG or other
    elements that specify requirements by URL.  If so, that would
    explain why only the top page was downloaded.

3.  curl does not document how it handles robots.txt.  Since, as far
    as I can tell, curl has no recursive or get-requirements option,
    it probably doesn't handle it at all.

    wget documents that wget -r (recursive downloads) respects
    robots.txt.  It does not document that wget -p (get page
    requisites, too) respects robots.txt, but a quick test suggests
    that it does.  I think this is a bug: any interactive program that
    supports non-text media downloads the required resources when it
    fetches the HTML file.  (If someone agrees and wants to do
    something about it, this is a wget bug, not a Fedora bug.)

I don't have an alternative fetch tool to suggest, unfortunately.
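
For anyone who wants to see what "respects robots.txt" means in
practice, here is a minimal Python sketch using the standard library's
urllib.robotparser.  It is only an illustration of the check, not how
wget implements it.  The robots.txt body is an assumption (a blanket
"Disallow: /", based on the "no robots allowed" description above); I
have not copied the site's actual file.  The helper name
allowed_by_robots and the "Wget" user-agent string are likewise made up
for the example.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: a blanket-deny robots.txt, standing in for the
# "no robots allowed" file described in the thread. The real
# file on my.acbl.org may differ.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses the rules from a
    string so that no network access is needed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://my.acbl.org/club-results/details/338288"
    # With the assumed blanket-deny rules, a robots-respecting
    # fetcher would refuse this URL.
    print(allowed_by_robots(ASSUMED_ROBOTS_TXT, "Wget", url))
    # An empty robots.txt places no restrictions on anyone.
    print(allowed_by_robots("", "Wget", url))
```

A tool that honors robots.txt would run a check like this before each
request and skip any URL for which it returns False.
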
I think that you need to use a graphical browser somehow, or write a
script in your favorite P-language.

Steve