* Magnus Andersen <mag.andersen@xxxxxxxxx> [2005-12-06 20:37 +0100]:
> I need to download files from a website and I'd like to automate it.
> I have to login to the website, navigate to the download section and
> download the files. They do not have an ftp site and I have to do this
> over http. The system I'll be doing this from is a RHEL 3 AS system.

Definitely try "wget". wget copies files over http, https or ftp from
the corresponding servers. Additionally it can

* (for http(s):) follow the links contained in HTML files and
* (for ftp:) grab any subdirectories.

If the server conveys file dates, wget will adopt them for the files it
receives. This way it can avoid retransmitting files that are already
available locally. The copy differs from the original in the following
respects:

* Files that have been deleted from the server stay alive in the copy.
* Files not pointed at by a link are missing (for http).
* No permissions (owner, group, mode) are transferred.

Usage: wget [options] URL

Options of interest are:

-N
    Do not download files that are already available locally and match
    the server's file date.

-nH --cut-dirs=2
    In recursive mode, wget normally creates a subdirectory for the
    hostname and for any directories mentioned in the URL. The option
    -nH suppresses the creation of the host directory, and --cut-dirs=2
    the creation of the first two directory levels. For example:

        wget -r -nH --cut-dirs=2 http://www.jfranken.de/homepages/johannes/vortraege

    will create the directory vortraege.

-k
    Turns absolute URLs into relative ones in HTML files. Caution: this
    does not work in all situations.

-r -np
    (recursive, no-parent): If the given URL points to an HTML file,
    wget will also fetch any elements it references (in particular
    links and graphics) and repeat this procedure for them. The option
    -np keeps wget from ascending to the parent directory. wget ignores
    references to other hosts unless you set the parameter -H.

-l 10
    The parameter -l 10 limits the recursion depth for -r to 10 levels.
    The default depth is 5. If you set -l 0, it downloads at infinite
    depth, which can cause filesystem problems with cyclic links.

-H -Djfranken.de,our-isp.org
    Also follow links to other servers, provided they belong to the
    domain jfranken.de or our-isp.org.

-nv
    Turns off verbose output without going completely quiet, leaving
    one status line per file.

wget will direct its ftp or http requests automatically to your proxy
server if the environment variables http_proxy or ftp_proxy are set,
e.g. by

$ export http_proxy=http://jfranken:secret@xxxxxxxxxxxxxxxxx:3128/
$ export ftp_proxy=$http_proxy

For a complete invocation combining these options, and for the login
step you mention, see the two sketches at the end of this message.

Links:

* wget project page: http://www.gnu.org/software/wget/wget.html
* wget(1) manpage

Good luck!

-- 
Johannes Franken
Professional unix/network development
mailto:jfranken@xxxxxxxxxxx
http://www.jfranken.de/
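
To pull the options above together, here is a sketch of a cron-friendly
mirror run. The hostname and path are placeholders; adjust --cut-dirs
to the number of leading directories in your URL:

$ wget -r -np -N -nv -nH --cut-dirs=1 \
       http://www.example.com/downloads/

Thanks to -N, repeated runs only fetch files that have changed on the
server, and -nv keeps the output short enough to read in cron mail.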
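
Since the download area sits behind a login, plain recursion will only
fetch the login page. A common approach is to submit the login form
once, save the session cookies, and reuse them for the recursive
download. A minimal sketch, assuming a reasonably recent wget
(--post-data appeared in wget 1.9, --keep-session-cookies in 1.10) and
a form at /login with fields named user and pass; the URL and field
names here are placeholders, so check the site's actual form:

$ wget --save-cookies cookies.txt --keep-session-cookies \
       --post-data 'user=USERNAME&pass=PASSWORD' \
       -O /dev/null http://www.example.com/login
$ wget --load-cookies cookies.txt -r -np -N \
       http://www.example.com/downloads/

If the site uses HTTP basic authentication instead of a form, the
options --http-user and --http-passwd on the download command are
sufficient and no cookie file is needed.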