On 12/24/2017 01:49 PM, Miguel González wrote:
> On 12/24/17 12:53 AM, Good Guy wrote:
>> On 23/12/2017 10:26, Miguel González wrote:
>>> A hosting company with their builder tool created a static HTML site
>>> that can't be downloaded.
>>>
>> Did you try this tool?
>>
>> <https://www.httrack.com/>
>>
>> If not, please provide a link to the site, because there is no such thing
>> as "can't be downloaded" when the site is visible to the public.
> What I mean is that the company doesn't provide any FTP access to
> download the files.
>
> I did use httrack and at least I could keep a backup of the website (not
> complete, because it wasn't able to download links with Spanish characters).
>
> Unfortunately, as I said, it creates folders for the CDN entries, and the
> website ends up with a www.mysite.com/www.mysite.com/
> structure with subfolders for each CDN.
>
> For the time being I am using wget -mkEp, which still uses the CDN
> entries from the company. It's not the best solution, but in case they
> turn off the CDNs it will be much "easier" to change links manually.
>
> thanks!

Scraping the website largely depends on the amount of JavaScript garbage
on the pages. The straight HTML and source can be pulled by LWP and w3m
fairly easily.

--
So many immigrant groups have swept through our town
that Brooklyn, like Atlantis, reaches mythological
proportions in the mind of the world - RI Safir 1998
http://www.mrbrklyn.com

DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002
http://www.nylxs.com - Leadership Development in Free Software
http://www2.mrbrklyn.com/resources - Unpublished Archive
http://www.coinhangout.com - coins!
http://www.brooklyn-living.com

Being so tracked is for FARM ANIMALS and extermination camps,
but incompatible with living as a free human being.
-RI Safir 2013
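
The manual link change mentioned in the quoted message (repointing CDN URLs if the company turns the CDNs off) could probably be scripted rather than done by hand. What follows is only a minimal sketch in Python, not something from the thread. The hostname cdn.example.com, the local prefix /cdn/, and the function names are all hypothetical, and it assumes the CDN assets have already been copied into a local folder that the prefix points at.

```python
import re
from pathlib import Path


def rewrite_cdn_links(html: str, cdn_host: str, local_prefix: str) -> str:
    """Replace URLs on cdn_host with local_prefix (hypothetical helper).

    Handles http://host/, https://host/ and protocol-relative //host/ forms.
    local_prefix is expected to end with a slash, e.g. "/cdn/".
    """
    # re.escape keeps the dots in the hostname literal.
    pattern = re.compile(r"(?:https?:)?//" + re.escape(cdn_host) + r"/",
                         re.IGNORECASE)
    # A function replacement avoids backslash handling in local_prefix.
    return pattern.sub(lambda _match: local_prefix, html)


def rewrite_tree(root: str, cdn_host: str, local_prefix: str) -> int:
    """Rewrite every .html file under root in place; return files changed."""
    changed = 0
    for path in Path(root).rglob("*.html"):
        # surrogateescape lets pages that are not valid UTF-8 round-trip.
        text = path.read_text(encoding="utf-8", errors="surrogateescape")
        new_text = rewrite_cdn_links(text, cdn_host, local_prefix)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8",
                            errors="surrogateescape")
            changed += 1
    return changed
```

Calling rewrite_tree("www.mysite.com", "cdn.example.com", "/cdn/") on a wget mirror would then touch only pages that actually reference that host. A plain regex is a deliberate simplification here: it does not parse the HTML, so it will also rewrite matching URLs inside inline scripts or comments, which may or may not be what is wanted.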