Re: Re: mirror a html site

On 12/24/2017 01:49 PM, Miguel González wrote:
> On 12/24/17 12:53 AM, Good Guy wrote:
>> On 23/12/2017 10:26, Miguel González wrote:
>>>   A hosting company created a static HTML site with their builder
>>> tool, and the site can't be downloaded.
>>>
>> Did you try this tool?
>>
>> <https://www.httrack.com/>
>>
>> If not, please provide a link to the site, because there is no such
>> thing as "can't be downloaded" when the site is visible to the public.
> What I mean is that the company doesn't provide any FTP access to
> download the files.
> 
> I did use httrack, and at least I could keep a backup of the website
> (not complete, because it wasn't able to download links with Spanish
> characters).
> 
> Unfortunately, as I said, it creates folders for the CDN entries, and
> the mirrored site ends up with a www.mysite.com/www.mysite.com/
> structure with a subfolder for each CDN.
> 
> For the time being I am using wget -mkEp, which still points at the
> company's CDN entries. It's not the best solution, but in case they
> turn off the CDNs it will be much "easier" to change the links manually.
> 
> thanks!


How well scraping works depends largely on how much of the page content
is generated by JavaScript.  The plain HTML and page source can be pulled
fairly easily with LWP or w3m.
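
As a rough illustration of pulling the plain HTML, here is a minimal
sketch. It uses Python's standard library rather than LWP, which is my
substitution for the example, and the names LinkCollector, encode_url,
extract_links and fetch_html are made up here. It percent-encodes
non-ASCII characters in paths, on the assumption that this is what
tripped up the links with Spanish characters; I have not checked that
against the actual site.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urljoin, urlsplit, urlunsplit
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects raw href/src attribute values from static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def encode_url(url):
    """Percent-encode non-ASCII characters in the path and query.

    Existing %XX escapes are left alone. Non-ASCII host names are
    not handled in this sketch.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, query, parts.fragment)
    )


def extract_links(html, base_url):
    """Return absolute, percent-encoded URLs for every href/src found."""
    collector = LinkCollector()
    collector.feed(html)
    return [encode_url(urljoin(base_url, link)) for link in collector.links]


def fetch_html(url, timeout=30):
    """Download one page and decode it as text (UTF-8 if undeclared)."""
    with urlopen(encode_url(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html() on a page and passing the result to
extract_links() gives the list of URLs to fetch next. Like LWP or w3m
used this way, it only sees links present in the served HTML, not
anything JavaScript adds later.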

-- 
So many immigrant groups have swept through our town
that Brooklyn, like Atlantis, reaches mythological
proportions in the mind of the world - RI Safir 1998
http://www.mrbrklyn.com

DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002
http://www.nylxs.com - Leadership Development in Free Software
http://www2.mrbrklyn.com/resources - Unpublished Archive
http://www.coinhangout.com - coins!
http://www.brooklyn-living.com

Being so tracked is for FARM ANIMALS and extermination camps,
but incompatible with living as a free human being. -RI Safir 2013
