Re: waay off topic... help if you wish, toss if you don't.. if more than a few object.. would the moderator kill the thread please!

On 10/26/18 9:25 AM, Tim via users wrote:
> Allegedly, on or about 26 October 2018, Eddie O'Connor sent:
>> "Crawling"....is that the same as "Parsing"?
> 
> Going from page to page, or site to site, parsing the contents. 
> Whether that be following links from one page to another, using the
> links on those pages, or following links from some other list.
> 
> As opposed to just parsing the contents of one particular page.

Other terms for what Bruce is describing are "scraping" or
"spidering". One uses a tool such as wget or something similar to walk
down a website (or several sites), collect the data and "scrape"
interesting tidbits into a database for later use. In some
respects, this is what Google or Bing or Yahoo or (this'll date me)
Alta Vista do to drive their search engines (I think Alta Vista is
long gone--it was originally owned by DEC).
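
For what it's worth, here is a minimal Python sketch of the difference
between parsing one page and crawling across several. It is only an
illustration: the PAGES dict is a made-up stand-in for real fetches (the
part wget or urllib would do over the network), and the names
extract_links and crawl are my own, not from any particular tool.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website" standing in for real HTTP fetches.
PAGES = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.test/a.html": '<a href="/b.html">B</a> <a href="/c.html">C</a>',
    "http://example.test/b.html": "<p>No links here.</p>",
    "http://example.test/c.html": '<a href="/">Home</a>',
}


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Parsing: pull the links out of a single page."""
    parser = LinkParser()
    parser.feed(html)
    # Resolve relative links against the page they were found on.
    return [urljoin(base_url, href) for href in parser.links]


def crawl(start_url, fetch=PAGES.get):
    """Crawling: follow links page to page, visiting each URL once."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page not available, skip it
            continue
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling extract_links() on one page is the "parsing" half; crawl() is
the "crawling" half, since it keeps feeding newly found links back into
the queue. A real spider would also want politeness delays, robots.txt
handling and a limit on which hosts it follows.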

A huge part of this is the pattern recognition, which often employs
different types of AI to extract the information one is interested in.
It's quite an involved process and very impressive when it's done right.
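
The simplest form of that pattern matching is a plain regular
expression, well short of anything AI-like. A rough sketch, assuming the
"tidbits" of interest are prices in some already-fetched text; the
sample text and the scrape_prices name are invented for illustration:

```python
import re

# Matches dollar amounts written like $5, $19.99 or $1,299.00.
PRICE_RE = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")


def scrape_prices(text):
    """Return every dollar amount found in the text, as floats."""
    amounts = []
    for match in PRICE_RE.findall(text):
        # Drop the currency sign and thousands separators before converting.
        amounts.append(float(match.replace("$", "").replace(",", "")))
    return amounts


sample = "Widget: $19.99, Gadget: $1,299.00, shipping from $5."
print(scrape_prices(sample))
```

Anything messier than this (free-form text, inconsistent layouts) is
where the fancier techniques start to earn their keep.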
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer, AllDigital    ricks@xxxxxxxxxxxxxx -
- AIM/Skype: therps2        ICQ: 226437340           Yahoo: origrps2 -
-                                                                    -
-  Perseverance:  When you're too damned stubborn to say "I quit!"   -
----------------------------------------------------------------------