[Bug 1002704] New: Review Request: boilerpipe - Boilerplate Removal and Fulltext Extraction from HTML pages

https://bugzilla.redhat.com/show_bug.cgi?id=1002704

            Bug ID: 1002704
           Summary: Review Request: boilerpipe - Boilerplate Removal and
                    Fulltext Extraction from HTML pages
           Product: Fedora
           Version: rawhide
         Component: Package Review
          Severity: medium
          Priority: medium
          Assignee: nobody@xxxxxxxxxxxxxxxxx
          Reporter: puntogil@xxxxxxxxx
        QA Contact: extras-qa@xxxxxxxxxxxxxxxxx
                CC: notting@xxxxxxxxxx,
                    package-review@xxxxxxxxxxxxxxxxxxxxxxx

Spec URL: http://gil.fedorapeople.org/boilerpipe.spec
SRPM URL: http://gil.fedorapeople.org/boilerpipe-1.2.0-1.fc19.src.rpm
Description:
The boilerpipe library provides algorithms to detect and
remove the surplus "clutter" (boilerplate, templates)
around the main textual content of a web page.

The library already provides ready-made strategies
for common tasks (for example, news article extraction) and
can easily be extended for individual problem settings.

Content extraction is very fast (milliseconds), requires only
the input document (no global or site-level information) and
is usually quite accurate.

Fedora Account System Username: gil
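To illustrate the kind of processing the description refers to, here is a minimal, self-contained Java sketch (Java 16+, since it uses a record). It is NOT boilerpipe's API or its actual algorithm. TextBlock, BlockFilter, ShallowFeatureFilter, extract and the thresholds (10 words, link density 0.33) are hypothetical names and values chosen only for illustration. The sketch models the "specific strategies" idea as a pluggable filter interface. Each block is judged only from features of the page itself (its word count and the share of its words inside links), in line with the point that no global or site-level information is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not boilerpipe's classes or classifier.
public class BoilerplateSketch {

    // One text segment of a page plus how many of its words are link text.
    record TextBlock(String text, int linkedWords) {
        int words() {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }

        // Fraction of words inside links; an empty block counts as all-link.
        double linkDensity() {
            int w = words();
            return w == 0 ? 1.0 : (double) linkedWords / w;
        }
    }

    // Pluggable strategy (hypothetical): decides whether a block is main content.
    interface BlockFilter {
        boolean isContent(TextBlock block);
    }

    // Keeps blocks with at least minWords words and at most maxLinkDensity link text.
    static final class ShallowFeatureFilter implements BlockFilter {
        private final int minWords;
        private final double maxLinkDensity;

        ShallowFeatureFilter(int minWords, double maxLinkDensity) {
            this.minWords = minWords;
            this.maxLinkDensity = maxLinkDensity;
        }

        @Override
        public boolean isContent(TextBlock block) {
            return block.words() >= minWords && block.linkDensity() <= maxLinkDensity;
        }
    }

    // Applies the filter and joins the surviving blocks with newlines.
    static String extract(List<TextBlock> blocks, BlockFilter filter) {
        List<String> kept = new ArrayList<>();
        for (TextBlock b : blocks) {
            if (filter.isContent(b)) {
                kept.add(b.text().trim());
            }
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        List<TextBlock> page = List.of(
            new TextBlock("Home News Sports Contact", 4),
            new TextBlock("The city council approved the new budget on Tuesday after a long debate.", 0),
            new TextBlock("Copyright 2013", 0));
        System.out.println(extract(page, new ShallowFeatureFilter(10, 0.33)));
    }
}
```

Running main prints only the news sentence: the all-link navigation bar and the two-word copyright line fall below the assumed thresholds. Supplying a different BlockFilter implementation is how this sketch stands in for a task-specific extraction strategy.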

_______________________________________________
package-review mailing list
package-review@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/package-review