https://bugzilla.redhat.com/show_bug.cgi?id=1002704

            Bug ID: 1002704
           Summary: Review Request: boilerpipe - Boilerplate Removal and
                    Fulltext Extraction from HTML pages
           Product: Fedora
           Version: rawhide
         Component: Package Review
          Severity: medium
          Priority: medium
          Assignee: nobody@xxxxxxxxxxxxxxxxx
          Reporter: puntogil@xxxxxxxxx
        QA Contact: extras-qa@xxxxxxxxxxxxxxxxx
                CC: notting@xxxxxxxxxx,
                    package-review@xxxxxxxxxxxxxxxxxxxxxxx

Spec URL: http://gil.fedorapeople.org/boilerpipe.spec
SRPM URL: http://gil.fedorapeople.org/boilerpipe-1.2.0-1.fc19.src.rpm
Description:
The boilerpipe library provides algorithms to detect and remove the
surplus "clutter" (boilerplate, templates) around the main textual
content of a web page.
The library already provides specific strategies for common tasks
(for example: news article extraction) and may also be easily
extended for individual problem settings.
Extracting content is very fast (milliseconds), just needs the input
document (no global or site-level information required) and is
usually quite accurate.
Fedora Account System Username: gil