https://bugzilla.redhat.com/show_bug.cgi?id=2319926

            Bug ID: 2319926
           Summary: Review-request: python-html-text - Extract text from HTML
           Product: Fedora
           Version: rawhide
                OS: Linux
            Status: NEW
         Component: Package Review
          Severity: medium
          Assignee: nobody@xxxxxxxxxxxxxxxxx
          Reporter: benson_muite@xxxxxxxxxxxxx
        QA Contact: extras-qa@xxxxxxxxxxxxxxxxx
                CC: package-review@xxxxxxxxxxxxxxxxxxxxxxx
  Target Milestone: ---
    Classification: Fedora

spec: https://download.copr.fedorainfracloud.org/results/fed500/gourmand/fedora-rawhide-x86_64/08156160-python-html-text/python-html-text.spec
srpm: https://download.copr.fedorainfracloud.org/results/fed500/gourmand/fedora-rawhide-x86_64/08156160-python-html-text/python-html-text-0.6.2-1.fc42.src.rpm

description:
How is html_text different from .xpath('//text()') from LXML or .get_text() from Beautiful Soup?

- Text extracted with html_text does not contain inline styles, javascript, comments and other text that is not normally visible to users;
- html_text normalizes whitespace, but in a way smarter than .xpath('normalize-space()'), adding spaces around inline elements (which are often used as block elements in html markup), and trying to avoid adding extra spaces for punctuation;
- html-text can add newlines (e.g. after headers or paragraphs), so that the output text looks more like how it is rendered in browsers.

fas: fed500

Comments:
The pytest7 warning seems spurious, as pytest7 is not installed.

Reproducible: Always

-- 
You are receiving this mail because:
You are always notified about changes to this product and component
You are on the CC list for the bug.
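
For reviewers unfamiliar with the package, the three behaviours listed in the description above can be illustrated with a rough standard-library sketch. This is not html_text's implementation or API. The names VisibleTextSketch, visible_text, HIDDEN_TAGS and BLOCK_TAGS are hypothetical and exist only for this illustration, and the tag sets are deliberately small assumptions.

```python
import re
from html.parser import HTMLParser

# Hypothetical tag sets for this sketch only; html_text's real rules differ.
HIDDEN_TAGS = {"script", "style"}
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class VisibleTextSketch(HTMLParser):
    """Collect text a reader would normally see (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1
        elif self.hidden_depth:
            return
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")  # newline before block-level content
        else:
            self.chunks.append(" ")   # space around inline elements

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS:
            if self.hidden_depth:
                self.hidden_depth -= 1
        elif self.hidden_depth:
            return
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")  # newline after headers, paragraphs, ...
        else:
            self.chunks.append(" ")

    def handle_data(self, data):
        # Comments never reach handle_data, so they are dropped implicitly.
        if not self.hidden_depth:
            # Source newlines are ordinary whitespace, not line breaks.
            self.chunks.append(re.sub(r"\s+", " ", data))


def visible_text(html):
    parser = VisibleTextSketch()
    parser.feed(html)
    parser.close()
    lines = []
    for line in "".join(parser.chunks).split("\n"):
        line = re.sub(r"\s+", " ", line).strip()
        # Avoid the extra space that inline tags introduce before punctuation.
        line = re.sub(r" ([,.;:!?)])", r"\1", line)
        if line:
            lines.append(line)
    return "\n".join(lines)


SAMPLE = (
    "<h1>Title</h1><p>Hello <b>world</b>!</p>"
    "<script>var x = 1;</script><!-- note --><style>p {color: red}</style>"
    "<p>Second\n   line<span>joined</span></p>"
)

if __name__ == "__main__":
    print(visible_text(SAMPLE))
```

With the sample input, the script, style and comment contents are dropped, "line" and "joined" are separated by a space, and the header and each paragraph end up on their own lines.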