Conflict python-html2text/html2text (both /usr/bin/html2text)?




All,

I have been working on a project that retrieves and parses statutes using the piped processes 'curl -s url | html2text -utf8', with the read end of the second pipe passed to getline. When I moved the code from my laptop (suse) to my servers (Arch), the pipeline broke. Poking around, I found the cause: Arch packages python-html2text (as /usr/bin/html2text) instead of the gcc-libs version (see: https://aur.archlinux.org/packages/html2text-with-utf8).

While not updated recently, the gcc-libs version is quite a bit more robust and flexible (not to mention it actually provides a man page). See:

http://www.mbayer.de/html2text/

The python version has formatting shortcomings as well. A big one is that you cannot both control the word wrap (--body-width) and prevent double line breaks after block elements, because --single-line-break requires --body-width=0.
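To make the flag dependency concrete, here is a minimal sketch of a wrapper around the python version. The function name py_html2text_unwrapped and the HTML2TEXT override variable are hypothetical, introduced only for illustration; the two options are the ones named above.

```shell
# Hypothetical helper: run the python html2text with single line breaks.
# --single-line-break only takes effect together with --body-width=0,
# so word wrapping has to be given up to get it.
# HTML2TEXT is an assumed override so a renamed binary can be substituted.
py_html2text_unwrapped() {
    "${HTML2TEXT:-html2text}" --body-width=0 --single-line-break "$@"
}
```

It would be used at the end of the same pipe, e.g. 'curl -s url | py_html2text_unwrapped'.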

Is there any particular reason Arch packages the python version instead? If nothing else, is there any interest in at least renaming the resulting executable to prevent a direct conflict with the gcc-libs version? ('pyhtml2text' makes sense.)

  As an example, compare the output of:

(gcc-libs version)

$ curl -s http://www.statutes.legis.state.tx.us/Docs/TN/htm/TN.1.htm | html2text -utf8

(python version)

$ curl -s http://www.statutes.legis.state.tx.us/Docs/TN/htm/TN.1.htm | html2text

  This conflict can easily be avoided in the python version with the rename:

$ diff -uNb --label PKGBUILD PKGBUILD.orig PKGBUILD
--- PKGBUILD
+++ PKGBUILD    2015-03-10 09:25:43.906168003 -0500
@@ -11,8 +11,8 @@
 url="https://pypi.python.org/pypi/html2text/";
 license=('GPL3')
 depends=('python-setuptools')
-provides=('html2text')
-replaces=('html2text')
+provides=('pyhtml2text')
+replaces=('pyhtml2text')

 source=(https://pypi.python.org/packages/source/h/html2text/html2text-$pkgver.tar.gz)
 sha256sums=('c3977dfe6fd1ba0d4091f85963306488b3e9e236cfe60d8821158ce5a7fcb619')

@@ -29,5 +29,6 @@
 package() {
   cd "${srcdir}"/html2text-${pkgver}
   python setup.py install --root="${pkgdir}"
+  mv "${pkgdir}"/usr/bin/html2text "${pkgdir}"/usr/bin/pyhtml2text
 }

The install script could even check for the presence of /usr/bin/html2text, and if absent, provide a soft-link.
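A rough sketch of what such an install script might look like follows. It assumes the binary has been renamed to pyhtml2text as in the diff above. The helper name link_html2text and its optional root-prefix argument are hypothetical, added only so the logic can be tried in a scratch directory. post_install and post_upgrade are the usual pacman install-script hooks.

```shell
# Sketch of a .install script for the renamed python package (assumption:
# the package ships /usr/bin/pyhtml2text and no /usr/bin/html2text).

# Create usr/bin/html2text -> pyhtml2text under the given root prefix,
# but only when nothing (file or symlink) already occupies that path.
link_html2text() {
    h2t_target="${1:-}/usr/bin/html2text"
    if [ ! -e "$h2t_target" ] && [ ! -L "$h2t_target" ]; then
        ln -s pyhtml2text "$h2t_target"
    fi
}

post_install() {
    link_html2text
}

post_upgrade() {
    post_install
}
```

One caveat: a link created this way is not owned by any package, so installing the gcc-libs html2text afterwards would likely stop on a file conflict until the link is removed by hand.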

I just found it very odd to find an executable in Arch named 'html2text' that works completely differently from the traditional 'html2text' that many distributions have shipped for years.

I don't know if there is any interest among the devs in rectifying this. If there is, I'm happy to file a feature request, etc. Let me know. Thanks.

--
David C. Rankin, J.D.,P.E.

