Hi listers:

I am a college student majoring in Computer Science, and I am now involved in a research project on cryptography. A lot of good material on the net is available only in one file format, PDF. It doesn't convert well to text or HTML, so it is not necessarily useful. My question is: how do I turn PDF into HTML or text without loss of data?

Here are the solutions I've tried.

First, there is pstotext. It works well for most things, but it fails on some of the files I am starting to encounter.

Next, there is the pdftotext utility from the xpdf distribution. It works beautifully for most files, and I now use it instead of pstotext. Still, it sometimes loses content when the file contains mathematical formulae and other symbols.

Then there are the online conversion tools offered through access.adobe.com. Again, if the file contains mathematical formulae, there may be lossage. There may even be lossage when the file contains ordinary English words. For example, one PDF file contained the word "modifications", and the web-based tools produced "modi cations" as output.

All of the above solutions are good ones, and they work most of the time. I'm certainly not putting any of these products down. PDF is a complex file format, and writing a translator for it is no mean feat, I am sure.

There is one more solution, but it is less than optimal: optical character recognition. You can turn a PDF into a collection of .pnm bitmap image files using Ghostscript, then run the OCRShop utility from Vividata on the collection.

So those are the possibilities for converting PDF to text under Linux, as I see them. None is perfect, though some work well. What do I do? Help!

BTW, I notice that HTML is getting better and better. The newest specs make it possible to represent all sorts of symbols, mathematical and otherwise.
One of our list members, Karl Dahlke, has proven the concept with his math site (http://www.mathreference.com). If that is the case, we should be seeing more material in HTML that previously might have been in PDF, shouldn't we? But the reverse seems to be true: I'm seeing a lot of PDF these days. Has it always been this way? Maybe I'm just running into more PDF because I'm researching cryptography. Will it change in the foreseeable future?

_______________________________________________
Blinux-list@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/blinux-list