Attached is an example scan from XSane. It contains more than just a few
grey levels, and you can see the handwritten notes on top of it. This file
could easily be represented in posterized form with 4 bits per pixel;
2 bits per pixel would probably also do, but Gimp's posterize function
would then need parameters that reserve something blue for the
handwritten part, as in my second example.
Gimp does not show a histogram when using the contrast curve tool in my
example.
My question arises because I want to optimize lots of already scanned
and OCR-treated PDF files with three goals:
1. reduce the file size by lowering the 600 dpi resolution, which
was chosen during scanning for better OCR results, while keeping the OCR result,
2. reduce the bits per pixel of the scanned image plane, e.g. by
posterizing or even binarizing,
3. improve the contrast of the displayed PDF file with some contrast
enhancing function, e.g. as happens when applying a contrast curve
in Gimp.
I want to do all this while preserving the OCR plane of the input files.
Manipulating sandwich PDF files (like scans made searchable by OCR) is
probably out of the scope of Gimp. But the functions needed for the image
plane are in it.
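
To make goals 2 and 3 concrete: both posterizing and a simple contrast
stretch can be written as a 256-entry lookup table applied to each grey
value. The following is only a minimal sketch of that idea, not Gimp's
actual implementation. The function names and the black/white points are
made up for illustration, and it works on a single grey channel, so it does
not address keeping the handwriting blue.

```python
def posterize_lut(levels):
    """Map grey values 0..255 onto `levels` evenly spaced output values."""
    if not 2 <= levels <= 256:
        raise ValueError("levels must be between 2 and 256")
    step = 255 / (levels - 1)
    lut = []
    for v in range(256):
        # Bucket index of v when 0..255 is split into `levels` equal ranges.
        index = min(v * levels // 256, levels - 1)
        lut.append(round(index * step))
    return lut


def contrast_lut(black, white):
    """Linear stretch: values <= black map to 0, values >= white map to 255."""
    if not 0 <= black < white <= 255:
        raise ValueError("need 0 <= black < white <= 255")
    span = white - black
    return [min(255, max(0, round((v - black) * 255 / span)))
            for v in range(256)]


def apply_lut(pixels, lut):
    """Apply a lookup table to a flat sequence of 8-bit grey values."""
    return [lut[p] for p in pixels]


if __name__ == "__main__":
    # Hypothetical grey values: dark ink, two mid greys, near-white paper.
    sample = [30, 90, 160, 230]
    stretched = apply_lut(sample, contrast_lut(50, 200))
    print(apply_lut(stretched, posterize_lut(4)))
```

With 4 levels the result fits into 2 bits per pixel, with 16 levels into
4 bits per pixel. If Pillow is available, a table like this can be handed to
Image.point() for a greyscale image instead of looping in Python.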
gs (Ghostscript) can reduce the resolution, e.g. from 600 dpi (good for OCR)
down to 150 dpi (insufficient for OCR but sufficient to display most
documents). I wish it also offered 200 dpi, at the cost of a bit more
storage space. Unfortunately the gs presets I know of only cover 72 dpi
(/screen), 150 dpi (/ebook) and 300 dpi output. It can do this while
keeping the OCR plane.
To my knowledge, gs can't apply any color or grey level transformations,
not even ones that could be expressed as a lookup table.
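
On the wish for 200 dpi: Ghostscript's pdfwrite device also accepts explicit
image resolution parameters (ColorImageResolution, GrayImageResolution,
MonoImageResolution) in addition to the -dPDFSETTINGS presets. The sketch
below only builds such a command line and does not run it. The helper name
build_gs_args and the file names are hypothetical, and whether the OCR text
layer survives unchanged should be checked on a sample file first.

```python
def build_gs_args(src, dst, dpi=200):
    """Return a gs argument list that downsamples embedded images to `dpi`."""
    args = [
        "gs",
        "-sDEVICE=pdfwrite",  # write a new PDF
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
    ]
    # Enable downsampling and set the target resolution per image type.
    for kind in ("Color", "Gray", "Mono"):
        args += [
            f"-dDownsample{kind}Images=true",
            f"-d{kind}ImageResolution={dpi}",
        ]
    args += [f"-sOutputFile={dst}", src]
    return args


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_gs_args("scan_600dpi.pdf", "scan_200dpi.pdf")))
```

The resulting list could be passed to subprocess.run(args, check=True) once
the output has been verified on a copy of one file.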
Regards
Adalbert
On 18.09.22 at 22:32, Liam R E Quin wrote:
On Sun, 2022-09-18 at 20:52 +0200, Adalbert Hanßen via gimp-developer-
list wrote:
XSane produced a color scan from a document with 600 dpi and full
color,
1.1 MB file size.
I normally have XSane make a png file. For 8-bit per channel (0 to 255)
images, you can also use the XSane gimp plugin, which is a lot easier,
but make sure to export the file right away so you have a copy if gimp
crashes or if you make a mistake :)
When I load this file into Gimp, I get an error message about an
incompatible TIFF format (additional channels without the field
ExtraSamples). It gives me the choice to let the additional channel work
as
* non pre-multiplied alpha
* pre-multiplied alpha
* channel
I see no difference whichever choice I select.
If you choose Channel, it'll be visible in Gimp's Channels dialogue.
Otherwise, it's most likely pre-multiplied alpha (transparency), and will
most likely be "all opaque", so you can ignore it.
However: When I try to adjust the colors with the contrast curve, I see no
histogram under it.
How large is the image? If you used the Line Art setting in XSane, every
pixel will be either 0 or 255, so the histogram is just two vertical
lines, one at each end, that aren't really visible as they're right
next to the edge. For a large image it can take a while for the
background thread to count all the pixels in the image and fill in the
histogram.
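
To illustrate the Line Art case: a small sketch that counts grey values the
way a histogram does. The pixel list is made up rather than taken from a
real scan, and the function names are only for illustration. It shows that
a purely black-and-white image fills just the two outermost bins.

```python
from collections import Counter


def histogram(pixels):
    """Count how often each 8-bit value 0..255 occurs."""
    counts = Counter(pixels)
    return [counts.get(v, 0) for v in range(256)]


def occupied_bins(hist):
    """Return the grey values that occur at least once."""
    return [v for v, n in enumerate(hist) if n > 0]


if __name__ == "__main__":
    # Hypothetical line-art page: mostly one value, some of the other.
    line_art = [0] * 900 + [255] * 100
    print(occupied_bins(histogram(line_art)))
```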
** Is this due to the error message when loading the file? **
no.
ankh / liam / demib0y