I've spent a number of hours trying all kinds of things I've
found on the web, but I'm not getting anywhere. It's probably
something simple.
I downloaded 64 web pages into a single file using wget2.
That part works fine.
$ file allraw.uog
allraw.uog: HTML document, UTF-8 Unicode text, with very long lines
The file is about 13M (I have no control over the source file).
I have a simple C++ program that finds lines that contain
special (non-ASCII) UTF-8 characters. It extracts those and
prints the output directly to the screen, where the UTF-8
characters display correctly. But if I redirect the output to a
file and open it, many of the UTF-8 characters show up as a wrong
extended-ASCII character for the first byte, followed by odd
characters. This happens in both gedit and geany.
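For reference, here is a minimal sketch of the kind of scan described above. It is not the actual findnoascii2 source, which is not shown here; the helper names utf8_seq_len, hex_of and snippet are hypothetical. One assumption is worth calling out: if the "few more characters" of context are cut at a fixed byte count, the cut can land in the middle of a multi-byte character, so the sketch backs up to a character boundary before cutting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Length in bytes of the UTF-8 sequence introduced by lead byte b.
// Returns 0 for ASCII, continuation bytes, and bytes that cannot start
// a sequence (0xC0, 0xC1, 0xF5..0xFF).
std::size_t utf8_seq_len(unsigned char b) {
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

// Lowercase hex of up to n bytes starting at pos, e.g. "c2bb".
std::string hex_of(const std::string& line, std::size_t pos, std::size_t n) {
    assert(pos <= line.size());
    std::string out;
    char buf[3];
    for (std::size_t i = pos; i < pos + n && i < line.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%02x",
                      static_cast<unsigned char>(line[i]));
        out += buf;
    }
    return out;
}

// Up to max_bytes of context starting at pos. The result is shortened
// when necessary so that the cut never falls inside a multi-byte sequence.
std::string snippet(const std::string& line, std::size_t pos,
                    std::size_t max_bytes) {
    if (pos >= line.size()) return std::string();
    std::size_t end = pos + max_bytes;
    if (end >= line.size()) return line.substr(pos);
    // A byte of the form 10xxxxxx at the cut point is a continuation byte,
    // so move the cut back to the start of that character.
    while (end > pos &&
           (static_cast<unsigned char>(line[end]) & 0xC0) == 0x80) {
        --end;
    }
    return line.substr(pos, end - pos);
}
```

A caller would walk each line byte by byte, and wherever utf8_seq_len returns a non-zero length, print the line number, the byte position, hex_of for that length, and snippet for the context.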
I modified the program to write its output directly to a file. If I
use cat, the output displays the correct UTF-8, but again, if I
open the file in gedit or geany it shows a corrupted mix of
extended ASCII.
$ ./findnoascii2 allraw.uog
I think this is the issue, but I have no idea how to fix it.
$ file allraw.uog.out
allraw.uog.out: Non-ISO extended-ASCII text
The file actually contains the correct UTF-8 data, and
looking at it with hexedit confirms that, but both geany and
gedit open the file as extended ASCII instead of UTF-8.
Changing the encoding to UTF-8 afterward does nothing.
I don't see any other options. Again, probably something simple.
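One way to narrow this down is to check whether every byte sequence in the output file is well-formed UTF-8. This rests on an assumption: file reporting "Non-ISO extended-ASCII text" instead of UTF-8 suggests that at least one sequence in allraw.uog.out is not valid, and editors may fall back to another encoding in that case. The sketch below is only an illustration; first_invalid_utf8 is a hypothetical helper, not part of any library.

```cpp
#include <cstddef>
#include <string>

// Returns the byte offset of the first sequence that is not well-formed
// UTF-8, or std::string::npos if the whole buffer is valid.
std::size_t first_invalid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if (b <= 0x7F) { ++i; continue; }    // plain ASCII

        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the second byte
        if (b >= 0xC2 && b <= 0xDF) {
            len = 2;
        } else if (b >= 0xE0 && b <= 0xEF) {
            len = 3;
            if (b == 0xE0) lo = 0xA0;        // reject overlong forms
            if (b == 0xED) hi = 0x9F;        // reject UTF-16 surrogates
        } else if (b >= 0xF0 && b <= 0xF4) {
            len = 4;
            if (b == 0xF0) lo = 0x90;        // reject overlong forms
            if (b == 0xF4) hi = 0x8F;        // reject values above U+10FFFF
        } else {
            return i;                        // stray continuation or invalid lead byte
        }
        if (i + len > s.size()) return i;    // sequence cut off at end of data

        const unsigned char second = static_cast<unsigned char>(s[i + 1]);
        if (second < lo || second > hi) return i;
        for (std::size_t k = 2; k < len; ++k) {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return i;
        }
        i += len;
    }
    return std::string::npos;
}
```

Reading the output file into a std::string and calling this function gives an offset that can be looked up in hexedit. A character cut off at the end of an output line would show up as a lead byte followed by a newline.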
Thanks.
Using cat to display the output is fine.
Columns: line number, position in line, hex code of the first
character, then the character and a few more characters.
1881 110 c2bb » <s
1881 196 c2bb »
2266 285 c2a0 L. <span
2266 879 e2809c “Communi
2266 954 e2809d ” of the
3090 556 e280ba ›</a></l
3090 655 c2bb »</a></li
3134 46 c8a7 ȧt</span>
3134 83 c3a5 åhan</spa
3245 150 c2a9 ©</a>
The same lines as shown in geany:
1881 110 c2bb » <s
1881 196 c2bb »
2266 285 c2a0 Â L. <span
2266 879 e2809c â??Communi
2266 954 e2809d â? of the
3090 556 e280ba â?º</a></l
3090 655 c2bb »</a></li
3134 46 c8a7 ȧt</span>
3134 83 c3a5 åhan</spa
3245 150 c2a9 ©</a>
Thanks...