On Tue, Dec 16, 2008 at 05:40:36PM -0500, Matthias Clasen wrote:
> > Unicode is character encoding
> > HTML tags or similar are semantic markup
>
> Thanks Alan, I know that quite well.
>
> > Trying to extrapolate semantic markup from random ascii symbols is not
> > a reliable or robust path, particularly when you come to internationalise
> > things.
>
> One hopes the ascii symbols in most package descriptions are not
> entirely random... and extrapolating something from them can be quite

There is no reason to assume that *, for example, is a bullet point; it
could be a footnote indicator, maths or ascii art. The Unicode bullet, on
the other hand, is unequivocally a bullet point.

So extracting from UTF-8 is safer, but extracting at all is dangerous.

> The specification for RPM doesn't imply anything about the description
> field. And this thread is about how to possibly improve the situation by
> agreeing on some form of interpretation.

Right - the field is plain UTF-8 textual data and has been for years. You
want to add a semantic version of it. That is fine, but use a new header for
the field, the way RPM intends things to be added.
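
To illustrate the point about * versus the Unicode bullet, here is a minimal Python sketch. It is not anything RPM or any packaging tool does; the function names and the sample description are hypothetical, made up for the example. It only shows that a leading U+2022 can be matched exactly, while a leading * has to be guessed at and the guess also catches a footnote line.

```python
import re

BULLET = "\u2022"  # U+2022 BULLET


def unicode_bullets(description):
    """Return the lines that begin with the Unicode bullet character."""
    return [line for line in description.splitlines()
            if line.lstrip().startswith(BULLET)]


def ascii_star_guesses(description):
    """Guess at bullet lines from a leading '*'. This is only a heuristic."""
    return [line for line in description.splitlines()
            if re.match(r"\s*\*\s", line)]


# Hypothetical package description, invented for this example.
sample = "\n".join([
    "Features:",
    "\u2022 fast startup",
    "* small footprint",
    "* Not available on all architectures",  # meant as a footnote, not a bullet
    "2 * 3 = 6",
])

print(unicode_bullets(sample))
print(ascii_star_guesses(sample))
```

The second function returns the footnote line alongside the real list item, and nothing in the text lets it tell the two apart.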