Re: fc-glyphname bug

James Cloos <cloos@xxxxxxxxxxx> · Thu, 08 Mar 2007 05:15:36 -0500

>>>>> "Keith" == Keith Packard <keithp@xxxxxxxxxx> writes:

>> With only the dingbat names added in fcglyphlist,
>> FC_GLYPHNAME_MAXLEN is 4, and passing 5 to FT_Get_Glyph_Name()
>> causes a loop.

Keith> So FT_Get_Glyph_Name loops? Or we continue to call it with the
Keith> same data even though it returns an error?

I was so exhausted this afternoon that, even after a good day's sleep,
everything has run together and is a bit foggy.  But after reviewing
the long post (cross-posted to freetype-devel), fc calls
FT_Get_Glyph_Name in a loop with glyph_index (the 2nd arg) taking the
values:

0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, ...

With the buffer long enough to load all of the glyph names it instead
increments from 0 to face->num_glyphs as expected.

This happens in FcFreeTypeCharSetAndSpacing().

When FT_Get_Glyph_Name fails it returns (FT_Error) 6.

I think I might see the problem.

ft_mem_strcpyn() looks like this:

,----(freetype2/src/base/ftutil.c)
| FT_BASE_DEF( FT_Int )
|   ft_mem_strcpyn( char*        dst,
|                   const char*  src,
|                   FT_ULong     size )
|   {
|     while ( size > 1 && *src != 0 )
|       *dst++ = *src++;
| 
|     *dst = 0;  /* always zero-terminate */
| 
|     return *src != 0;
|   }
`----

Unless I'm missing something, size is never decremented, so the copy
is not limited to size octets, yes?  That probably writes over the
value in FcFreeTypeCharSetAndSpacing()'s glyph variable, if name_buf[]
happens to be allocated just before it.

I bet that is what causes the loop.
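
A minimal sketch of how that could play out, assuming (hypothetically)
a 5-byte name_buf immediately followed in memory by a one-byte glyph
counter.  The frame layout, the names buggy_strcpyn and simulate_walk,
and the glyph names are all made up for illustration; the real stack
layout depends on the compiler.  The frame is modelled as one array so
that every write stays in bounds:

```c
#include <stddef.h>

/* Same loop as the quoted function: size is never decremented. */
static int buggy_strcpyn(char *dst, const char *src, unsigned long size)
{
    while (size > 1 && *src != 0)
        *dst++ = *src++;
    *dst = 0;
    return *src != 0;
}

/* Assumed layout: name_buf is frame[0..4], the glyph counter is frame[5]. */
enum { BUF_LEN = 5, GLYPH_OFF = 5, FRAME_LEN = 16, NUM_NAMES = 6 };

/* Hypothetical glyph names; only index 3 is too long for BUF_LEN. */
static const char *const demo_names[NUM_NAMES] = {
    "a", "b", "c", "space", "d", "e"
};

/* Record the first `steps` glyph indices the loop visits. */
static void simulate_walk(unsigned char *visited, size_t steps)
{
    char frame[FRAME_LEN] = { 0 };
    size_t n = 0;

    for (frame[GLYPH_OFF] = 0;
         frame[GLYPH_OFF] < NUM_NAMES && n < steps;
         frame[GLYPH_OFF]++) {
        visited[n++] = (unsigned char)frame[GLYPH_OFF];
        /* "space" needs 6 bytes; its NUL lands on the counter. */
        buggy_strcpyn(frame, demo_names[(size_t)frame[GLYPH_OFF]], BUF_LEN);
    }
}
```

Under those assumptions, asking for ten steps records 0, 1, 2, 3, 1, 2,
3, 1, 2, 3: the terminating NUL of the 5-character name zeroes the
counter, and the loop increment then brings it back to 1.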

Making the buffer longer than any glyph name guarantees that
ft_mem_strcpyn() cannot overstep, and therefore avoids the loop.

I'll post that bit to freetype-devel.
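
For reference, a minimal sketch of a copy loop that does honour size.
The name bounded_strcpyn is hypothetical, and this is only one way to
write it, not necessarily the fix FreeType will adopt:

```c
/*
 * Copy at most size-1 bytes of src into dst and NUL-terminate.
 * Returns non-zero if src did not fit, zero otherwise.
 * dst must have room for at least size bytes.
 */
static int bounded_strcpyn(char *dst, const char *src, unsigned long size)
{
    if (size == 0)
        return 1;           /* no room even for the terminator */

    while (size > 1 && *src != 0) {
        *dst++ = *src++;
        size--;             /* the step missing from the quoted loop */
    }

    *dst = 0;               /* always zero-terminate */

    return *src != 0;
}
```

With size 5 and the name "space", this stores "spac" plus the NUL,
reports truncation, and leaves the byte after the buffer untouched.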

>> The largest legal value [FC_GLYPHNAME_MAXLEN] could need in a
>> PostScript font is 127, so name_buf[128] should be a sufficient
>> initialization.

Keith> And this seems like a fine work-around; as you can see, these
Keith> buffers are just allocated on the stack. Pushing the actual bug
Keith> upstream to FreeType should fix the root cause eventually.

In that case, I presume something like:

diff --git a/fc-glyphname/fc-glyphname.c b/fc-glyphname/fc-glyphname.c
index a0e18e7..c2db931 100644
--- a/fc-glyphname/fc-glyphname.c
+++ b/fc-glyphname/fc-glyphname.c
@@ -282,7 +282,7 @@ main (int argc, char **argv)
              
     printf ("#define FC_GLYPHNAME_HASH %u\n", hash);
     printf ("#define FC_GLYPHNAME_REHASH %u\n", rehash);
-    printf ("#define FC_GLYPHNAME_MAXLEN %d\n\n", max_name_len);
+    printf ("#define FC_GLYPHNAME_MAXLEN 127\n\n");
     
     /*
      * Dump out entries

as well as the version you posted to keep i+=r from being constant in
fc-glyphname.c:insert() would be enough to avoid the bug until ft is
fixed to honour the size arg to ft_mem_strcpyn()?

Keith> Does Standard Symbol L also provide a regular encoding for the
Keith> glyphs that it uses? The list you provide looks a lot like the
Keith> standard Adobe encoding for text fonts. With your buffer size
Keith> fix in place, does this font start working?

It isn't the text encoding.  Symbol encoding gets its own table in the
PLRM (even back to the original Red Book).  URW's version just adds
Euro encoded at 0x80.

I currently have fc installed with the two patches I posted, so it has
the full glyphlist table.  With that version, xfd(1x) shows the glyphs
in their unicode codepoints for Standard Symbol L, just like it does
for ITC Zapf Dingbats.  The same holds for Symbol:foundry=adobe.

That does seem like the right thing to do, yes?  

>> Is the answer to add just those 189 glyph names rather than all of
>> the names in glyphlist.txt?

Keith> Certainly using a small subset of the glyph names would be
Keith> preferred to including all of them in the current data
Keith> structure form.

The list I posted last is exactly the glyph names needed, using the
code points in the glyphlist.txt file in fc-glyphname.  But I think
some of those are outdated.  The glyphs put in 0xF8XX:

radicalex;F8E5        arrowvertex;F8E6      arrowhorizex;F8E7
registersans;F8E8     copyrightsans;F8E9    trademarksans;F8EA
parenlefttp;F8EB      parenleftex;F8EC      parenleftbt;F8ED
bracketlefttp;F8EE    bracketleftex;F8EF    bracketleftbt;F8F0
bracelefttp;F8F1      braceleftmid;F8F2     braceleftbt;F8F3
braceex;F8F4          integralex;F8F5       parenrighttp;F8F6
parenrightex;F8F7     parenrightbt;F8F8     bracketrighttp;F8F9
bracketrightex;F8FA   bracketrightbt;F8FB   bracerighttp;F8FC
bracerightmid;F8FD    bracerightbt;F8FE

now have codepoints in 10646.  

At least these changes are needed:

                      arrowvertex;23D0      arrowhorizex;23AF

parenlefttp;239B      parenleftex;239C      parenleftbt;239D
bracketlefttp;23A1    bracketleftex;23A2    bracketleftbt;23A3
bracelefttp;23A7      braceleftmid;23A8     braceleftbt;23A9
braceex;23AA          integralex;23AE       parenrighttp;239E
parenrightex;239F     parenrightbt;23A0     bracketrighttp;23A4
bracketrightex;23A5   bracketrightbt;23A6   bracerighttp;23AB
bracerightmid;23AC    bracerightbt;23AD

My understanding is that Adobe put apple, radicalex and
the .serif versions (rather than the sans) of the copyright,
trademark and registered glyphs in the PUA in SymbolStd.otf.
Also arrowvertex, even though that is what U+23D0 is defined
to be.

With these entries added to fc-glyphname's loaded table the type1
versions of the fonts should show up just like the otf does.

Keith> I would not be averse to including all of them
Keith> if we built a data structure that did not use relocations
Keith> though. fontconfig has several large tables which have been
Keith> carefully designed to eliminate relocations; another one would
Keith> not be a terrible plan.

That is one programming exercise I've not tried.  
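
A minimal sketch of the general idea, as a starting point only.  An
array of char pointers in a shared library needs one load-time
relocation per pointer; keeping every name in a single constant string
and storing offsets into it avoids that.  The names name_pool,
DemoGlyphEntry, demo_entries and demo_name_to_ucs are hypothetical, the
linear search is just for brevity, and none of this is meant to
describe fontconfig's actual tables:

```c
#include <stddef.h>
#include <string.h>

/* All names in one constant blob, each terminated by a NUL. */
static const char name_pool[] =
    "parenlefttp\0"
    "parenleftex\0"
    "parenleftbt";

/* Entries hold integer offsets, not pointers, so nothing needs fixing
 * up when the library is loaded. */
typedef struct {
    unsigned short name_off;   /* byte offset into name_pool */
    unsigned short ucs;        /* code point for that name */
} DemoGlyphEntry;

static const DemoGlyphEntry demo_entries[] = {
    {  0, 0x239B },            /* parenlefttp */
    { 12, 0x239C },            /* parenleftex */
    { 24, 0x239D },            /* parenleftbt */
};

/* Return the code point for name, or 0 if it is not in the table. */
static unsigned demo_name_to_ucs(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof demo_entries / sizeof demo_entries[0]; i++)
        if (strcmp(name_pool + demo_entries[i].name_off, name) == 0)
            return demo_entries[i].ucs;
    return 0;
}
```

In practice the pool and the offsets would be generated by a tool such
as fc-glyphname rather than written by hand, and the existing hash
could index the entry array in place of the linear scan.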

>> Adobe Symbol

Keith> Does fontconfig not currently correctly construct the set of
Keith> Unicode code points supported by this font?

I don't know.  I only have the one box to test on, and I'd like to
avoid backtracking to find out.  It should be exactly the same as
for Standard Symbol L, though, so what do you get from:

xfd -fa 'Standard Symbol L'

Do you see more than 38 glyphs on the first page?  Are the Greek
letters in the 0300 page?  If the answers are yes and no respectively,
then it does not get the code points correct without additions to
fcglyphname.h.

-JimC