On Fri, Sep 26, 2014 at 04:50:39PM +0200, Olaf Weber wrote:
> I'm not sure how common the parsing code can be if needs to be capable of
> retrieving data from a filesystem.
>
> Note given your and Andi Kleen's feedback on the trie size I've switched to
> doing algorithmic decomposition for Hangul. This reduces the size of the
> trie to 89952 bytes.
>
> In addition, if you store the trie in the filesystem, then the only part
> that needs storing is the version for that particular filesystem, e.g no
> compatibility info for different unicode versions would be required. This
> would reduce the trie size to about 50kB for case-sensitive filesystems, and
> about 55kB on case-folding filesystems.

Honestly I wouldn't worry about demand loading it too much. This is
fairly special-case code for NAS servers, and should not affect normal
uses now that we use symbol_get.

Let's get back to the fundamentals.

> >It's a chicken and egg situation. I'd much prefer we enforce clean
> >utf8 from the start, because if we don't we'll never be able to do
> >that. And other filesystems (e.g. ZFS) allow you to do reject
> >anything that is not clean utf8....
>
> As I understand it, this is optional in ZFS. I wonder what people's
> experiences are with this.

It is as optional as your utf8 support for XFS is. But they do enforce
valid utf8 if they use utf8 normalization for file name comparisons, be
that case sensitive or insensitive. Take a look at the zfs(8) man page.

> - Forbid non-UTF-8 filenames
> - Allow non-UTF-8 filenames
> - Make it a mount option
> - Make it a mkfs option

My take on this is:

- I think we'll have to prevent non-utf8 file names for any cases where
  we use utf8 normalization. If you do not use utf8 normalization it's
  plain old Unix: everything is allowed.
- I think utf8 normalization vs not should be a mkfs option, to make
  sure everyone, including kernel and repair, knows what sort of
  filesystem they are dealing with.
- Case insensitive matching for utf8 normalized filesystems should be a
  runtime decision: mount time for now, but Samba people would be
  extremely happy to allow per-operation or per-process CI matching.
  But that is a totally different discussion I'd like to keep separate;
  I just want to make sure the disk format allows for it for now.
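
To make the first point above concrete, here is a minimal sketch in C of what "reject non-utf8 names only when normalization is in use" could look like. It is not taken from any patch in this thread: the function names and the fs_normalizes_utf8 flag are hypothetical, and the flag just stands in for whatever mkfs-time feature bit the on-disk format ends up carrying. The validity check is the usual strict one (no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if s[0..len) is well-formed UTF-8: no overlong encodings,
 * no surrogate code points, nothing above U+10FFFF.
 */
static bool name_is_valid_utf8(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		unsigned int cp, min;
		size_t n, k;	/* n = number of continuation bytes */

		if (c < 0x80) {			/* plain ASCII */
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {
			n = 1; cp = c & 0x1F; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {
			n = 2; cp = c & 0x0F; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {
			n = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return false;	/* stray continuation or bad lead byte */
		}

		if (len - i - 1 < n)
			return false;	/* sequence truncated */

		for (k = 1; k <= n; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return false;
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}

		if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return false;	/* overlong, out of range, or surrogate */

		i += n + 1;
	}
	return true;
}

/*
 * Hypothetical policy hook: fs_normalizes_utf8 stands in for a
 * mkfs-time feature flag. Without normalization any byte string is
 * accepted, as on plain old Unix.
 */
static bool name_is_acceptable(bool fs_normalizes_utf8,
			       const char *name, size_t len)
{
	if (!fs_normalizes_utf8)
		return true;
	return name_is_valid_utf8((const unsigned char *)name, len);
}
```

The point of keeping the policy in one small hook is that kernel and repair can share the same answer for a given filesystem, since the flag comes from the disk format rather than from a mount option.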