Hi Hin-Tak,

On Apr 5, 2014, at 2:11 AM, Hin-Tak Leung wrote:

> ------------------------------
> On Fri, Apr 4, 2014 10:24 PM BST Anton Altaparmakov wrote:
>
>> Hi,
>>
>> On 4 Apr 2014, at 20:46, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote:
>>> From: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx>
>>>
>>> The HFS Plus Volume Format specification (TN1150) states that
>>> file names are stored internally as a maximum of 255 Unicode
>>> characters, as defined by The Unicode Standard, Version 2.0
>>> [Unicode, Inc. ISBN 0-201-48345-9]. File names are converted by
>>> the NLS system on Linux before being presented to the user.
>>>
>>> Though it is rare, the worst case is 255 CJK characters converting
>>> to UTF-8, with one Unicode character expanding to 3 bytes.
>>> Surrogate pairs are no worse. The receiving buffer needs to be
>>> 255 x 3 bytes, not the 255 bytes the code has always used.
>>
>> You are correct that that buffer is too small. However:
>>
>> 1) The correct size for the buffer is NLS_MAX_CHARSET_SIZE *
>> HFSPLUS_MAX_STRLEN + 1, not a magic constant "3". Three bytes per
>> character is actually not big enough if the string stores UTF-16
>> rather than UCS-2 Unicode. I have observed this happen on NTFS
>> volumes written to by Asian versions of Windows, and I see no reason
>> why it could not happen on OS X, too, especially on an HFS+ volume
>> that has been written to by a Windows HFS+ driver, even if the native
>> OS X driver would not normally do it (I admit I have not looked at
>> it). That reliable source of information, Wikipedia, suggests Mac
>> OS X has also used UTF-16 since OS X 10.3, at least in userspace, so
>> chances are it either also uses it in the kernel or, if not yet, it
>> might well do so in future:
>>
>> http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings
>>
>> 2) You are now allocating a huge buffer on the stack. This is not a
>> good thing to do in the kernel (think of a 4k-stack kernel config;
>> that single variable consumes about a quarter of the available
>> stack). You need to allocate the buffer dynamically.
>> As you only need to do the allocation on entry to hfsplus_readdir()
>> and deallocate it on exit, it is not the problem it would be if you
>> had to allocate/free for every filename.
>>
>
> Hi Anton,
>
> Thanks for the comments. NLS_MAX_CHARSET_SIZE is 6 in
> include/linux/nls.h, but I think it is too generous in this case. It
> is correct that a Unicode character needs at worst 6 bytes to encode,
> but the characters in the upper range, when encoded in UTF-16,
> require a surrogate pair; i.e. they go from *two* UTF-16 units to
> 6 bytes. So that is still x3, not x6. Also, Unicode 2.0 covers only
> the first supplementary plane, and only requires up to 4 bytes. That
> is what the "Surrogate pairs are no worse." part of my message was
> about. Please correct me if this reasoning is wrong.
>
> I'll switch to dynamic allocation and prepare a revised patch, after
> further discussion on the x3 vs x6 issue.
>

Thank you for the patch. I have some additional remarks.

(1) First, I think it makes sense to describe the reproduction path
and the resulting crash (if that is possible, of course). Such a
description can be useful for distinguishing users' issues.

(2) I agree that the magic "3" is not a good way either. :) A named
constant is much better.

(3) The buffer will now be big, so allocating it on the stack is a
dangerous way; I agree with that. It needs to use dynamic allocation
(kmalloc/kfree) or a look-aside cache
(kmem_cache_alloc/kmem_cache_free).

(4) Unfortunately, we have the same issue for the extended attributes
case (HFSPLUS_ATTR_MAX_STRLEN). It needs to be fixed too. Could you
fix it?

Thanks,
Vyacheslav Dubeyko.