On Monday 20 January 2020 12:32:15 Theodore Y. Ts'o wrote:
> On Mon, Jan 20, 2020 at 01:04:42PM +0900, OGAWA Hirofumi wrote:
> >
> > To be perfect, the table would have to emulate what Windows use. It can
> > be unicode standard, or something other. And other fs can use different
> > what Windows use.
>
> The big question is *which* version of Windows.  vfat has been in use
> for over two decades, and vfat predates Windows starting to use Unicode
> in 2001.  Before that, vfat would have been using whatever code page
> its local Windows installation was set to use; and I'm not sure if
> there was space in the FAT headers to indicate the codepage in use.

VFAT is an extension to FAT which stores file names in UTF-16. In
original FAT without the VFAT extension (in all variants: FAT12, FAT16
and FAT32) the file name is stored "according to current 8bit OEM code
page".

A VFAT-aware FAT implementation knows whether a particular filename is
really VFAT (UTF-16) or non-VFAT (8bit OEM code page). There are flags
in FAT which indicate whether an entry is VFAT (UTF-16).

And no, there are no bits in the FAT header which specify the OEM code
page.

So if you use "mode con" or "chcp" (or whatever those MS-DOS commands
for changing the OEM codepage were), all non-VFAT filenames change on
the next read of the FAT directory. But because every OEM code page is
full 8bit, you always get valid data. You would just see that your file
name is different :D

> It would be entertaining for someone with ancient versions of Windows
> 9x to create some floppy images using codepage 437 and 450, and then
> see what a modern Windows system does with those VFAT images --- would

Hehe :-) I did that as part of my investigation into how the FAT volume
label is stored and how different tools read it. The FAT label is *not*
stored as UTF-16 but only in that OEM code page, like old filenames on
MS-DOS:
https://www.spinics.net/lists/kernel/msg2640891.html

And what do recent Windows versions do?
They decode such filenames (and therefore also the volume label) via
the OEM codepage which belongs to the current system Language settings.
You cannot change the OEM codepage directly on recent Windows. You can
only change the Regional Language (which then changes the OEM codepage
that belongs to it). The mapping table between Windows Regional Language
and OEM codepage is in the (still unreleased) fatlabel(8) manpage,
section DOS CODEPAGES, here:
https://github.com/dosfstools/dosfstools/blob/master/manpages/fatlabel.8.in

> it break horribly when it tries to interpret them as UTF-16?  Or would

As Windows knows that the filename is stored as 8bit and not UTF-16,
nothing is broken. Only for characters with the upper bit set you will
probably not see the filenames as you saw them in MS-DOS. But if you
remember which OEM code page you used on MS-DOS, you can change the
Windows Language to one which uses that OEM code page and then read
that old FAT fs without any broken file names.

> it figure it out?  And if so, how?  Inquiring minds want to know....
>
> Bonus points if the lack of forwards compatibility causes older
> versions of Windows to Blue Screen.  :-)

I have not got any Blue Screens while reading these older FAT fs
created and used by MS-DOS.

On Linux it is easier: just specify the -o codepage= mount option and
vfat.ko translates it correctly.

>                                               - Ted
>
> P.S.  And of course, then there's the question of how do older
> versions of Windows handle versions of Unicode which postdate the
> release date of that particular version of Windows?  After all,

This is not a problem. Windows allows you to store an arbitrary
sequence of uint16[] in a filename (except disallowed MS-DOS chars like
:?<>...). And when reading a directory you need to expect that it will
return an arbitrary sequence of uint16[]. Windows does not care about
valid/invalid/assigned/unassigned code points. It does not even care
about halves of surrogate pairs, so it can also store one half of an
(unpaired) surrogate pair (one uint16).
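
A rough Python sketch of the two points above, only as an illustration.
Nothing here is taken from vfat.ko or from Windows: the function names
and sample bytes are made up, and Python's cp437/cp852 codecs just stand
in for whatever OEM tables a given system really uses. It shows that the
same 8bit name bytes decode without error under any OEM code page but
give different text, and that a uint16[] name can contain an unpaired
surrogate which a strict UTF-16 check would reject.

```python
from typing import Sequence


def decode_short_name(raw: bytes, oem_codepage: str) -> str:
    """Decode a non-VFAT (8bit) name using the given OEM code page.

    Hypothetical helper: the caller has to know the code page, because
    nothing in the FAT header records it.
    """
    return raw.decode(oem_codepage)


def is_well_formed_utf16(units: Sequence[int]) -> bool:
    """Return True if the 16-bit units contain no unpaired surrogate."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            return False
        i += 1
    return True


if __name__ == "__main__":
    # Made-up 8bit name with one byte that has the upper bit set.
    raw = bytes([0x53, 0x9F, 0x4C])
    for codepage in ("cp437", "cp852"):
        print(codepage, decode_short_name(raw, codepage))

    # Made-up uint16[] names: a valid pair and a lone high surrogate.
    print(is_well_formed_utf16([0x0041, 0xD83D, 0xDE00]))
    print(is_well_formed_utf16([0x0041, 0xD800, 0x0042]))
```

Both decode_short_name() calls succeed for the same bytes and only the
resulting text differs, which matches the "valid data, different file
name" behaviour described above. A reader of such a directory should
therefore not assume that is_well_formed_utf16() holds for every name it
gets back.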
> Unicode adds new code points with potential revisions to the case
> folding table every 6-12 months.  (The most recent version of Unicode
> was released in April 2019 to accommodate the new Japanese kanji
> character "Rei" for the current era name with the elevation of the new
> current reigning emperor of Japan.)

-- 
Pali Rohár
pali.rohar@xxxxxxxxx