Re: Accented characters not working with CIFS (but ok with smbclient)

I don't think this is easy to solve.

The cifs client needs to encode and decode the byte sequence on the wire
correctly, taking the charset used by the client into account even when
Unicode is not negotiated.
If iocharset is specified, the client should honour that mount option in the
non-Unicode case as well.

Basically, the byte sequence sent or received on the wire should be encoded
or decoded in both cases, Unicode and non-Unicode, based on iocharset if it
is specified as a mount option, or on the default charset if it is not.

With UCS-2LE encoding, we know every Unicode character on the wire is two
bytes long. Without Unicode, we do not know how many bytes long each
character is, since the server could be using a multi-byte or variable-width
charset.
So one option would be to use another mount option, codepage, to specify
the charset used at the server, and to convert using the charset named by
that option.  If the server does not support Unicode and the codepage
option is not specified, assume utf8 as the default charset on the server.
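
A minimal user-space sketch of the conversion path proposed above, written
in Python rather than kernel C. The function names and parameters are
hypothetical; `codepage` and `iocharset` only mirror the mount options
discussed here, and this is not how the kernel client is implemented.

```python
def wire_charset(unicode_negotiated, codepage=None):
    """Pick the charset used on the wire (hypothetical helper)."""
    if unicode_negotiated:
        # UCS-2LE on the wire; Python's utf-16-le matches it for BMP characters.
        return "utf-16-le"
    # Non-Unicode: use the codepage option, else assume utf8 on the server.
    return codepage or "utf-8"


def wire_to_local(raw, unicode_negotiated, codepage=None, iocharset="utf-8"):
    """Convert a name received on the wire into local (iocharset) bytes."""
    name = raw.decode(wire_charset(unicode_negotiated, codepage))
    return name.encode(iocharset)


def local_to_wire(local, unicode_negotiated, codepage=None, iocharset="utf-8"):
    """Convert a local (iocharset) name into the bytes sent on the wire."""
    name = local.decode(iocharset)
    return name.encode(wire_charset(unicode_negotiated, codepage))


if __name__ == "__main__":
    local = "Coleção".encode("utf-8")
    wire = local_to_wire(local, unicode_negotiated=False, codepage="cp850")
    print(wire.hex())
    print(wire_to_local(wire, unicode_negotiated=False, codepage="cp850") == local)
```

The point of the sketch is only that both directions go through the same
pair of charsets whether or not Unicode was negotiated.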

I think Jeff Layton proposed such a solution before and probably has (or
had) some related code as well. I am not sure, but I will try to look for
that code.


On Tue, Sep 30, 2014 at 1:38 PM, adcromitus <adcromitus@xxxxxxxxx> wrote:
> Hi,
>
> I tried creating a new file, and although I type the name correctly, it
> appears garbled in "ls".
>
> So, I hadn't thought about this, but the files were created locally on a
> FAT32 volume.
> I did some research and it appears the most probable cp is 1252 (I'm in
> Portugal). But I got this error:
>
> # mount -t cifs //192.168.1.253/Disk_a1 /shared/ahgora --verbose -o
> user=user,pass="",uid=1000,gid=1000,iocharset=cp1257
> mount.cifs kernel mount options:
> ip=192.168.1.253,unc=\\192.168.1.253\Disk_a1,iocharset=cp1257,uid=1000,gid=1000,user=user,pass=********
> mount error(79): Can not access a needed shared library
> Refer to the mount.cifs(8) manual page (e.g. man mount.cifs)
>
> I also tried cp860 and cp1250, but the accented characters still showed up
> as weird symbols.
>
> I didn't quite get how iocharset would solve this since, by your experiment,
> the server seems not to support UTF.
>        iocharset
>            Charset used to convert local path names to and from Unicode.
>            Unicode is used by default for network path names if the server
>            supports it. If iocharset is not specified then the nls_default
>            specified during the local client kernel build will be used.
>            **If server does not support Unicode, this parameter is unused**.
>
>
> Since smbclient seems to be doing the conversion correctly, isn't it
> possible to just "ask it" which conversion it is doing?
>
> Thanks
>
>
>
>
> On 30/09/2014 03:40, Steve French wrote:
>>
>> I did some experiments:
>>
>> Took a Samba server, and set "unicode=false" in smb.conf
>>
>> mounted to the server from cifs and verified that Unicode is not being
>> sent
>>
>> created some files locally with Spanish characters in the directory
>> "test" and as expected the special characters were mapped to '?' (see
>> the ls of /mnt1/test)
>>
>> sfrench@ubuntu:/mnt1/test1$ ls ~/test/*a*b*
>> /home/sfrench/test/123áaébícódúeüfñg¿h¡
>> /home/sfrench/test/áaébícódúeüfñg¿h¡
>> sfrench@ubuntu:/mnt1/test1$ ls ~/test1
>> 123├ía├®b├¡c├│d├║e├╝f├▒g┬┐h┬í  ├ía├®b├¡c├│d├║e├╝f├▒g┬┐h┬í
>> sfrench@ubuntu:/mnt1/test1$ ls /mnt1/test/*a*b*
>> /mnt1/test/123?a?b?c?d?e?f?g?h?  /mnt1/test/?a?b?c?d?e?f?g?h?
>> sfrench@ubuntu:/mnt1/test1$ ls /mnt1/test1/
>> 123áaébícódúeüfñg¿h¡  áaébícódúeüfñg¿h¡
>>
>> Unmounted and then mounted with "iocharset=cp850" on the client.
>> Created the files over the remote mount in /mnt1/test1 and it worked
>> fine and the Spanish characters were visible (locally in ~/test1 those
>> same filenames are not easily visible since the characters map
>> differently).
>>
>> So ... it looks like if files were created on a mount with the right
>> code page (iocharset=cp850 in my case) then you should be able to
>> create and read them fine remotely.
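
The garbled names in the ls output of ~/test1 quoted above are what UTF-8
bytes look like when each byte is read as a cp850 character. A short Python
sketch of that mapping follows; the helper names are made up for
illustration, and it only models the byte reinterpretation, not the cifs
client or Samba themselves.

```python
def misread(name, stored_as="utf-8", shown_as="cp850"):
    """Show how `name` looks when its bytes are read in the wrong charset."""
    return name.encode(stored_as).decode(shown_as)


def repair(garbled, stored_as="utf-8", shown_as="cp850"):
    """Undo misread(): recover the original name from the garbled form."""
    return garbled.encode(shown_as).decode(stored_as)


if __name__ == "__main__":
    original = "áaébícódúeüfñg¿h¡"
    garbled = misread(original)
    print(garbled)
    print(repair(garbled) == original)
```

Because the mapping is reversible, names written through a mount with the
matching code page can be read back intact even though they look wrong when
listed directly on the server.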
>>
>> On Mon, Sep 29, 2014 at 8:33 PM, Steve French <smfrench@xxxxxxxxx> wrote:
>>>
>>> To clarify - we need to experiment with setting "unicode=false" in a
>>> normal Samba server's smb.conf and experiment with client mount
>>> options to see if it can be reproduced
>>>
>>> On Mon, Sep 29, 2014 at 8:32 PM, Steve French <smfrench@xxxxxxxxx> wrote:
>>>>
>>>> First strange thing is why isn't the server negotiating Unicode - that
>>>> is unusual these days
>>>>
>>>> Negotiating Unicode (UCS-2), as almost every other server does, would
>>>> avoid this issue
>>>>
>>>> Looking at the trace, we are not setting the Unicode flag on SMB
>>>> FindFirst, presumably because it was not offered at SMB tree connect
>>>> time.  We always set it in the normal case when the server supports
>>>> Unicode (see below)
>>>>
>>>>         if (treeCon->ses) {
>>>>             if (treeCon->ses->capabilities & CAP_UNICODE)
>>>>                 buffer->Flags2 |= SMBFLG2_UNICODE;
>>>>
>>>>
>>>>
>>>> So without Unicode we have to set the code page manually.  The server
>>>> is way too old (10 years?) for us to mount smb2 (which would force
>>>> unicode on the wire) or to use Unix Extensions (which probably
>>>> requires at least 3.0 Samba to be useful).
>>>>
>>>> Haven't tried iocharset and codepage mount options recently
>>>> (presumably the way to experiment with this is to turn off Unicode in
>>>> Samba smb.conf via unicode=false)
>>>>
>>>> On Mon, Sep 29, 2014 at 5:50 PM, adcromitus <adcromitus@xxxxxxxxx> wrote:
>>>>>
>>>>> On 28/09/2014 09:32, steve wrote:
>>>>>>
>>>>>> On 28/09/14 01:23, adcromitus wrote:
>>>>>>>
>>>>>>> Hello again,
>>>>>>>
>>>>>>> Sorry for the long time to reply.
>>>>>>>
>>>>>>> I've been trying to work out how to do this. I set up Wireshark and
>>>>>>> saw what the server was transmitting. However, I'm not really sure
>>>>>>> what I should send here.
>>>>>>>
>>>>>>> Anyway, I did an "ls" on a dir with a file named "Coleção", and
>>>>>>> Wireshark captured "cole \247 \243o". I am sending a few frames from
>>>>>>> tcpdump where that happens.
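
For anyone comparing the captured bytes against code pages, the following
Python sketch turns the octal escapes from the capture into bytes and lists
how a few candidate charsets encode "ç" and "ã". The candidate list and the
helper names are assumptions chosen for illustration. The two captured bytes
equal the trailing bytes of the UTF-8 forms of those characters, which may
hint at UTF-8 on the wire, but that is a guess from a single capture and not
a diagnosis.

```python
CANDIDATES = ["utf-8", "latin-1", "cp1252", "cp850", "cp860"]


def parse_octal_escapes(text):
    r"""Turn a dump such as \247\243 into the bytes it denotes."""
    return bytes(int(part, 8) for part in text.split("\\") if part)


def encodings_of(char, charsets=CANDIDATES):
    """Map each candidate charset to the bytes it uses for `char`."""
    result = {}
    for charset in charsets:
        try:
            result[charset] = char.encode(charset)
        except UnicodeEncodeError:
            result[charset] = None  # character not representable there
    return result


if __name__ == "__main__":
    captured = parse_octal_escapes(r"\247\243")
    print("captured:", captured.hex())
    for char in "çã":
        for charset, encoded in encodings_of(char).items():
            print(char, charset, encoded.hex() if encoded else "n/a")
```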
>>>>>>>
>>>>>>> How can I see if my distro defaults to UTF-8 on the client?
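
One way to check the client side of the question above is to ask the
user-space environment which charset it is using. A small Python sketch
follows; the function name is made up for illustration. Note that it reports
the locale of the current session, not the kernel's nls_default.

```python
import locale
import sys


def client_charsets():
    """Report the charsets the local user-space session is using."""
    return {
        # Charset implied by the current locale settings (LANG, LC_ALL, ...).
        "locale": locale.getpreferredencoding(False),
        # Charset Python uses to convert file names to and from bytes.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in client_charsets().items():
        print(key, value)
```

On a distribution that defaults to UTF-8, both values would normally be
reported as UTF-8.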
>>>>>>>
>>>>>>> I'm using:
>>>>>>> Linux kernel 3.2.0-4-amd64
>>>>>>> (Debian Wheezy)
>>>>>>> mount.cifs version: 5.5
>>>>>>>
>>>>>>>
>>>>>>> Thanks in advance.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 22/09/2014 04:28, Steve French wrote:
>>>>>>>>
>>>>>>>> This seems strange because modern Linux distributions should map
>>>>>>>> UCS-2 (16 bit Unicode characters which cifs servers like Windows
>>>>>>>> and Samba send over the wire) fine to UTF-8, which is the typical
>>>>>>>> default for the local charset.
>>>>>>>>
>>>>>>>> Does your distro not default to UTF-8 on the client?
>>>>>>>>
>>>>>>>> Would be helpful to see a wire trace (ethereal or tcpdump) and make
>>>>>>>> sure the server is sending UCS-2 (Unicode) on the wire.  See
>>>>>>>> https://wiki.samba.org/index.php/LinuxCIFS_troubleshooting
>>>>>>>>
>>>>>>>> On Sat, Sep 20, 2014 at 5:44 PM, adcromitus <adcromitus@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm not sure what is relevant, so I'll tell the whole story.
>>>>>>>>>
>>>>>>>>> I have a router (that I got from my ISP) which allows the
>>>>>>>>> connection of a pen/HDD by USB. That pen is shared on the network
>>>>>>>>> as a Windows Share folder.
>>>>>>>>>
>>>>>>>>> In Windows 7 I can see all the file names correctly, but when I
>>>>>>>>> mount the drive in Linux with the command:
>>>>>>>>>
>>>>>>>>> mount -t cifs //<local share ip-address>/<shared-folder> --verbose -o user=user,pass="",uid=1000,gid=1000
>>>>>>>>>
>>>>>>>>> (there is no password)
>>>>>>>>>
>>>>>>>>> All file names with special characters (like Çãõé...) have a
>>>>>>>>> question mark in place of the accented character, and I can't open
>>>>>>>>> the file or folder, as any command reports that the file doesn't
>>>>>>>>> exist. This happens in dolphin, thunar and on the command line
>>>>>>>>> with simple commands like cat.
>>>>>>>>>
>>>>>>>>> I tried adding each of the following options, without success:
>>>>>>>>>
>>>>>>>>> iocharset=utf-8
>>>>>>>>> iocharset=utf-8,codepage=cp437
>>>>>>>>> iocharset=utf-8,codepage=cp850
>>>>>>>>> iocharset=iso8859-1
>>>>>>>>>
>>>>>>>>> This also happens if I access the share from my Android device,
>>>>>>>>> so I was convinced it was a problem related to old firmware (from
>>>>>>>>> the router).
>>>>>>>>>
>>>>>>>>> However, recently I connected to the drive using smbclient and
>>>>>>>>> the file names appeared correctly. I would like to mount this
>>>>>>>>> shared folder from fstab, so smbclient is not a good solution.
>>>>>>>>>
>>>>>>>>> I'm using:
>>>>>>>>> Linux kernel 3.2.0-4-amd64
>>>>>>>>> (Debian Wheezy)
>>>>>>>>> mount.cifs version: 5.5
>>>>>>>>>
>>>>>>>>> And I get this information from smbclient -L <local share ip address>:
>>>>>>>>> (smbclient version 4.1.11-Debian)
>>>>>>>>> Server=[Samba 2.2.12]
>>>>>>>>>
>>>>>>>>> So, is there something else I can try?
>>>>>>>>>
>>>>>>>>> Thanks in advance.
>>>>>>
>>>>>> Hi
>>>>>> Probably an old cifs-utils? We have 6.2 with Spanish:
>>>>>> steve2@altet:~> ls
>>>>>> aviñón
>>>>>> barça
>>>>>>
>>>>>> HTH,
>>>>>> Steve
>>>>>
>>>>>
>>>>> Hi Steve,
>>>>>
>>>>> So I used chroot to install the cifs-utils version from the next
>>>>> Debian release (cifs-utils v.6.4), and the result was the same as
>>>>> with my current version.
>>>>>
>>>>> Did the tcpdump help in any way?
>>>>>
>>>>> Thanks again.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>>>> linux-cifs"
>>>>>>>>> in
>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>>
>>>> Steve
>>>
>>>
>>>
>>> --
>>> Thanks,
>>>
>>> Steve
>>
>>
>>
>



