On Tue, Jul 12, 2005 at 05:37:32PM -0400, Joe wrote:
> Tom Lane wrote:
> >Because the length specification is in *characters*, which is not by any
> >means the same as *bytes*.
> >
> >We could possibly put enough intelligence into the low-level tuple
> >manipulation routines to count characters in whatever encoding we happen
> >to be using, but it's a lot faster and more robust to insist on a count
> >word for every variable-width field.
>
> I guess what you're saying is that PostgreSQL stores characters in
> varying-length encodings.

It _may_ store characters in variable-length encodings. It can use
fixed-length encodings too, such as latin1 or plain ASCII (actually,
unchecked 8 bits, which means about anything) -- you define that at
initdb time or database creation time, I forget.

It would be painful for the code to distinguish fixed-length from
variable-length encodings at runtime, an optimization that would allow
getting rid of the otherwise required length word. So far, nobody has
cared enough about it to do the job.

> If it stored character data in Unicode (UCS-16) it would always take
> up two-bytes per character.

Really? We don't support UCS-16, for good reasons (we'd have to rewrite
several parts of the code in order to support '\0' bytes embedded in
strings ... we use regular C strings extensively). However, we do support
Unicode as UTF-8, and it's been said a couple of times that characters
can be wider than 2 or 3 bytes in some cases. So I don't see how UCS-16
could always use only 2 bytes.

> Have you considered supporting NCHAR/NVARCHAR, aka NATIONAL character
> data?

There have been noises, but so far nobody has stepped up to the plate to
do the work.

-- 
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Those who use electric razors are infidels destined to burn in hell while
we drink from rivers of beer, download free vids and mingle with naked
well shaved babes."
(http://slashdot.org/comments.pl?sid=44793&cid=4647152)
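
To make the characters-versus-bytes point concrete, here is a minimal sketch in Python (not PostgreSQL code). It compares the character count of a few made-up sample strings with their byte counts in latin1, UTF-8 and a 16-bit Unicode encoding. The helper names `sizes` and `has_zero_byte` are hypothetical and exist only for this illustration.

```python
# Sketch only: character count vs. byte count in different encodings.
# The function names and sample strings are assumptions for illustration.

def sizes(text):
    """Return (characters, latin1 bytes or None, UTF-8 bytes, UTF-16 bytes)."""
    try:
        latin1 = len(text.encode("latin-1"))
    except UnicodeEncodeError:
        latin1 = None  # text is not representable in single-byte latin1
    return (
        len(text),                      # number of code points
        latin1,
        len(text.encode("utf-8")),      # 1 to 4 bytes per character
        len(text.encode("utf-16-le")),  # 2 or 4 bytes per character, no BOM
    )


def has_zero_byte(text, encoding):
    """True if the encoded form contains a zero byte, which a
    NUL-terminated C string would treat as the end of the string."""
    return b"\x00" in text.encode(encoding)


if __name__ == "__main__":
    # The last sample is U+1D11E MUSICAL SYMBOL G CLEF, outside the 16-bit range.
    for sample in ["abc", "a\u00f1o", "\u20ac", "\U0001D11E"]:
        print(repr(sample), sizes(sample))
    print(has_zero_byte("abc", "utf-16-le"), has_zero_byte("abc", "utf-8"))
```

Under these assumptions, "abc" is 3 characters and 3 bytes in latin1 and UTF-8 but 6 bytes in the 16-bit encoding, where every second byte is zero. The G clef is a single character that takes 4 bytes in both UTF-8 and the 16-bit encoding, so even a 16-bit Unicode encoding is not strictly two bytes per character.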