On Sat, Jul 01, 2006 at 06:23:07PM -0400, Tom Lane wrote:
> "Tomi NA" <hefest@xxxxxxxxx> writes:
> > Basically, it comes down to three possibilities, doesn't it:
> > 1.) use an existing library
> > 2.) write a pgsql specific implementation
> > 3.) forget about it and tend to other issues
>
> > Personally, I don't really care if it's 1) or 2): I'm just afraid it's
> > going to be 3).
> > Is this a licensing issue (with regard to ICU being under the IBM
> > public licence)?
>
> Licensing is a concern --- IBM's appears to be not quite BSD enough.
> Size and portability of the library are concerns. Performance is a
> concern. Whether the patch makes the library required or optional is
> a concern (if required, the portability issue becomes a whole lot more
> urgent). Loss of existing functionality is a concern --- for instance,
> if the patch is such that UTF8 becomes the only supported server
> encoding, it'll probably be rejected forthwith.

Licence - It's the X/MIT licence, which is almost identical to the BSD
licence.

http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/icu/license.html
http://en.wikipedia.org/wiki/MIT_License

But I don't think anyone is actually considering importing ICU into the
postgres source tree, are they?

Size - I'm not sure this is relevant, since I don't think we want to
incorporate it into postgres itself, just let people use it if they
have it. In any case, the default dataset is 8MB. This includes
support for every locale and charset it knows about. If you drop the
conversion stuff (because postgres already has that) you're down to
about 4MB. Since ICU supports user-defined tables, we could provide a
single cross-platform dataset and get the user's ICU library
implementation to use that.

Portability - ICU runs on all the platforms postgres does, AFAICS.

http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/readme.html?rev=release-3-4#HowToBuildSupported

Performance - ICU is approximately four times faster than glibc for
collation. Even once you include keygen time (including conversion) it
comes out about 40% faster.

http://icu.sourceforge.net/charts/collation_icu4c_glibc.html

ICU is not slow.

> Well, the Japanese think that UTF8 is not the solution to all their
> worries, so they won't be happy with a UTF8-only solution. Likewise,
> those of us who only need single-byte character sets won't be very happy
> with being forced to accept multi-byte processing overhead.

I've not quite understood the Japanese problem with Unicode. My
understanding is that it was primarily due to widespread use of broken
converters.

In any case, ICU appears to beat glibc with single-byte encodings, even
including the multi-byte conversion. However, the most important point
is that people have said they'll take the speed hit if they can get
consistent collation.

For speed you can always throw more hardware at it. But no amount of
hardware will fix your collation issues.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@xxxxxxxxx> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.