Re: [HACKERS] qsort again (was Re: Strange Create

Has anybody got profiler data on the amount of time spent in comparisons during a sort? The proposals here would give the most gain on simple types like INTEGER, so it would be interesting to know how much time is currently spent in comparisons when sorting a column of ints. If it's only, say, 10% of the total time, well...

	More hand-waving:

	What are the usage cases for sorts?

- potentially huge data sets: CREATE INDEX, big joins, reporting queries, etc.
- small data sets: typically a query with an ORDER BY that returns a small number of rows (website stuff), or joins not small enough to use a HashAggregate, but not big enough to justify creating an index just for them.

The cost of a comparison vs. moving and fetching data is probably very different in these two cases. If everything fits neatly in sort_mem, you can do fancy stuff (like sorting on SortKeys), which will need to access the data in almost random order when the time comes to hand the sorted rows back. So I'd guess the SortKey idea applies mainly to the latter case, which is CPU-bound.

Anyway, I was wondering about queries with multipart keys, like ORDER BY zipcode, client_name, date and the like. Using just an int64 as the key isn't going to help a lot here. Why not use a binary string of limited length? I'd expect it to be not much slower than comparing ints, and faster than calling each column's comparison function in turn. Each key part would be assigned a byte range in the string. It would waste some memory, but using 2x the memory for half the time is a good tradeoff when the memory involved is on the order of megabytes. It would also solve the problem of locales: comparisons involving locales are slow, but strings can be converted to a canonical form (removing accents and stuff) and then binary-sorted.

Also, I'll insert a plug for the idea that the Sort needs to know whether a LIMIT follows; that way it could shrink its working set by simply discarding the rows which would have been discarded anyway by the LIMIT. Say you want the first 100 rows out of a million ordered rows: if the sort knows this, it can run in the amount of memory needed for a 100-row sort.


