Search Postgresql Archives

Re: "Shared strings"-style table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 10/13/2017 09:49 AM, Seamus Abshere wrote:
hey,

In the spreadsheet world, there is this concept of "shared strings," a
simple way of compressing spreadsheets when the data is duplicated in
many cells.

In my database, I have a table with >200 million rows and >300 columns
(all the households in the United States). For clarity of development
and debugging, I have not made any effort to normalize its contents, so
millions of rows have, for example, "SINGLE FAMILY RESIDENCE /
TOWNHOUSE" (yes, that whole string!) instead of some code representing
it.

Theoretically / blue sky, could there be a table or column type that
transparently handles "shared strings" like this, reducing size on disk
at the cost of lookup overhead for all queries?

(I guess maybe it's like TOAST, but content-hashed and de-duped and not
only for large objects?)

Thanks,
Seamus

--
Seamus Abshere, SCEA
https://www.faraday.io
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere


What data type are these columns now? I would be tempted to map the full strings to an abbreviation just so I didn't have to alter all the columns to an "id"; Optional to place any RI on the columns to the abbreviation dictionary table. Just use the translation as a last step in user facing reports. If you can map/abbreviate to 4 characters, you've approximated the disk size of an integer.


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux