Re: OCESQL and Data Types

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 19 Feb 2023 15:13:23 -0600
Kevin Monceaux <Kevin@xxxxxxxxxxxxxx> wrote:

> On Sun, Feb 19, 2023 at 08:31:57AM -0500, James K. Lowden wrote:
>  
> > Can you give me a sense of the kind of information stored in that
> > column?  
> 
> I tend to use unlimited character varying fields for text fields that
> might contain long-ish strings, though not necessarily ridiculously
> long strings.
> 
> The database I was wanting to try out with OCESQL is one loaded with
> some basic title details from the IMDb's title basics data set
> available at:
> 
>     https://datasets.imdbws.com/
> 
> The description of the title.basics.tsv.gz file doesn't specify how
> long any of the string columns might be, so I defined them as
> unlimited character varying columns.  The longest column is the
> primaryTitle column.  At the moment, the longest primaryTitle in the
> data I have loaded is 419 characters.

Excellent, Kevin, thanks.  That's a clear description of a concrete
problem, and good example, too.   

Here, IMDB does not specify an upper limit.  In theory, the plot
summary for "Bird" could include the history of music back to the
Mesozioc.  And short titles went out of fashion with James Michener. 

But to see the data, they have to be displayed somewhere, and that
place has some finite size.  Even if you're producing HTML and can put
"unlimited" data in a field, you still have to read it into some
buffer N times, until you reach the end.  

I would define my database with a title of varchar(500).  If a bigger
title comes along, I would deal with it initially just by truncating
the title, and later by altering the database.  Because, you know,
YAGNI.  

For plots, I would have a separate table with a foreign key back to the
main table, something like; 

create table plots (
	movie_id integer not NULL, 
	ordinal integer not NULL, 
	paragraph varchar(1000) not NULL, 
	primary key (movie_id, ordinal), 
	foreign key (movie_id) references (movies.movie_id)
);

where "ordinal" is the paragraph number and varies from 0 to N, where N
is roughly

	the plot size % 1000 +1

I might decide to break plot rows by sentence, so it's easier to
display them one at a time.  

With that design, I could do something like:

	select movie_id, max(ordinal) as N 
	from plots group by movie_id;

to signal to the application how many plot paragraphs there are.  I
could fetch the first one by default and others on demand, or just
join the tables to get everthing everywhere all at once.  But with a
known size.  :-)  

By the way, I didn't invent this idea.  Sybase and Microsoft use a
similar techinque for the text of stored procedures, which Sybase first
defined somewhere around 1988.  

HTH.

--jkl








[Index of Archives]     [Gcc Help]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Info]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux