On Sun, 19 Feb 2023 15:13:23 -0600 Kevin Monceaux <Kevin@xxxxxxxxxxxxxx> wrote: > On Sun, Feb 19, 2023 at 08:31:57AM -0500, James K. Lowden wrote: > > > Can you give me a sense of the kind of information stored in that > > column? > > I tend to use unlimited character varying fields for text fields that > might contain long-ish strings, though not necessarily ridiculously > long strings. > > The database I was wanting to try out with OCESQL is one loaded with > some basic title details from the IMDb's title basics data set > available at: > > https://datasets.imdbws.com/ > > The description of the title.basics.tsv.gz file doesn't specify how > long any of the string columns might be, so I defined them as > unlimited character varying columns. The longest column is the > primaryTitle column. At the moment, the longest primaryTitle in the > data I have loaded is 419 characters. Excellent, Kevin, thanks. That's a clear description of a concrete problem, and good example, too. Here, IMDB does not specify an upper limit. In theory, the plot summary for "Bird" could include the history of music back to the Mesozioc. And short titles went out of fashion with James Michener. But to see the data, they have to be displayed somewhere, and that place has some finite size. Even if you're producing HTML and can put "unlimited" data in a field, you still have to read it into some buffer N times, until you reach the end. I would define my database with a title of varchar(500). If a bigger title comes along, I would deal with it initially just by truncating the title, and later by altering the database. Because, you know, YAGNI. For plots, I would have a separate table with a foreign key back to the main table, something like; create table plots ( movie_id integer not NULL, ordinal integer not NULL, paragraph varchar(1000) not NULL, primary key (movie_id, ordinal), foreign key (movie_id) references (movies.movie_id) ); where "ordinal" is the paragraph number and varies from 0 to N, where N is roughly the plot size % 1000 +1 I might decide to break plot rows by sentence, so it's easier to display them one at a time. With that design, I could do something like: select movie_id, max(ordinal) as N from plots group by movie_id; to signal to the application how many plot paragraphs there are. I could fetch the first one by default and others on demand, or just join the tables to get everthing everywhere all at once. But with a known size. :-) By the way, I didn't invent this idea. Sybase and Microsoft use a similar techinque for the text of stored procedures, which Sybase first defined somewhere around 1988. HTH. --jkl