Re: optimizing space for array of booleans

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 24, 2009 at 3:24 AM, leledumbo <leledumbo_cool@xxxxxxxxxxx> wrote:
>
> Just tried serializing array of 256 booleans and printing the length, it
> really shocked me: 2458. This project will be used by about 500 students, so
> in the worst case (all students enroll all courses) it will eat 500 * 2458
> (assuming one character eats one byte) = 1229000 Bytes ~= 1.2 MB. Not a big
> deal, eh?
>

That is a very un-normalized way of storing associations between
entities in a database. You describe a "worst case scenario" where
every student is registered for every course. If you serialize an
array, every bit takes the same amount of space so it doesn't matter
how many courses each student has. If you convert this to a bitmask as
you are describing, your worst case would happen very frequently - at
least as often as students are assigned to the course whose place is
the most significant bit value. What's more, if you later need to add
a 257th course, you could have lots of work to do to adjust your code
to deal with the extra value.

Generally relationships like the one you describe are stored in three
separate and related tables: Students, Courses, and Enrollment. The
latter is a n:m association between the first two. The advantage this
approach has with regard to storage is that it is a sparse matrix.
Most students will only enroll in a handful of courses. With this
approach, you only store a row in the Enrollment table if a student is
enrolled in a course. Otherwise, you don't need to store anything.
Granted, if you use 4-byte integers for the primary keys in both the
Students and Courses tables, your data storage would be about 2048
bytes for the data "worst case scenario" that you described, and
another 2048 bytes for a unique index that would insure that each
student could be enrolled in any course no more than one time.
However, since I highly doubt most students are going to be enrolled
in every course, it is not likely that you'll approach the 1.2MB
storage space that you're worried about. Each row would take 16 bytes
to store both the data and the index; if each student is enrolled in
an average of 4 courses, it should only take an average of 64 bytes
per student. For 500 students, that works out to around 2000 rows at
about 32KB overall.

For comparison, I have a fairly small table in a system we recently
put into production that maintains similar associations. It stores
more information in each row than just a two-column association (about
17 columns) and currently has over 7400 rows. The data is 0.734MB and
the total size of three indexes is 0.820MB.

And no, 1.2MB is not that big of a deal. A single floppy high-density
drive could hold that much data 18 years ago!

Andrew

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


[Index of Archives]     [PHP Home]     [Apache Users]     [PHP on Windows]     [Kernel Newbies]     [PHP Install]     [PHP Classes]     [Pear]     [Postgresql]     [Postgresql PHP]     [PHP on Windows]     [PHP Database Programming]     [PHP SOAP]

  Powered by Linux