> -----Original Message-----
> From: redhat-list-bounces@xxxxxxxxxx
> [mailto:redhat-list-bounces@xxxxxxxxxx] On Behalf Of Ben Russo
> Sent: Tuesday, November 16, 2004 3:03 PM
> To: debian-user@xxxxxxxxxxxxxxxxxxxxxxxxxxx; redhat-list@xxxxxxxxxx
> Subject: Big Filesystem and Compressed Filesystem?
>
> (see below for long story background)
>
> The last time I created a large HW RAID5 volume (1.6 TB) the kernel
> was unable to see all of it... If I create several smaller block
> devices (like 400GB each), can LVM bind them together into a larger
> single filesystem? (I am aiming for 4-6 TB.)
>
> Is there any way to create a BIG, robust, random-rw-access filesystem
> that is transparently compressed and supports large files (up to
> several gigabytes each)?
>
> Background info: .....................................
> My employer has large amounts (10GB/day) of telecommunications-related
> billing data that in its raw form is BIG (1-2GB each): flat ASCII
> text files with fixed record formats and carriage returns.

You were probably dealing with a 32-bit kernel using 512-byte blocks
and signed 32-bit integers for block numbers, which caps a filesystem
at around a terabyte (2^31 blocks x 512 bytes = 1 TB). You could
obviously pop the block size up to 2048 bytes to get to around 4TB,
though you would still have a 2GB file-size limit; or you could
upgrade to a 64-bit version of the kernel, etc. You also don't say
what platform you are running on, hardware or software.

In any case, IMHO, you should take a different tack here. Your
application does NOT require you to build a terabyte disk farm; your
approach is more complex than the problem demands. It actually leans
toward a circular ring-buffer design, which is simpler to manage (a
K.I.S.S. solution). You want to keep 3 years of data, so divide the
requirement up by months, quarters, or some other unit of time
granularity, and slice and dice it across multiple filesystems of
reasonable size.

For example, using smaller numbers and hardware: suppose I need to
keep 3 years and 9GB of data online, and all I can buy cheap are 4GB
SCSI hard drives. The oldest data will be discarded as newer data is
received to replace it, and I need to leave room to compress files as
I go. So I would buy four 4GB disk drives and a SCSI controller:

  Disk one   is year one         (3 years ago)
  Disk two   is year two         (2 years ago)
  Disk three is year three       (1 year ago)
  Disk four  is the current year (now)

Each disk is one 4GB filesystem, and each disk has twelve directories
on it, January through December. (You can also put in subdirectories
to organize by the DAY of the month, or use Julian days altogether.)
Each directory on disks one through three contains compressed files
(by your own admission, extractions of older data are rare). The
current-year disk has compressed files for the preceding months; this
month's (or this week's) files can stay uncompressed if you wish.

Filenames for your data should be in YYYYMMDD.SEQNUM format, i.e. the
date of creation plus a sequence number (0-5000) to tell you the order
within the day (e.g. 20041116.0042), or the time of day (HH:MM:SS) the
file was created, to maintain chronological ordering of the files.

A couple of shell or Perl scripts and some ASCII text files to hold
state information (or MySQL tables, etc.) and you are off on something
that can maintain itself via cron-run processes. (State: current
month, current year, current filesystem, and the year-one, year-two,
and year-three filesystem pathnames, etc.)

So every day you are loading data into the current month's directory
on the current-year filesystem. When the current month changes,
several things happen:

1) All the files in the previous month's directory are compressed
   (compress the October 2004 directory).
2) All the files in the year-one disk's "previous month" directory are
   purged (purge the October 2001 directory on the year-one disk).
3) The current-month state files are updated.
4) If the current year in the state file, compared to the year from
   the date command, shows the year has changed, the disks rotate
   roles: the year-one disk becomes the current-year disk, the
   year-two disk becomes the year-one disk, the year-three disk
   becomes the year-two disk, and the old current-year disk becomes
   the year-three disk.
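As a rough illustration, steps 1-3 could be a single cron-run shell
script along these lines. Every path and name below is hypothetical
(none of it comes from the original post), it assumes GNU date and
gzip, and it names the month directories YYYYMM instead of
January..December so the purge arithmetic stays trivial:

    #!/bin/sh
    # Month-change housekeeping for the ring-buffer layout above.
    CURRENT_FS=/data/current           # the current-year disk
    YEAR1_FS=/data/year1               # oldest disk (3 years ago)
    STATE=/var/lib/billing/month.state

    # Previous month as YYYYMM ("last month" needs GNU date).
    PREV=$(date -d "last month" +%Y%m)

    # 1) Compress last month's files on the current-year disk.
    gzip "$CURRENT_FS/$PREV"/*

    # 2) Purge the same month, three years back, on the oldest disk.
    OLD="$(( ${PREV%??} - 3 ))${PREV#????}"
    rm -f "$YEAR1_FS/$OLD"/*

    # 3) Record the new current month in the state file.
    date +%Y%m > "$STATE"

Run it from cron early on the first of each month, e.g.:

    0 2 1 * * /usr/local/sbin/month-rollover.sh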
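For step 4, one way to make the disks "become" each other without
moving any data is a layer of role symlinks over fixed mount points.
Again a hedged sketch, not something the original post specifies: the
layout (real mounts at /mnt/d1../mnt/d4, role links under /data) is
invented for illustration, and ln -sfn / readlink are GNU coreutils:

    #!/bin/sh
    # Yearly rotation: shuffle the role symlinks one step.
    cd /data || exit 1

    OLDEST=$(readlink year1)              # fully purged over the past year
    ln -sfn "$(readlink year2)"   year1   # 2 years ago -> 3 years ago
    ln -sfn "$(readlink year3)"   year2   # 1 year ago  -> 2 years ago
    ln -sfn "$(readlink current)" year3   # old current -> 1 year ago
    ln -sfn "$OLDEST"             current # recycle oldest as new current

The point of the indirection is that "which physical disk is which
year" lives in exactly one place, so the monthly script never has to
know about the hardware.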
Other advantages to this approach: you never have to full-dump more
than one unit of granularity's worth of data, and the previous unit
can be backed up to DVD once (< $200 for a drive), in January at
switchover time, to create archival copies. Your unit of granularity
(a year in the example above) should be set to the smaller of the
maximum size your tape backup system can hold as one full, continuous
dump, or one year's worth of data.

Just think of it all in terms of buckets to stick stuff in: one bucket
to put new stuff in, and the oldest bucket gets emptied as you go to
create the next new bucket.

-- 
redhat-list mailing list
unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list