Hello,

I have been looking into reducing the time gluster spends on tier migration by compacting the databases.

Gluster creates a SQLite database on each brick to collect metadata for the files that clients touch. This metadata is necessary for tier migration. At regular intervals, gluster queries the database on each brick to determine which files to move. As the database file grows, so does the time needed to query it. This is due to fragmentation [1]. Therefore, migration slows down as time goes on.

Solution:

We are asking for feedback on which defragmentation method to use for the SQLite database. All the options are detailed below. Currently, we are leaning towards the incremental auto_vacuum option, tuned to remove all free pages. In our tests so far, this option saves about as much space as manually calling VACUUM, but is faster overall.

Link to current progress: http://review.gluster.org/15031

Vacuum Types:

- The "VACUUM" option reorganizes the database by inserting all the data into a new database and copying the contents back into the old one. This places all used pages from the same table next to each other and any free pages at the end. During the reorganization, nothing else can edit the database. Therefore, no client can add new metadata to the database and tier updates will not happen. At worst, this command uses twice the space of the original database while defragmentation is underway.

- The "auto_vacuum" option comes in two flavors, "full" and "incremental". A full auto_vacuum moves ALL free pages in the database to the end of the file after every commit. To do this, SQLite keeps some extra metadata in the file to track candidates for defragmentation. However, full auto_vacuum does not eliminate all fragmentation, because there is no guarantee that data from the same table will remain next to each other after a free page is moved. In fact, this can make fragmentation worse.
  However, if there are no free pages to move, this option is a no-op (unlike the "VACUUM" option, which always does a full copy of the database).

- "Incremental" auto_vacuum removes N free pages from the file, where N is user-specified. Although it is called an "auto_vacuum", this version only removes free pages when invoked with a specific pragma, "incremental_vacuum(N)". Just like full auto_vacuum, SQLite stores extra metadata in the file to do this. As with full auto_vacuum, it does not guarantee the elimination of fragmentation. However, the free pages it removes are deleted from the file, which shrinks the database.

Changes to Gluster:

We are adding an option to gluster that turns compaction of the underlying database on or off.

    gluster volume tier <volname> tier-compact <off|on>

At regular intervals, the tier daemon will send a compaction IPC to the bricks, one at a time, and each brick will compact its database according to the strategy. For SQLite, this will set the necessary pragmas and call VACUUM or incremental_vacuum(N) on the database as needed.

[1] SQLite divides its database file into blocks of 4K called pages. A page can either be free (unused), store data for a table, or store metadata for SQLite to use. As a system uses a database over time, free pages can appear between two used pages for some table A in the database. Pages for some other table B can also appear between those two pages for table A. In general, whenever any data not from table A appears between two pages for table A, we have database fragmentation. Fragmentation hurts database read times. A database without fragmentation benefits from sequential reads from disk. A fragmented one loses that benefit, and the unrelated pages read along the way may evict table A's data from the cache.

Regards,
Diogenes

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
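
As a rough illustration of the incremental strategy discussed above, the following Python sketch uses the standard sqlite3 module to enable incremental auto_vacuum, free some pages by deleting rows, and then release them with incremental_vacuum. It is not the gluster implementation (see the review link for that). The table name file_metadata, the row counts, and the helper names are assumptions made only for this example.

```python
import os
import sqlite3
import tempfile


def create_db(db_path):
    """Create a database with incremental auto_vacuum enabled."""
    conn = sqlite3.connect(db_path)
    # Must be set before the first table is created; on an existing
    # database the new mode only takes effect after a full VACUUM.
    conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
    # Hypothetical table for this sketch, not gluster's actual schema.
    conn.execute(
        "CREATE TABLE file_metadata (id INTEGER PRIMARY KEY, payload BLOB)"
    )
    conn.commit()
    return conn


def free_pages(conn):
    """Return the number of unused pages currently on the freelist."""
    return conn.execute("PRAGMA freelist_count").fetchone()[0]


def compact_incremental(conn, pages=0):
    """Release up to `pages` free pages; a value below 1 releases all."""
    # The pragma yields one row per released page, so fetchall() is
    # needed to step it to completion.
    conn.execute(f"PRAGMA incremental_vacuum({int(pages)})").fetchall()
    conn.commit()


def demo():
    """Fill a scratch database, delete the rows, then compact it."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "tier.db")
        conn = create_db(db_path)
        try:
            conn.executemany(
                "INSERT INTO file_metadata (payload) VALUES (?)",
                ((b"x" * 1000,) for _ in range(2000)),
            )
            conn.commit()
            # Deleting rows leaves free pages behind in the file.
            conn.execute("DELETE FROM file_metadata")
            conn.commit()

            free_before = free_pages(conn)
            size_before = os.path.getsize(db_path)
            compact_incremental(conn)  # release every free page
            free_after = free_pages(conn)
            size_after = os.path.getsize(db_path)
        finally:
            conn.close()
    return {
        "free_before": free_before,
        "free_after": free_after,
        "size_before": size_before,
        "size_after": size_after,
    }


if __name__ == "__main__":
    print(demo())
```

Passing a positive page count to compact_incremental releases at most that many pages per call, which is the knob that trades compaction time against reclaimed space. The sketch assumes the default rollback journal; in WAL mode the file size would only change after a checkpoint.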