Dilemma:
*Without* my patch, demotions in degraded (hi-watermark breached) mode
happen every 10 seconds by listing *all* files colder than the last 10
seconds and sorting them in ascending order of (write, read) access
time. With millions of files, that query could take more than a minute.
*With* my patch, we currently select a random set of 20 files and demote
them, even if they are actively used. So we either wait more than a
minute in the worst case for the exact listing of cold files, or we
trade accuracy for a quicker turnaround by demoting files without any
selection criteria, hot ones included.
The exponential time window scheme for selecting files, discussed over
Google Hangout, has a problem: we know the window ends at the current
time, but it is not clear how to decide where it starts.
So I think it has to be one of the two strategies discussed above, each
with its own trade-off.
Comments are requested on which approach to take for the
implementation.
Rafi has also suggested avoiding file creation on the hot tier while
the hi-watermark is breached, to avoid further stress on storage
capacity and the eventual migration of those files to the cold tier.
Do we introduce demotion policies like "strict" and "approximate" to
let the user choose the demotion strategy?
1. strict
Choosing this strategy could mean we wait for the full, ordered query
to complete and only then start demoting, coldest file first
2. approximate
Choosing this strategy could mean we take the first available file
from the database query and demote it even if it is hot and actively
written to
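A minimal sketch of how the two policies could differ at the query
level, assuming a SQLite table of per-file access times. The table name
file_heat, its columns and the function demotion_candidates() are
hypothetical, made up for illustration; they are not the tiering
database's actual schema or API.

```python
import sqlite3

def demotion_candidates(conn, policy, cutoff):
    """Return a list of paths to demote, in demotion order.

    Hypothetical schema: file_heat(path, write_time, read_time).
    """
    if policy == "strict":
        # Full ordered listing of cold files: coldest first, by
        # (write, read) access time. Cost grows with the number of rows.
        sql = ("SELECT path FROM file_heat "
               "WHERE write_time < ? AND read_time < ? "
               "ORDER BY write_time ASC, read_time ASC")
        return [row[0] for row in conn.execute(sql, (cutoff, cutoff))]
    if policy == "approximate":
        # First available row, no filter and no ordering: quick, but the
        # file returned may well be hot.
        row = conn.execute("SELECT path FROM file_heat LIMIT 1").fetchone()
        return [row[0]] if row else []
    raise ValueError("unknown demotion policy: %r" % (policy,))

# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_heat (path TEXT, write_time INTEGER, read_time INTEGER)")
conn.executemany(
    "INSERT INTO file_heat VALUES (?, ?, ?)",
    [("/hot.db", 990, 995), ("/old.log", 100, 120), ("/older.log", 50, 60)])

strict = demotion_candidates(conn, "strict", 980)
approx = demotion_candidates(conn, "approximate", 980)
```

With "strict" the hot file is filtered out and the rest come back
coldest first; with "approximate" whichever row the database returns
first is taken, which is exactly the trade-off described above.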
Milind
On 08/12/2016 08:25 PM, Milind Changire wrote:
Patch for review: http://review.gluster.org/15158
Milind
On 08/12/2016 07:27 PM, Milind Changire wrote:
On 08/10/2016 12:06 PM, Milind Changire wrote:
Emergency demotions will be required whenever writes breach the
hi-watermark. Emergency demotions are required to avoid ENOSPC in case
of continuous writes that originate on the hot tier.
There are two concerns in this area:
1. enforcing max-cycle-time during emergency demotions
max-cycle-time is the time the tiering daemon spends in promotions or
demotions
I tend to think that the tiering daemon should skip this check in the
emergency situation and continue demotions until usage drops below the
hi-watermark
Update:
To keep matters simple and manageable, it has been decided to *enforce*
max-cycle-time to yield the worker threads to attend to impending tier
management tasks if the need arises.
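A rough sketch of what enforcing max-cycle-time during emergency
demotions could look like. The function name, its parameters and the
injected clock are assumptions made for illustration; this is not the
tiering daemon's actual code.

```python
import time

def run_emergency_demotions(candidates, demote, usage_percent,
                            hi_watermark, max_cycle_time,
                            clock=time.monotonic):
    """Demote files until usage drops below hi_watermark or the
    max_cycle_time budget (in seconds) is spent.

    Returns the number of files demoted in this cycle.
    """
    deadline = clock() + max_cycle_time
    demoted = 0
    for path in candidates:
        if usage_percent() < hi_watermark:
            break  # usage is back below the hi-watermark
        if clock() >= deadline:
            break  # budget spent: yield the worker thread
        demote(path)
        demoted += 1
    return demoted

# Illustrative run with a fake clock: each demotion frees 1% of the hot
# tier and takes 4 "seconds".
state = {"used": 95, "now": 0}

def fake_demote(path):
    state["used"] -= 1
    state["now"] += 4

count = run_emergency_demotions(
    ["a", "b", "c", "d", "e", "f"], fake_demote,
    usage_percent=lambda: state["used"],
    hi_watermark=90, max_cycle_time=10,
    clock=lambda: state["now"])
```

In this run the cycle ends on the time budget while usage is still
above the hi-watermark, so the remaining demotions would have to wait
for the next cycle.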
2. file demotion policy
I tend to think that the largest file with the most recent *write*
should be chosen for eviction when write-freq-threshold is NON-ZERO.
Choosing the least written file would only delay migration of an active
file, which might keep consuming hot tier disk space and, in the worst
case, result in an ENOSPC.
In cases where write-freq-threshold is ZERO, the most recently
*written* file can be chosen for eviction.
In the case of choosing the largest file within the
write-freq-threshold, a stat() on the files would be required to
calculate the number of files that need to be demoted to bring usage
below the hi-watermark. Knowing how many of the most recently written
files to demote could also help run demotions in parallel rather than
in the sequential manner currently in place.
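As a sketch of the stat()-based calculation mentioned above: given
files already ordered by demotion preference, count how many must go
before usage falls below the hi-watermark. The function name and its
parameters are hypothetical, and st_size is used as an approximation of
the space a file occupies.

```python
import os
import tempfile

def files_needed_to_clear(paths, used_bytes, capacity_bytes,
                          hi_watermark_percent):
    """Return the shortest prefix of paths whose demotion would bring
    usage below the hi-watermark.

    paths is assumed to be ordered by demotion preference already.
    """
    limit = capacity_bytes * hi_watermark_percent / 100.0
    chosen = []
    for path in paths:
        if used_bytes < limit:
            break
        # st_size is the logical size; sparse files occupy less on disk.
        used_bytes -= os.stat(path).st_size
        chosen.append(path)
    return chosen

# Illustrative run on three temporary files of 300, 200 and 100 bytes,
# with a made-up 1000-byte "hot tier" that is 950 bytes full.
with tempfile.TemporaryDirectory() as tmpdir:
    paths = []
    for name, size in (("a", 300), ("b", 200), ("c", 100)):
        path = os.path.join(tmpdir, name)
        with open(path, "wb") as handle:
            handle.write(b"\0" * size)
        paths.append(path)
    at_75 = files_needed_to_clear(paths, 950, 1000, 75)
    at_50 = files_needed_to_clear(paths, 950, 1000, 50)
    names_at_50 = [os.path.basename(p) for p in at_50]
```

The resulting list is also the set of files that could be handed to
parallel demotion workers.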
Update:
The idea of choosing the files wrt file size has been dropped.
Iteratively, the most recently written file will be chosen for eviction
from the hot tier in case of a hi-watermark breach and until the
watermark drops below hi-watermark.
The idea of parallelizing multiple promotions/demotions has been
deferred.
-----
Sustained writes creating large files in the hot tier, which
cumulatively breach the hi-watermark, do NOT seem to be a good
workload for making use of tiering. The assumption is that, to make the
most of the hot tier, the hi-watermark would be closer to 100%.
In this case a sustained large file copy might easily breach the
hi-watermark and may even consume the entire hot tier space, resulting
in an ENOSPC.
e.g. a sustained write:
# cp file1 /mnt/glustervol/dir
Workloads that would seem to make the most of tiering are:
1. Many smaller files, which are created in small bursts of write
activity and then closed
2. Few large files where updates are in-place and the file size
does not grow beyond the hi-watermark, e.g. a database with a
frequent in-line compaction/de-fragmentation policy enabled
3. Frequent reads of few large files, mostly static in size, which
cumulatively don't breach the hi-watermark. Frequently reading
a large number of smaller, mostly static, files would be good
tiering workload candidates as well.
Comments are requested.
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel