Re: tiering: emergency demotions

Milind Changire <mchangir@xxxxxxxxxx> · Thu, 13 Oct 2016 17:23:48 +0530

Dilemma:
*without* my patch, the demotions in degraded (hi-watermark breached)
mode happen every 10 seconds by listing *all* files colder than the
last 10 seconds and sorting them in ascending order w.r.t. the
(write,read) access time ... so the existing query could take more than
a minute to list files if there are millions of them

*with* my patch we currently select a random set of 20 files and demote
them ... even if they are actively used ... so we either wait for more
than a minute for the exact listing of cold files in the worst case or
trade off by demoting hot files without imposing a file selection
criteria for a quicker turnaround time

The exponential time window schema to select files discussed over Google 
Hangout has an issue with deciding the start time of the time window, 
although we know the end time being the current time

So, I think it would be either of the strategies discussed above with a 
trade-off in one way or the other.

Comments are requested regarding the approach to take for the
implementation.

Rafi has also suggested to avoid file creation on the hot tier if the
hot tier has hi-watermark breached to avoid further stress on storage
capacity and eventual file migration to the cold tier.

Do we introduce demotion policies like "strict" and "approximate" to
let user choose the demotion strategy ?
1. strict
   Choosing this strategy could mean we wait for the full and ordered
   query to complete and only then start demoting the coldest file first

2. approximate
   Choosing this strategy could mean we choose the the first available
   file from the database query and demote it even if it is hot and
   actively written to

Milind

On 08/12/2016 08:25 PM, Milind Changire wrote:
Patch for review: http://review.gluster.org/15158

Milind

On 08/12/2016 07:27 PM, Milind Changire wrote:
On 08/10/2016 12:06 PM, Milind Changire wrote:
Emergency demotions will be required whenever writes breach the
hi-watermark. Emergency demotions are required to avoid ENOSPC in case
of continuous writes that originate on the hot tier.

There are two concerns in this area:

1. enforcing max-cycle-time during emergency demotions
   max-cycle-time is the time the tiering daemon spends in promotions or
   demotions
   I tend to think that the tiering daemon skip this check for the
   emergency situation and continue demotions until the watermark drops
   below the hi-watermark

Update:
To keep matters simple and manageable, it has been decided to *enforce*
max-cycle-time to yield the worker threads to attend to impending tier
management tasks if the need arises.

2. file demotion policy
   I tend to think that evicting the largest file with the most recent
   *write* should be chosen for eviction when write-freq-threshold is
   NON-ZERO.
   Choosing a least written file is just going to delay file migration
   of an active file which might consume hot tier disk space resulting
   in a ENOSPC, in the worst case.
   In cases where write-freq-threshold are ZERO, the most recently
   *written* file can be chosen for eviction.
   In the case of choosing the largest file within the
   write-freq-threshold, a stat() on the files would be required to
   calculate the number of files that need to be demoted to take the
   watermark below the hi-watermark. Finding the number of most recently
   written files to demote could also help make demotions in parallel
   rather than in the sequential manner currently in place.

Update:
The idea of choosing the files wrt file size has been dropped.
Iteratively, the most recently written file will be chosen for eviction
from the hot tier in case of a hi-watermark breach and until the
watermark drops below hi-watermark.
The idea of parallelizing multiple promotions/demotions has been
deferred.

-----

Sustained writes creating larges files in the hot tier which
cumulatively breach the hi-watermark does NOT seem to be a good
workload for making use of tiering. The assumption is that, to make the
most of of the hot tier, the hi-watermark would be closer to 100.
In this case a sustained large file copy might easily breach the
hi-watermark and may even consume the entire hot tier space, resulting
in a ENOSPC.

eg. an example of a sustained write

# cp file1 /mnt/glustervol/dir

Workloads that would seem to make the most of tiering are:
1. Many smaller files, which are created in small bursts of write
   activity and then closed
2. Few large files where updates are in-place and the file size
   does not grow beyond the hi-watermark eg. database, with frequent
   in-line compaction/de-fragmentation policy enabled
3. Frequent reads of few large files, mostly static in size, which
   cumulatively don't breach the hi-watermark. Frequently reading
   a large number of smaller, mostly static, files would be good
   tiering workload candidates as well.

Comments are requested.

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel