Re: get_fs_excl/put_fs_excl/has_fs_excl

Jamie Lokier <jamie@xxxxxxxxxxxxx> · Mon, 27 Apr 2009 15:47:42 +0100

Theodore Tso wrote:
> *) Do we only care about processes whose I/O priority is below the
>    default?  (i.e., either in the idle class, or in a low-priority
>    best efforts class) What if the concern is a real-time process
>    which is being blocked by a default I/O priority process taking its
>    time while holding some fs-wide resource?
> 
>    If the answer to the previous question is no, it becomes more
>    reasonable to consider bumping the submission priority of the process
>    in question to the highest priority "best efforts" level.  After
>    all, if this truly is a "filesystem-wide" resource, then no one is
>    going to make forward progress relating to this block device unless
>    and until the filesystem-wide lock is resolved.  Also, if we don't
>    allow this situation to return to userspace, presumably the
>    kernel-code involved will only be writing to the block-device in
>    question.  (This might not be entirely true in the case of the
>    sendfile(2) syscall, but currently we can only read from
>    filesystems with sendfile, and so presumably a filesystem would
>    never call get_fs_excl while servicing a sendfile request.)
> 
> *) Is implementing the bulk of this in the cfq scheduler really the
>    best place to do this?  To explore something completely different,
>    what if the filesystem simply explicitly set I/O priority levels in
>    its block I/O submissions, and provided optional callback functions
>    which could be used by the page writeback routines to determine the
>    appropriate I/O priority level that should be used given a
>    particular filesystem and inode number.  (That actually could be
>    used to provide another cool function --- we could expose to
>    userspace the concept that a particular inode should always have its
>    I/O go out with a higher priority, perhaps via a chattr flag.)
> 
>    Basically, the argument here is that we already have the
>    appropriate mechanism for ordering I/O requests, which is the I/O
>    priority mechanism, and the policy really needs to be set by the
>    filesystem --- and it might be far more than just "do we have a
>    filesystem-wide exclusive lock" or not.

Personally, I'm interested in the following:

    - A process with RT I/O priority and RT CPU priority is reading
      a series of files from disk.  It should be very reliable at this.

    - Other normal I/O priority and normal CPU priority processes are
      reading and writing the disk.

I would like the first process to have a guaranteed minimum I/O
performance: it should continuously make progress, even when it needs
to read some file metadata which overlaps a page affected by the other
processes.  I don't mind all the interference from disk head seeks and
so on, but I would like the I/O that the first process depends on to
have RT I/O priority - including when it's waiting on I/O initiated by
another process and the normal I/O priority queue is full.

So, I'm not exactly sure, but I think what I need for that is:

    - I/O priority boosting (re-queuing in the elevator) to fix the
      inversion when waiting on I/O which was previously queued with
      normal I/O priority, and

    - Task priority boosting when waiting on a filesystem resource
      which is held by a normal priority task.

(I'm not sure if generic task priority boosting is already addressed to some
extent in the RT-PREEMPT Linux tree.)
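
As a rough illustration of the first item (re-queuing in the elevator), here
is a toy model in Python. It is not how cfq or any real elevator works:
ToyElevator, submit(), boost() and dispatch() are names made up for this
sketch, and requests are ordered purely by (class, level, arrival order).

```python
import heapq
import itertools

RT, BE, IDLE = 1, 2, 3  # lower class number is served first


class ToyElevator:
    """Toy priority queue of I/O requests with re-queuing on boost."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # arrival order breaks ties
        self._live = {}                # request id -> its current heap entry

    def submit(self, req_id, io_class, level):
        # Entry layout: [class, level, arrival, id, still_valid]
        entry = [io_class, level, next(self._seq), req_id, True]
        self._live[req_id] = entry
        heapq.heappush(self._heap, entry)

    def boost(self, req_id, io_class, level):
        """Re-queue a pending request at a higher priority, e.g. because an
        RT task is now waiting on it. Returns True if it was boosted."""
        old = self._live.get(req_id)
        if old is None or (io_class, level) >= (old[0], old[1]):
            return False
        old[4] = False  # lazy deletion: the stale entry is skipped later
        self.submit(req_id, io_class, level)
        return True

    def dispatch(self):
        """Return the id of the next request to service, or None."""
        while self._heap:
            entry = heapq.heappop(self._heap)
            if entry[4]:
                del self._live[entry[3]]
                return entry[3]
        return None
```

For example, if a metadata read was queued at best-effort level 4 and an RT
reader then blocks on it, boost("meta", RT, 0) moves it ahead of every
remaining best-effort request, behind only RT requests that were already
queued.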

-- Jamie