Re: [Lsf-pc] [LSF/MM ATTEND] Filesystems -- Btrfs, cgroups, Storage topics from Facebook

On Thu, 2014-01-02 at 07:46 +0100, Jan Kara wrote:
> On Tue 31-12-13 15:34:40, Chris Mason wrote:
> > On Tue, 2013-12-31 at 22:22 +0800, Tao Ma wrote:
> > > Hi Chris,
> > > On 12/31/2013 09:19 PM, Chris Mason wrote:
> > >
> > > > So I'd like to throttle the rate at which dirty pages are created,
> > > > preferably based on the rates currently calculated in the BDI of how
> > > > quickly the device is doing IO.  This way we can limit dirty creation to
> > > > a percentage of the disk capacity during the current workload
> > > > (regardless of random vs buffered).
> > > Fengguang had already done some work on this, but it seems that the
> > > community doesn't have a consensus on where this control file should go.
> > > You can look at this link: https://lkml.org/lkml/2011/4/4/205
> >
> > I had forgotten Wu's patches here, it's very close to the starting point
> > I was hoping for.
>   I specifically don't like those patches because throttling pagecache
> dirty rate is IMHO a rather poor interface. What people want to do is to
> limit IO from a container. That means reads & writes, buffered & direct IO.
> So dirty rate is just one of several things which contribute to total IO
> rate. When you have both direct IO & buffered IO happening in the container
> they influence each other, so a dirty rate of 50 MB/s may be fine when nothing
> else is going on in the container but may be far too much for the system if
> there are heavy direct IO reads happening as well.
>
> So you really need to tune the limit on the dirty rate depending on how
> fast the writeback can happen (which is what current IO-less throttling
> does), not based on some hard throughput number like
> 50 MB/s (which is what Fengguang's patches did if I remember right).
>
> What could work a tad bit better (and that seems to be something you are
> proposing) is to have a weight for each memcg and each memcg would be
> allowed to dirty at a rate proportional to its weight * writeback
> throughput. But this still has a couple of problems:
> 1) This doesn't take into account the local situation in a memcg - for a memcg
>    full of dirty pages you want to throttle dirtying much more than for a
>    memcg which has no dirty pages.
> 2) Flusher thread (or workqueue these days) doesn't know anything about
>    memcgs. So it can happily flush a memcg which is relatively OK for a
>    rather long time while some other memcg is full of dirty pages and
>    struggling to make any progress.
> 3) This will be somewhat unfair since the total IO allowed to happen from a
>    container will depend on whether you are doing only reads (or DIO), only
>    writes or both reads & writes.
>
> In an ideal world you could compute writeback throughput for each memcg
> (and writeback from a memcg would be accounted in a proper blkcg - we would
> need a unified memcg & blkcg hierarchy for that), take into account the number of
> dirty pages in each memcg, and compute dirty rate according to these two
> numbers. But whether this can work in practice heavily depends on the memcg
> size and how smooth / fair the writeback from different memcgs can be so
> that we don't have excessive stalls and throughput estimation errors...
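
To make the weight-proportional scheme quoted above concrete, here is a minimal sketch in Python (not kernel code). The function name, the memcg names, the weights and the throughput figure are all hypothetical. It only shows the arithmetic of splitting a measured writeback throughput by weight, and it does nothing about the three problems listed in the quote.

```python
def dirty_rate_limits(weights, writeback_bps):
    """Split a measured writeback throughput among memcgs by weight.

    weights: dict mapping a (hypothetical) memcg name to a positive weight.
    writeback_bps: estimated writeback throughput of the device, in bytes/s.
    Returns a dict mapping each memcg name to its allowed dirty rate in bytes/s.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    # Each memcg gets a share proportional to weight * writeback throughput.
    return {name: writeback_bps * w / total for name, w in weights.items()}


# Illustrative numbers only: a device currently writing back at 100 MB/s.
limits = dirty_rate_limits({"a": 1, "b": 3}, 100 * 1000 * 1000)
```

Because the limits are derived from the current throughput estimate rather than a fixed number, they would shrink automatically when the device slows down, which is the property the quoted text asks for.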

[ Adding Tejun, Vivek and Li from another thread ]

I do agree that a basket of knobs is confusing and it doesn't really
help the admin.

My first idea was a complex system where the controller in the block
layer and the BDI flushers all communicated about current usage and
cooperated on a single set of reader/writer rates.  I think it could
work, but it'll be fragile.

But there are a limited number of non-pagecache methods to do IO.  Why
not just push the accounting and throttling for O_DIRECT into a new BDI
controller idea?  Tejun was just telling me how he'd rather fix the
existing controllers than add a new one, but I think we can have a much
better admin experience by having a single entry point based on
BDIs.
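
A toy sketch of the single-entry-point idea, in Python rather than kernel code: one per-BDI budget that both buffered dirtying and O_DIRECT IO are charged against. The class name, its methods and the fixed per-interval budget are assumptions made up for illustration; nothing here reflects an existing kernel interface.

```python
class BdiBudget:
    """Hypothetical per-BDI budget shared by buffered and direct IO."""

    def __init__(self, bytes_per_interval):
        self.bytes_per_interval = bytes_per_interval
        self.used = {"buffered": 0, "direct": 0}

    def remaining(self):
        # Both kinds of IO draw from the same pool.
        return self.bytes_per_interval - sum(self.used.values())

    def charge(self, kind, nbytes):
        """Account nbytes of IO; return True if allowed, False to throttle."""
        if kind not in self.used:
            raise ValueError(f"unknown IO kind: {kind}")
        if nbytes > self.remaining():
            return False
        self.used[kind] += nbytes
        return True

    def new_interval(self):
        # Reset the accounting at the start of each interval.
        for kind in self.used:
            self.used[kind] = 0


budget = BdiBudget(bytes_per_interval=100)
first = budget.charge("buffered", 60)   # fits in the budget
second = budget.charge("direct", 60)    # exceeds what is left, so throttled
third = budget.charge("direct", 40)     # exactly uses up the remainder
```

The point of the sketch is only that the admin sets one number per device, and direct IO cannot bypass it by avoiding the page cache.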

-chris
