Re: [Lsf-pc] [LSF/MM ATTEND] Filesystems -- Btrfs, cgroups, Storage topics from Facebook

On Tue, 2013-12-31 at 13:45 +0100, Jan Kara wrote:
> On Tue 31-12-13 16:49:27, Zheng Liu wrote:
> > Hi Chris,
> >
> > On Mon, Dec 30, 2013 at 09:36:20PM +0000, Chris Mason wrote:
> > > Hi everyone,
> > >
> > > I'd like to attend the LSF/MM conference this year.  My current
> > > discussion points include:
> > >
> > > All things Btrfs!
> > >
> > > Adding cgroups for more filesystem resources, especially to limit the
> > > speed dirty pages are created.
> >
> > Interesting.  If I remember correctly, IO-less dirty throttling has been
> > merged into the upstream kernel, which can limit the speed at which dirty
> > pages are created.  Does it have any defects?
>   It works as it should. But as Jeff points out, the throttling isn't
> cgroup aware. So it can happen that one memcg is full of dirty pages and
> reclaim has problems reclaiming pages for it. I guess what Chris asks
> for is that we watch the number of dirty pages in each memcg and throttle
> processes creating dirty pages in a memcg which is close to its limit on
> dirty pages.

Right, the ioless dirty throttling is fantastic, but it's based on the
BDI and you only get one of those per device.

The current cgroup IO controller happens after we've decided to start
sending pages down.  From a buffered write point of view, this is
already too late.  If we delay the buffered IOs, the higher priority
tasks will just wait in balance_dirty_pages instead of waiting on the
drive.

So I'd like to throttle the rate at which dirty pages are created,
preferably based on the rates currently calculated in the BDI of how
quickly the device is doing IO.  This way we can limit dirty creation to
a percentage of the disk capacity during the current workload
(regardless of random vs buffered).
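
As a rough illustration of the idea above, here is a small userspace model
(not kernel code) of limiting dirty page creation to a fraction of the
device's measured write bandwidth. Everything in it is hypothetical and
chosen only for the example: the DirtyRateLimiter class, the share
parameter, the 4096-byte page size, and bdi_write_bw, which stands in for
whatever bandwidth estimate the BDI already keeps.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # bytes; assumed page size for this sketch


@dataclass
class DirtyRateLimiter:
    """Toy model of a per-group limit on the rate of dirty page creation."""

    share: float            # fraction of device write bandwidth, 0 < share <= 1
    dirtied_pages: int = 0  # pages dirtied since the last pause

    def allowed_pages_per_sec(self, bdi_write_bw: float) -> float:
        """bdi_write_bw: estimated device write bandwidth in bytes/sec."""
        return self.share * bdi_write_bw / PAGE_SIZE

    def account(self, pages: int) -> None:
        """Record that the group dirtied this many pages."""
        self.dirtied_pages += pages

    def pause_seconds(self, bdi_write_bw: float) -> float:
        """Sleep time that brings the group back to its allowed rate."""
        rate = self.allowed_pages_per_sec(bdi_write_bw)
        if rate <= 0:
            raise ValueError("bandwidth estimate and share must be positive")
        pause = self.dirtied_pages / rate
        self.dirtied_pages = 0  # the pause pays for everything accounted so far
        return pause
```

Because the pause is derived from the current bandwidth estimate, the
limit follows the device: if the workload turns seeky and the estimate
drops, the same share translates into fewer pages per second and longer
pauses for the dirtier.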

-chris