On 2004.02.20 03:32, Andrew Morton wrote: > Nick Piggin <piggin@cyberone.com.au> wrote: > > > > > > > > Andrew Morton wrote: > > > > >Nick Piggin <piggin@cyberone.com.au> wrote: > > > > > >>Even with this patch, it might still be a good idea to allow > > >>pdflush to disregard the limits... > > >> > > > > > >Has it been confirmed that pdflush is blocking in get_request_wait()? I > > >guess that can happen very occasionally because we don't bother with any > > >locking around there but if it's happening a lot then something is bust. > > > > > > > > > > Miquel's analysis is pretty plausible, but I'm not sure if > > he's confirmed it or not, Miquel? Yes, I've added current->comm to my debug printk's, and it's dd/pdflush that both block in get_request_wait (alternating between the two). > > Even if it isn't happening > > a lot, and something isn't bust it might be a good idea to > > do this. > > Seems OK from a quick check. pdflush will block in get_request_wait() > occasionally, but not at all often. Perhaps we could move the > write_congested test into the mpage_writepages() inner loop but it hardly > seems worth the risk. > > Maybe things are different on Miquel's clockwork controller. I haven't tested it yet because of the "This patch isn't actually so good" comment, but I found another explanation. > drivers/block/ll_rw_blk.c | 2 ++ > fs/fs-writeback.c | 2 ++ > 2 files changed, 4 insertions(+) *Lightbulb on* .. I just read fs-writeback.c. As I said, this happens with an LVM device. Could it be that because LVM and the actual device have different struct request_queue's things go awry ? In fs-writeback.c, your're looking at the LVM device (and its request_queue, and its backing_dev_info). In__make_request, you're looking at the SCSI device. So that's why I saw "dd" and pdflush fighting over get_request. pdflush isn't supposed to run when the blockdev is congested, but ofcourse the LVM device is never marked as congested. The same can happen with MD devices, I think. Here's another proof-of-concept patch that fixes things for LVM. With this patch applied, I never see pdflush running when the queue is congested. You'll probably mark it as "horrible", but hey, it shows the problem clear enough.. This patch adds a function pointer to the backing_dev_info struct that can be called to check on the congestion if it's anything other than a simple device. Dm.c now sets this pointer to a function defined in dm-table.c that checks the congestion bits on all devices part of the logical volume. I'm not sure if it should be the other way around (ll_rw_blk setting the congested bit in the LVM device instead). Also if any queue of any device under LVM is congested, pdflush won't run on the whole device - but I think that harms less than running on a device with a congested queue. bdi-congestion-funp.patch --- linux-2.6.3/include/linux/backing-dev.h.ORIG 2004-02-04 04:43:38.000000000 +0100 +++ linux-2.6.3/include/linux/backing-dev.h 2004-02-20 15:17:52.000000000 +0100 @@ -24,6 +24,8 @@ unsigned long ra_pages; /* max readahead in PAGE_CACHE_SIZE units */ unsigned long state; /* Always use atomic bitops on this */ int memory_backed; /* Cannot clean pages with writepage */ + int (*congested)(int, void *); /* Function pointer if device is md/dm */ + void *aux; /* Pointer to aux data for congested func */ }; extern struct backing_dev_info default_backing_dev_info; @@ -34,11 +36,15 @@ static inline int bdi_read_congested(struct backing_dev_info *bdi) { + if (bdi->congested) + return bdi->congested(BDI_read_congested, bdi->aux); return test_bit(BDI_read_congested, &bdi->state); } static inline int bdi_write_congested(struct backing_dev_info *bdi) { + if (bdi->congested) + return bdi->congested(BDI_write_congested, bdi->aux); return test_bit(BDI_write_congested, &bdi->state); } --- linux-2.6.3/drivers/md/dm.h.ORIG 2004-02-20 15:15:16.000000000 +0100 +++ linux-2.6.3/drivers/md/dm.h 2004-02-20 15:12:30.000000000 +0100 @@ -115,6 +115,7 @@ int dm_table_get_mode(struct dm_table *t); void dm_table_suspend_targets(struct dm_table *t); void dm_table_resume_targets(struct dm_table *t); +int dm_table_any_congested(int bdi_state, void *aux); /*----------------------------------------------------------------- * A registry of target types. --- linux-2.6.3/drivers/md/dm-table.c.ORIG 2004-02-04 04:44:59.000000000 +0100 +++ linux-2.6.3/drivers/md/dm-table.c 2004-02-20 15:14:35.000000000 +0100 @@ -857,6 +857,30 @@ } } +/* + * See if any device in the logical volume has a queue + * that is congested - in that case, pdflush skips it. + */ +int dm_table_any_congested(int bdi_state, void *aux) +{ + struct mapped_device *md = aux; + struct dm_table *t; + struct list_head *d, *devices; + int r = 0; + + if ((t = dm_get_table(md)) == NULL) + return 0; + + devices = dm_table_get_devices(t); + for (d = devices->next; d != devices; d = d->next) { + struct dm_dev *dd = list_entry(d, struct dm_dev, list); + request_queue_t *q = bdev_get_queue(dd->bdev); + r |= test_bit(bdi_state, &(q->backing_dev_info.state)); + } + dm_table_put(t); + + return r; +} EXPORT_SYMBOL(dm_get_device); EXPORT_SYMBOL(dm_put_device); --- linux-2.6.3/drivers/md/dm.c.ORIG 2004-02-20 15:15:02.000000000 +0100 +++ linux-2.6.3/drivers/md/dm.c 2004-02-20 15:14:51.000000000 +0100 @@ -608,6 +608,8 @@ } md->queue->queuedata = md; + md->queue->backing_dev_info.congested = dm_table_any_congested; + md->queue->backing_dev_info.aux = md; blk_queue_make_request(md->queue, dm_request); md->io_pool = mempool_create(MIN_IOS, mempool_alloc_slab, _______________________________________________ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/