On 1/20/12 2:34 PM, Jan Kara wrote:
> vfs_check_frozen() tests are racy since the filesystem can be frozen just after
> the test is performed. Thus in write paths we can end up marking some pages or
> inodes dirty even though filesystem is already frozen. This creates problems
> with flusher thread hanging on frozen filesystem.
>
> Another problem is that exclusion between ->page_mkwrite() and filesystem
> freezing has been handled by setting page dirty and then verifying s_frozen.
> This guaranteed that either the freezing code sees the faulted page, writes it,
> and writeprotects it again or we see s_frozen set and bail out of page fault.
> This works to protect from page being marked writeable while filesystem
> freezing is running but has an unpleasant artefact of leaving dirty (although
> unmodified and writeprotected) pages on frozen filesystem resulting in similar
> problems with flusher thread as the first problem.
>
> This patch aims at providing exclusion between write paths and filesystem
> freezing. We implement a writer-freeze read-write semaphores in the superblock
> for each freezing level (currently there are two - SB_FREEZE_WRITE for data and
> SB_FREEZE_TRANS for metadata). Write paths which should block freezing on given
> level (e.g. ->block_page_mkwrite(), ->aio_write() for SB_FREEZE_WRITE level;
> transaction lifetime for SB_FREEZE_TRANS level) hold reader side of the
> semaphore. Code freezing the filesystem to a given level takes the writer side.
>
> Only that we don't really want to bounce cachelines of the semaphore between
> CPUs for each write happening. So we implement the reader side of the semaphore
> as a per-cpu counter and the writer side is implemented using s_frozen
> superblock field.
>
> Acked-by: "Theodore Ts'o" <tytso@xxxxxxx>
> Signed-off-by: Jan Kara <jack@xxxxxxx>

...

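
To make the scheme in the description concrete, here is a rough user-space
sketch of "reader side as a counter, writer side as s_frozen". It is not the
code from the patch: struct sb_sketch, NR_SLOTS and every function name are
made up for illustration, C11 atomics stand in for real per-cpu counters, and
the freezer gives up instead of sleeping until writers drain.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* All names below are hypothetical; this is not the patch's code. */
#define NR_SLOTS 4		/* stands in for the number of CPUs */
#define SB_FREEZE_LEVELS 2

enum { SB_UNFROZEN = 0, SB_FREEZE_WRITE = 1, SB_FREEZE_TRANS = 2 };

struct sb_sketch {
	atomic_int frozen;	/* plays the role of s_frozen */
	atomic_long writers[SB_FREEZE_LEVELS][NR_SLOTS];
};

static void sb_sketch_init(struct sb_sketch *sb)
{
	atomic_init(&sb->frozen, SB_UNFROZEN);
	for (int l = 0; l < SB_FREEZE_LEVELS; l++)
		for (int i = 0; i < NR_SLOTS; i++)
			atomic_init(&sb->writers[l][i], 0);
}

/* Reader side: bump our own slot first, then look at the frozen state. */
static bool sb_start_write(struct sb_sketch *sb, int level, int slot)
{
	atomic_fetch_add(&sb->writers[level - 1][slot], 1);
	if (atomic_load(&sb->frozen) >= level) {
		/* Frozen at or above our level: back out and report failure. */
		atomic_fetch_sub(&sb->writers[level - 1][slot], 1);
		return false;
	}
	return true;
}

static void sb_end_write(struct sb_sketch *sb, int level, int slot)
{
	atomic_fetch_sub(&sb->writers[level - 1][slot], 1);
}

/* Sum all slots for one level; only the freezer pays this cost. */
static long sb_count_writers(struct sb_sketch *sb, int level)
{
	long sum = 0;

	for (int i = 0; i < NR_SLOTS; i++)
		sum += atomic_load(&sb->writers[level - 1][i]);
	return sum;
}

/*
 * Writer side: publish the new frozen state, then check for active
 * writers. A real implementation would wait for them to finish; this
 * sketch restores the previous state and fails instead.
 */
static bool sb_try_freeze(struct sb_sketch *sb, int level)
{
	int prev = atomic_load(&sb->frozen);

	atomic_store(&sb->frozen, level);
	if (sb_count_writers(sb, level) != 0) {
		atomic_store(&sb->frozen, prev);
		return false;
	}
	return true;
}
```

The ordering is what closes the race: a writer increments before it reads the
frozen state, and the freezer sets the frozen state before it sums the
counters, so at least one side always notices the other. Writers only touch
their own slot, which is the point of not bouncing a shared cacheline.
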
> @@ -135,6 +157,11 @@ static struct super_block *alloc_super(struct file_system_type *type)
>  #else
>  	INIT_LIST_HEAD(&s->s_files);
>  #endif
> +	if (init_sb_writers(s, SB_FREEZE_WRITE, "sb_writers_write"))
> +		goto err_out;
> +	if (init_sb_writers(s, SB_FREEZE_TRANS, "sb_writers_trans"))
> +		goto err_out;
> +
>  	s->s_bdi = &default_backing_dev_info;
>  	INIT_LIST_HEAD(&s->s_instances);
>  	INIT_HLIST_BL_HEAD(&s->s_anon);
> @@ -186,6 +213,17 @@ static struct super_block *alloc_super(struct file_system_type *type)
>  	}
>  out:
>  	return s;
> +err_out:
> +	security_sb_free(s);
> +#ifdef CONFIG_SMP
> +	if (s->s_files)
> +		free_percpu(s->s_files);
> +#endif
> +	destroy_sb_writers(s, SB_FREEZE_WRITE);
> +	destroy_sb_writers(s, SB_FREEZE_TRANS);

You probably ran into this already but the writer percpu vars need to be
torn down in destroy_super() as well.

-Eric
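
For illustration, the init/teardown pairing being asked for looks roughly
like this in a user-space analogue. Everything here is hypothetical and not
taken from the patch: struct super_sketch, the *_sketch functions, and the
LVL_* names (standing in for SB_FREEZE_WRITE and SB_FREEZE_TRANS) are
invented, and calloc()/free() stand in for the per-cpu allocator.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; LVL_WRITE/LVL_TRANS mirror the two freeze levels. */
enum sketch_level { LVL_WRITE, LVL_TRANS, LVL_COUNT };
#define SKETCH_SLOTS 4		/* stands in for the number of CPUs */

struct super_sketch {
	long *writers[LVL_COUNT];	/* one counter array per freeze level */
};

static int live_counters;	/* outstanding counter arrays, for checking leaks */

static int init_writers_sketch(struct super_sketch *s, enum sketch_level lvl)
{
	s->writers[lvl] = calloc(SKETCH_SLOTS, sizeof(long));
	if (!s->writers[lvl])
		return -1;
	live_counters++;
	return 0;
}

/* Safe to call for a level that was never initialised. */
static void destroy_writers_sketch(struct super_sketch *s, enum sketch_level lvl)
{
	if (!s->writers[lvl])
		return;
	free(s->writers[lvl]);
	s->writers[lvl] = NULL;
	live_counters--;
}

static struct super_sketch *alloc_super_sketch(void)
{
	struct super_sketch *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	if (init_writers_sketch(s, LVL_WRITE))
		goto err_out;
	if (init_writers_sketch(s, LVL_TRANS))
		goto err_out;
	return s;
err_out:
	/* Error path: undo whatever was set up before the failure. */
	destroy_writers_sketch(s, LVL_WRITE);
	destroy_writers_sketch(s, LVL_TRANS);
	free(s);
	return NULL;
}

/* Normal teardown must release the same counters as the error path. */
static void destroy_super_sketch(struct super_sketch *s)
{
	destroy_writers_sketch(s, LVL_WRITE);
	destroy_writers_sketch(s, LVL_TRANS);
	free(s);
}
```

Without the two destroy calls in destroy_super_sketch(), every successfully
allocated superblock would leak both counter arrays when it is released,
which is the same leak the real destroy_super() would have.
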