On Wed, Feb 15, 2012 at 03:52:01AM +0100, Kai Bankett wrote:
> After successfully switching find_entry() to pagemap I spent quite
> some time on switching long_match() to the pagemap cache.
> At the end I gave up. Things just became too complicated.

Not if you have a separate struct inode for /.longfile - then it's
simply read_cache_page() + kmap().

> Far more complicated than the bh stuff.
> At least as long as I get no further clue on how to manage the
> different page sizes efficiently - compared to the very efficient
> bh code - I cannot see light at the end of the tunnel.
> For long filenames, the filename is always stored in a separate
> block. Whatever I tried, I did not get the pagemap code as
> efficient, as it abstracts from the blocksize. (that's where it
> gets its strength - IMHO)
> OK, I could save the block addressing steps, but I would add so
> many other steps on the way... I am really unsure if that pays off
> in the end. At least code complexity rises extremely.

You do realize that the buffer cache does _not_ exist separately
from the pagecache, right?  It's just the pagecache of the
underlying block device; see __find_get_block_slow() for what really
happens behind the scenes.

The difference between page and buffer cache is not in abstracting
the size away; that's trivial.  The real difference is the one
between virtually-indexed and physically-indexed caches.
Essentially, qnx6_get_block() is address translation for that
sucker, and a buffer_head is just a TLB entry.  Think of struct
address_space as an MMU context; we have one for the identity
mapping of the block device, and what you are doing is manually
asking for address translation and feeding the resulting physical
address into that identity-mapped context.  You could use the
address spaces of the directory and /.longfiles resp. instead and
let the normal logic take care of things.  Moreover, you get the
address translation cached that way - no need to recalculate
physical block addresses every time.  That map_bh() in
qnx6_get_block() is there for exactly that reason...

Block size is not an issue, really - readdir() should just map pages
one by one and loop over the entries in those; when it hits a long
name, it would just do read_cache_page() on the longfiles inode and
look into it; that's all.  I.e. you need something like

struct qnx6_long_filename *qnx6_longname(struct qnx6_sb_info *sbi,
					 struct qnx6_dirent *de,
					 struct page **p)
{
	u32 s = fs32_to_cpu(sbi, de->de_long_inode);	/* in 512-byte units */
	u32 n = s >> (PAGE_CACHE_SHIFT - 9);		/* in pages */
	u32 offs = (s << 9) & ~PAGE_CACHE_MASK;		/* within page */
	struct address_space *mapping = sbi->longfiles->i_mapping;
	struct page *page = read_mapping_page(mapping, n, NULL);
	if (IS_ERR(page))
		return ERR_CAST(page);
	/* caller does put_page() on *p once it's done with the name */
	kmap(*p = page);
	return (struct qnx6_long_filename *)(page_address(page) + offs);
}

and have readdir do

	lf = qnx6_longname(sbi, de, &page);
	if (IS_ERR(lf)) {
		err = PTR_ERR(lf);
		goto out;
	}
	if (filldir(....) < 0) {
		put_page(page);
		err = 0;
		goto out;
	}
	put_page(page);

where put_page() is exactly what we do in e.g. ext2_put_page() -
kunmap() followed by page_cache_release().  And the same sucker
would be used by qnx6_find_entry(), of course.

BTW, after looking at your readdir() - you have a serious bug there.
It bloody ignores ->f_pos - it *always* starts reading from the
beginning of the directory.  Try to do ls on a really big directory
and you'll have trouble - the beginning of the directory gets
returned several times.  It will stop eventually, since you do
increment ->f_pos, but...

Another bug there: you leak lf_bh on the "too long longname" failure
exit...
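Back to that put_page() helper, by the way: it's tiny - a sketch
along the lines of ext2_put_page() (the qnx6_put_page name is just a
placeholder, pick whatever fits):

static inline void qnx6_put_page(struct page *page)
{
	/* undo the kmap() done in qnx6_longname() and drop the
	 * reference taken by read_mapping_page() */
	kunmap(page);
	page_cache_release(page);
}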
Looking at namei.c: you know,
	(void *)((char *)de + QNX6_DIR_ENTRY_SIZE)
is a really weird way to spell de + 1, seeing that the actual
argument is always a qnx6_dirent *...

I think you are overdesigning things; unlike ext2, your entries are
fixed-sized and that kills most of the complexity in there.  So you
don't need to bother with last_byte() - just do an analog that
returns the number of entries in a given page: PAGE_CACHE_SIZE /
<entry size> for all pages but the last one and (i_size &
~PAGE_CACHE_MASK) / <entry size> for the last page.  And just have
something like

	for ( ; n < npages; n++, start = 0) {
		int limit = last_entry(inode, n);
		<get and map page>
		de = page_address(page);
		for (i = start, de += start; i < limit; i++, de++) {
			/* handle this entry */
		}
	}

for your loops over directory contents, both in readdir and
find_entry.

One more thing: why do you bother passing inodebits separately to
block_map()?  You are guaranteed that
	block_map(..., n, ..., inodebits)
is equal to
	block_map(..., n >> inodebits, ..., 0)
It's really not hard to prove - introduce __bitdelta and __levelsub
and maintain them so that
	levelsub == __levelsub << inodebits
	bitdelta == __bitdelta + inodebits
hold all the time.  Then observe that
	(no - levelsub) >> bitdelta
is equal to
	(no - (__levelsub << inodebits)) >> (__bitdelta + inodebits),
i.e. to
	((no >> inodebits) - __levelsub) >> __bitdelta.
Now you can lose levelsub - it's not used anymore.  And after _that_
neither is bitdelta.  And after that you can rename __levelsub and
__bitdelta to levelsub and bitdelta resp. and see that you have
reproduced the original function with the original uses of inodebits
replaced with 0 and all instances of no replaced with no >>
inodebits.  The only exception is that the debugging printk (the "no
too large" case) now actually prints what it claims to print ;-)

Allocation of private inodes is easy -

static struct inode *qnx6_private_inode(struct super_block *s,
					struct qnx6_root_node *p)
{
	struct inode *inode = new_inode(s);
	if (inode) {
		struct qnx6_inode_info *ei = QNX6_I(inode);
		struct qnx6_sb_info *sbi = QNX6_SB(s);
		inode->i_size = fs64_to_cpu(sbi, p->size);
		memcpy(ei->di_block_ptr, p->ptr, sizeof(p->ptr));
		ei->di_filelevels = p->levels;
		inode->i_mode = S_IFREG | S_IRUSR;	/* that will do */
		inode->i_mapping->a_ops = &qnx6_aops;
	}
	return inode;
}

and that's it.  With that (and with inodebits killed) you can turn
block_map() into qnx6_block_map(inode, block_nr) - the superblock
will be inode->i_sb...
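Spelled out, that last_entry() analog from the directory loop above
is all of four lines - a sketch, assuming the fixed entry size
QNX6_DIR_ENTRY_SIZE divides PAGE_CACHE_SIZE:

static unsigned last_entry(struct inode *inode, unsigned long page_nr)
{
	/* every page but the last one is full */
	if (page_nr != (unsigned long)(inode->i_size >> PAGE_CACHE_SHIFT))
		return PAGE_CACHE_SIZE / QNX6_DIR_ENTRY_SIZE;
	/* the last page holds only the tail of i_size */
	return (inode->i_size & ~PAGE_CACHE_MASK) / QNX6_DIR_ENTRY_SIZE;
}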