On Wed, Feb 15, 2012 at 03:52:01AM +0100, Kai Bankett wrote:
> After successfully switching find_entry() to pagemap I spent quite
> some time on switching long_match() to the pagemap cache.
> At the end I gave up. Things just became too complicated.

Not if you have a separate struct inode for /.longfile - then it's
simply read_cache_page() + kmap().

> Far more complicated than the bh stuff.
> At least as long as I get no further clue on how to manage the
> different page sizes efficiently - compared to the very efficient
> bh code - I cannot see light at the end of the tunnel.
> For long filenames, the filename is always stored in a separate
> block. Whatever I tried, I did not get the pagemap code as
> efficient, as it abstracts from the blocksize. (that's where it
> gets its strength - IMHO)
> OK, I could save the block addressing steps, but I would add so
> many other steps on the way... I am really unsure if that pays off
> in the end. At least code complexity rises extremely.

You do realize that the buffer cache does _not_ exist separately
from the pagecache, right?  It's just the pagecache of the
underlying block device; see __find_get_block_slow() for what really
happens behind the scenes.

The difference between page and buffer cache is not in abstracting
the size away; that's trivial.  The real difference is the one
between virtually-indexed and physically-indexed caches.
Essentially, qnx6_get_block() is address translation for that
sucker, and a buffer_head is just a TLB entry.  Think of struct
address_space as an MMU context; we have one for the identity
mapping of the block device, and what you are doing is manually
asking for address translation and feeding the resulting physical
address into that identity-mapped context.  You could use the
address spaces of the directory and /.longfiles resp. instead and
let the normal logic take care of things.  Moreover, you get the
address translation cached that way - no need to recalculate
physical block addresses every time.  That map_bh() in
qnx6_get_block() is there for exactly that reason...

Block size is not an issue, really - readdir() should just map pages
one by one and loop over the entries in those; when it hits a long
name, it would just do read_cache_page() on the longfiles inode and
look into it; that's all.  I.e. you need something like

struct qnx6_long_filename *qnx6_longname(struct qnx6_sb_info *sbi,
					 struct qnx6_dirent *de,
					 struct page **p)
{
	u32 s = fs32_to_cpu(sbi, de->de_long_inode);	/* in 512-byte units */
	u32 n = s >> (PAGE_CACHE_SHIFT - 9);		/* in pages */
	u32 offs = (s << 9) & ~PAGE_CACHE_MASK;		/* within page */
	struct address_space *mapping = sbi->longfiles->i_mapping;
	struct page *page = read_mapping_page(mapping, n, NULL);
	if (IS_ERR(page))
		return ERR_CAST(page);
	/* caller does put_page() on *p once it's done with the name */
	kmap(*p = page);
	return (struct qnx6_long_filename *)(page_address(page) + offs);
}

and have readdir do

	lf = qnx6_longname(sbi, de, &page);
	if (IS_ERR(lf)) {
		err = PTR_ERR(lf);
		goto out;
	}
	if (filldir(....) < 0) {
		put_page(page);
		err = 0;
		goto out;
	}
	put_page(page);

where put_page() is exactly what we do in e.g. ext2_put_page() -
kunmap() followed by page_cache_release().  And the same sucker
would be used by qnx6_find_entry(), of course.

BTW, after looking at your readdir() - you have a serious bug there.
It bloody ignores ->f_pos - it *always* starts reading from the
beginning of the directory.  Try to do ls on a really big directory
and you'll have trouble - the beginning of the directory gets
returned several times.  It will stop eventually, since you do
increment ->f_pos, but...

Another bug there: you leak lf_bh on the "too long longname" failure
exit...
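Back to that put_page() helper, by the way: it's tiny - a sketch
along the lines of ext2_put_page() (the qnx6_put_page name is just a
placeholder, pick whatever fits):

static inline void qnx6_put_page(struct page *page)
{
	/* undo the kmap() done in qnx6_longname() and drop the
	 * reference taken by read_mapping_page() */
	kunmap(page);
	page_cache_release(page);
}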
Looking at namei.c: you know,
	(void *)((char *)de + QNX6_DIR_ENTRY_SIZE)
is a really weird way to spell de + 1, seeing that the actual
argument is always a qnx6_dirent *...

I think you are overdesigning things; unlike ext2, your entries are
fixed-sized and that kills most of the complexity in there.  So you
don't need to bother with last_byte() - just do an analog that
returns the number of entries in a given page: PAGE_CACHE_SIZE /
<entry size> for all pages but the last one and (i_size &
~PAGE_CACHE_MASK) / <entry size> for the last page.  And just have
something like

	for ( ; n < npages; n++, start = 0) {
		int limit = last_entry(inode, n);
		<get and map page>
		de = page_address(page);
		for (i = start, de += start; i < limit; i++, de++) {
			/* handle this entry */
		}
	}

for your loops over directory contents, both in readdir and
find_entry.

One more thing: why do you bother passing inodebits separately to
block_map()?  You are guaranteed that
	block_map(..., n, ..., inodebits)
is equal to
	block_map(..., n >> inodebits, ..., 0)
It's really not hard to prove - introduce __bitdelta and __levelsub
and maintain them so that
	levelsub == __levelsub << inodebits
	bitdelta == __bitdelta + inodebits
hold all the time.  Then observe that
	(no - levelsub) >> bitdelta
is equal to
	(no - (__levelsub << inodebits)) >> (__bitdelta + inodebits),
i.e. to
	((no >> inodebits) - __levelsub) >> __bitdelta.
Now you can lose levelsub - it's not used anymore.  And after _that_
neither is bitdelta.  And after that you can rename __levelsub and
__bitdelta to levelsub and bitdelta resp. and see that you have
reproduced the original function with the original uses of inodebits
replaced with 0 and all instances of no replaced with no >>
inodebits.  The only exception is that the debugging printk (the "no
too large" case) now actually prints what it claims to print ;-)

Allocation of private inodes is easy -

static struct inode *qnx6_private_inode(struct super_block *s,
					struct qnx6_root_node *p)
{
	struct inode *inode = new_inode(s);
	if (inode) {
		struct qnx6_inode_info *ei = QNX6_I(inode);
		struct qnx6_sb_info *sbi = QNX6_SB(s);
		inode->i_size = fs64_to_cpu(sbi, p->size);
		memcpy(ei->di_block_ptr, p->ptr, sizeof(p->ptr));
		ei->di_filelevels = p->levels;
		inode->i_mode = S_IFREG | S_IRUSR;	/* that will do */
		inode->i_mapping->a_ops = &qnx6_aops;
	}
	return inode;
}

and that's it.  With that (and with inodebits killed) you can turn
block_map() into qnx6_block_map(inode, block_nr) - the superblock
will be inode->i_sb...
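Spelled out, that last_entry() analog from the directory loop above
is all of four lines - a sketch, assuming the fixed entry size
QNX6_DIR_ENTRY_SIZE divides PAGE_CACHE_SIZE:

static unsigned last_entry(struct inode *inode, unsigned long page_nr)
{
	/* every page but the last one is full */
	if (page_nr != (unsigned long)(inode->i_size >> PAGE_CACHE_SHIFT))
		return PAGE_CACHE_SIZE / QNX6_DIR_ENTRY_SIZE;
	/* the last page holds only the tail of i_size */
	return (inode->i_size & ~PAGE_CACHE_MASK) / QNX6_DIR_ENTRY_SIZE;
}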