On Sun, 10 Jan 2021, Al Viro wrote:

> On Thu, Jan 07, 2021 at 08:15:41AM -0500, Mikulas Patocka wrote:
> > Hi
> >
> > I announce a new version of NVFS - a filesystem for persistent memory.
> > 	http://people.redhat.com/~mpatocka/nvfs/
>
> Utilities, AFAICS
>
> > 	git://leontynka.twibright.com/nvfs.git
>
> Seems to hang on git pull at the moment... Do you have it anywhere else?

I saw some errors 'git-daemon: fatal: the remote end hung up unexpectedly'
in syslog. I don't know what's causing them.

> > I found out that on NVFS, reading a file with the read method has 10%
> > better performance than the read_iter method. The benchmark just reads
> > the same 4k page over and over again - and the cost of creating and
> > parsing the kiocb and iov_iter structures is just that high.
>
> Apples and oranges... What happens if you take
>
> ssize_t read_iter_locked(struct file *file, struct iov_iter *to, loff_t *ppos)
> {
> 	struct inode *inode = file_inode(file);
> 	struct nvfs_memory_inode *nmi = i_to_nmi(inode);
> 	struct nvfs_superblock *nvs = inode->i_sb->s_fs_info;
> 	ssize_t total = 0;
> 	loff_t pos = *ppos;
> 	int r;
> 	int shift = nvs->log2_page_size;
> 	size_t i_size;
>
> 	i_size = inode->i_size;
> 	if (pos >= i_size)
> 		return 0;
> 	iov_iter_truncate(to, i_size - pos);
>
> 	while (iov_iter_count(to)) {
> 		void *blk, *ptr;
> 		size_t page_mask = (1UL << shift) - 1;
> 		unsigned page_offset = pos & page_mask;
> 		unsigned prealloc = (iov_iter_count(to) + page_mask) >> shift;
> 		unsigned size;
>
> 		blk = nvfs_bmap(nmi, pos >> shift, &prealloc, NULL, NULL, NULL);
> 		if (unlikely(IS_ERR(blk))) {
> 			r = PTR_ERR(blk);
> 			goto ret_r;
> 		}
> 		size = ((size_t)prealloc << shift) - page_offset;
> 		ptr = blk + page_offset;
> 		if (unlikely(!blk)) {
> 			size = min(size, (unsigned)PAGE_SIZE);
> 			ptr = empty_zero_page;
> 		}
> 		size = copy_to_iter(ptr, size, to);
> 		if (unlikely(!size)) {
> 			r = -EFAULT;
> 			goto ret_r;
> 		}
>
> 		pos += size;
> 		total += size;
> 	}
>
> 	r = 0;
>
> ret_r:
> 	*ppos = pos;
>
> 	if (file)
> 		file_accessed(file);
>
> 	return total ? total : r;
> }
>
> and use that instead of your nvfs_rw_iter_locked() in your
> ->read_iter() for DAX read case? Then the same with
> s/copy_to_iter/_copy_to_iter/, to see how much of that is
> "hardening" overhead.
>
> Incidentally, what's the point of sharing nvfs_rw_iter() for
> read and write cases? They have practically no overlap -
> count the lines common for wr and !wr cases. And if you
> do the same in nvfs_rw_iter_locked(), you'll see that the
> shared parts _there_ are bloody pointless on the read side.

That's a good point. I split nvfs_rw_iter to separate functions
nvfs_read_iter and nvfs_write_iter - and inlined nvfs_rw_iter_locked into
both of them. It improved performance by 1.3%.

> Not that it had been more useful on the write side, really,
> but that's another story (nvfs_write_pages() handling of
> copyin is... interesting). Let's figure out what's going
> on with the read overhead first...
>
> lib/iov_iter.c primitives certainly could use massage for
> better code generation, but let's find out how much of the
> PITA is due to those and how much comes from you fighting
> the damn thing instead of using it sanely...

The results are:

read:                                      6.744s
read_iter:                                 7.417s
read_iter - separate read and write path:  7.321s
Al's read_iter:                            7.182s
Al's read_iter with _copy_to_iter:         7.181s

Mikulas
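
For reference, a minimal sketch of what the split read side could look like,
assuming the lock taken around the DAX path is the inode rwsem and reusing
Al's read_iter_locked() helper quoted above; the helper names and the actual
locking in nvfs.git may differ:

static ssize_t nvfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
	struct inode *inode = file_inode(iocb->ki_filp);
	ssize_t ret;

	if (!iov_iter_count(to))
		return 0;

	/* readers only need the shared lock; writers take it exclusive */
	inode_lock_shared(inode);
	/* read-only body, e.g. read_iter_locked() from the mail above */
	ret = read_iter_locked(iocb->ki_filp, to, &iocb->ki_pos);
	inode_unlock_shared(inode);

	return ret;
}

nvfs_write_iter would be the analogous exclusive-lock wrapper around the
write-only body, so neither path carries the branches of the other.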