pnfs LD partial sector write

Hi Boaz,

Sorry about the long delay; I was tied up with some internal work. I'm
now looking at the partial LD write problem again. Instead of blindly
bailing out on unaligned writes, this time I want to fix the write
code to handle partial writes, as you suggested before. However, it
seems to be more problematic than I thought.

The dirty range of a page passed to LD->write_pagelist may not be
aligned to the sector size, in which case the block layer cannot
handle it correctly. Even worse, I cannot do a read-modify-write cycle
within the same page, because the bio would read in the entire sector
and thus overwrite the user data within that sector. Currently I'm
thinking of creating shadow pages for partial sector writes, using
them to read in the sector and then copying the necessary data into
the user pages. But that is way too tricky and I don't like it at all.
So I want to ask how you solve the partial sector write problem in the
object layout driver.
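
To make the shadow-page idea concrete, here is a minimal user-space
sketch of the read-modify-write cycle through a bounce buffer. It is
not kernel code and not what the block or object layout driver does:
the 512-byte sector size, the in-memory "disk" array standing in for
the device, and the helper names sector_floor(), sector_ceil() and
rmw_partial_write() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512u   /* assumed sector size */
#define PAGE_BYTES  4096u  /* assumed page size */

/* Round a byte offset down to the start of its sector. */
static size_t sector_floor(size_t off)
{
	return off & ~((size_t)SECTOR_SIZE - 1);
}

/* Round a byte offset up to the next sector boundary. */
static size_t sector_ceil(size_t off)
{
	return (off + SECTOR_SIZE - 1) & ~((size_t)SECTOR_SIZE - 1);
}

/*
 * Hypothetical helper: write the dirty bytes [off, off + len) of `page`
 * to `disk` using whole sectors only. The covering sectors are first
 * read into a bounce buffer (the "shadow page"), the dirty bytes are
 * merged in, and the whole sectors are written back. The user page is
 * never used as the read target, so its contents are not clobbered.
 * Returns the number of bytes written to the disk, or 0 on bad input.
 */
static size_t rmw_partial_write(unsigned char *disk, size_t disk_size,
				const unsigned char *page,
				size_t off, size_t len)
{
	unsigned char bounce[PAGE_BYTES];
	size_t start = sector_floor(off);
	size_t end = sector_ceil(off + len);
	size_t span = end - start;

	if (len == 0 || end > disk_size || span > sizeof(bounce))
		return 0;
	assert(start <= off && off + len <= end);

	memcpy(bounce, disk + start, span);              /* read   */
	memcpy(bounce + (off - start), page + off, len); /* modify */
	memcpy(disk + start, bounce, span);              /* write  */
	return span;
}
```

With offset 666 and length 777, as in the testcase attached to this
mail, the covering sector range under these assumptions is
[512, 1536), so two whole sectors are read and written back while only
bytes 666..1442 change.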

I looked at the ore code and found that you are using bios to deal
with partial page reads/writes as well. But in places like
_add_to_r4w(), I don't see how partial sectors are handled. Maybe I am
misreading the code. Would you please shed some light? More
specifically, how does the object layout driver handle partial sector
writes like the one in the simple testcase below? Thanks in advance.

-- 
Best,
Tao


flock-partial-write.c:

#include <stdio.h>
#include <stdlib.h>	/* system() */
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
	int fd, i, offset = 666, len = 777;
	char buf[4096], buf_v[4096];
	struct flock lock;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s [filename]\n", argv[0]);
		return -1;
	}

	memset(buf, 'A', sizeof(buf));

	if ((fd = open(argv[1], O_CREAT|O_RDWR, 0644)) < 0) {
		perror("open fail");
		return -1;
	}

	/* cast so a -1 return is not promoted to a huge unsigned value */
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
		perror("write fail");
		return -1;
	}

	close(fd);

	system("echo 1 > /proc/sys/vm/drop_caches");

	memset(buf + offset, 'B', len);
	memcpy(buf_v, buf, sizeof(buf_v));

	if ((fd = open(argv[1], O_WRONLY)) < 0) {
		perror("open fail");
		return -1;
	}

	lock.l_type = F_WRLCK;
	lock.l_whence = SEEK_SET;
	lock.l_start = offset;
	lock.l_len = len;

	if (fcntl(fd, F_SETLKW, &lock) < 0) {
		perror("lock fail");
		return -1;
	}

	if (lseek(fd, offset, SEEK_SET) < 0) {
		perror("seek fail");
		return -1;
	}

	if (write(fd, buf + offset, len) < len) {
		perror("write fail");
		return -1;
	}

	lock.l_type = F_UNLCK;
	fcntl(fd, F_SETLK, &lock);

	close(fd);

	if ((fd = open(argv[1], O_RDONLY)) < 0) {
		perror("open fail");
		return -1;
	}

	/* cast so a -1 return is not promoted to a huge unsigned value */
	if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
		perror("read fail");
		return -1;
	}

	if (memcmp(buf, buf_v, sizeof(buf)) != 0) {
		fprintf(stderr, "aha, buf not match\n");
		for (i = 0; i < sizeof(buf); i++) {
			if (buf[i] != buf_v[i])
				fprintf(stderr, "%dth %c vs %c\n", i, buf[i], buf_v[i]);
		}
	} else {
		printf("nice done!\n");
	}

	close(fd);
	return 0;
}