Sorry for the late reply. I reviewed the code again and found some problems.

I created a soft RAID whose size is larger than 16T. The OS is Ubuntu 12.04, 32-bit x86. udev creates the block device node in /dev (a tmpfs), and I read the tmpfs code; in mm/shmem.c:shmem_fill_super():

>	sb->s_maxbytes = MAX_LFS_FILESIZE;

On my machine, MAX_LFS_FILESIZE equals 8T - 1. But on the read path, generic_file_aio_read --> do_generic_file_read (no O_DIRECT flag), in do_generic_file_read():

>	index = *ppos >> PAGE_CACHE_SHIFT;

index has type pgoff_t, so if *ppos is larger than 16T, index overflows. As you said, it reads data from a low offset instead.

But I also tested the write operation: blkdev_aio_write --> __generic_file_aio_write, which checks the position via generic_write_checks(). In that function:

>	if (likely(!isblk)) {
>		if (unlikely(*pos >= inode->i_sb->s_maxbytes)) {
>			if (*count || *pos > inode->i_sb->s_maxbytes)
>				return -EFBIG;
>			/* zero-length writes at ->s_maxbytes are OK */
>		}
>		if (unlikely(*pos + *count > inode->i_sb->s_maxbytes))
>			*count = inode->i_sb->s_maxbytes - *pos;
>	} else {
>#ifdef CONFIG_BLOCK
>		loff_t isize;
>		if (bdev_read_only(I_BDEV(inode)))
>			return -EPERM;
>		isize = i_size_read(inode);
>		if (*pos >= isize) {
>			if (*count || *pos > isize)
>				return -ENOSPC;
>		}
>		if (*pos + *count > isize)
>			*count = isize - *pos;
>#else
>		return -EPERM;
>#endif

So it checks s_maxbytes (MAX_LFS_FILESIZE) for regular files, but if the file is a block device it does not: it only checks the device's real size. That is also a bug, because if the block device is larger than 16T, no error is returned and execution continues. Then generic_file_buffered_write() (again no O_DIRECT) --> generic_perform_write --> write_begin [blkdev_write_begin] --> block_write_begin runs, and in block_write_begin():

>	pgoff_t index = pos >> PAGE_CACHE_SHIFT;

index overflows in the same way.

I once thought about patching these bugs (I might become well known, haha). But I don't know how, because of this comment in generic_write_checks():

>/*
> * Are we about to exceed the fs block limit ?
> *
> * If we have written data it becomes a short write.  If we have
> * exceeded without writing data we send a signal and return EFBIG.
> * Linus frestrict idea will clean these up nicely..
> */
>	if (likely(!isblk)) {

How should a block device be dealt with here? As a regular file, or not?
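To make the truncation concrete, here is a minimal userspace sketch (illustration only, not kernel code; it assumes 4KiB pages, so PAGE_CACHE_SHIFT == 12, and models pgoff_t as a 32-bit unsigned type):

#include <stdio.h>
#include <stdint.h>

#define PAGE_CACHE_SHIFT 12	/* assumes 4KiB pages */

int main(void)
{
	long long pos = 17LL << 40;	/* 17TiB, past the 16TiB wrap point */
	uint32_t index = pos >> PAGE_CACHE_SHIFT;	/* models pgoff_t on a 32-bit kernel */

	/* the high bits of the page number are lost: 17TiB maps to the
	 * same page index as 1TiB, so I/O silently hits a low offset */
	printf("pos = %lld TiB, index = 0x%08x, maps to byte %lld\n",
	       pos >> 40, (unsigned)index,
	       (long long)index << PAGE_CACHE_SHIFT);
	return 0;
}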
------------------
majianpeng
2012-05-28
-------------------------------------------------------------
From: Hugh Dickins
Date: 2012-05-27 05:24:13
To: majianpeng
Cc: Al Viro; Andrew Morton; linux-mm; linux-fsdevel
Subject: Re: the max size of block device on 32bit os, when using do_generic_file_read()

On Thu, 24 May 2012, majianpeng wrote:

> Hi all:
> 	I read a raid5 of size 30T. The OS is RHEL6 32-bit.
> 	I read from the raid5 (as a whole device, not partitioned) and got data from an address I did not want.
> 	So I tested the newest kernel code, and the problem is still there.
> 	Reviewing the code, in function do_generic_file_read():
>
> 	index = *ppos >> PAGE_CACHE_SHIFT;
>
> 	index is u32 and *ppos is long long.
> 	So when *ppos is larger than 0xFFFFFFFF << PAGE_CACHE_SHIFT (16T bytes), index is wrong.
>
> 	I wonder about this: on a 32-bit OS, must block devices be no larger than 16T? In other words, must a block device larger than 16T be partitioned?

I am not surprised that the page cache limitation prevents you from
reading the whole device with a 32-bit kernel. See MAX_LFS_FILESIZE
in include/linux/fs.h. Our answer to that is just to use a 64-bit kernel.

#if BITS_PER_LONG==32
#define MAX_LFS_FILESIZE	(((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1)
#elif BITS_PER_LONG==64
#define MAX_LFS_FILESIZE	0x7fffffffffffffffUL
#endif

But I am a little surprised that you get as far as 16TiB (with 4k page):
I would have expected you to be stopped just before 8TiB (although I
suspect that the limitation to 8TiB rather than 16TiB is unnecessary).

And if I understand you correctly, read() or pread() gave you no error
at those large offsets, but supplied data from the low offset instead?
That does surprise me - have we missed a check there?

Hugh
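For reference, the two limits contrasted in this thread can be checked numerically with a small userspace program (illustration only; it assumes 4KiB pages and the BITS_PER_LONG == 32 case, mirroring the macro from include/linux/fs.h above):

#include <stdio.h>

#define PAGE_CACHE_SIZE		4096ULL	/* assumes 4KiB pages */
#define PAGE_CACHE_SHIFT	12
#define BITS_PER_LONG		32	/* the 32-bit case */

int main(void)
{
	/* MAX_LFS_FILESIZE on 32-bit: (4096 << 31) - 1 = 8TiB - 1 */
	unsigned long long max_lfs =
		(PAGE_CACHE_SIZE << (BITS_PER_LONG - 1)) - 1;

	/* pgoff_t wrap point: 2^32 pages of 4KiB = 16TiB */
	unsigned long long pgoff_wrap =
		(1ULL << 32) << PAGE_CACHE_SHIFT;

	printf("MAX_LFS_FILESIZE = %llu bytes (8TiB - 1)\n", max_lfs);
	printf("pgoff_t wraps at %llu bytes (16TiB)\n", pgoff_wrap);
	return 0;
}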