Re: Does NFS4 need st_gen?

On 10/21/2011 01:10 PM, Trond Myklebust wrote:
> On Fri, 2011-10-21 at 12:09 -0400, Nikolaus Rath wrote: 
>> On 10/21/2011 12:00 PM, Trond Myklebust wrote:
>>> On Fri, 2011-10-21 at 09:54 -0400, Nikolaus Rath wrote: 
>>>> Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> writes:
>>>>> On Thu, 2011-10-20 at 16:37 -0400, Nikolaus Rath wrote: 
>>>>>> "J. Bruce Fields" <bfields@xxxxxxxxxxxx> writes:
>>>>>>> On Thu, Oct 20, 2011 at 01:21:31PM -0400, Nikolaus Rath wrote:
>>>>>>>> I'm working on a FUSE file system that stores file system metadata in an
>>>>>>>> SQL database (http://code.google.com/p/s3ql/). Not having to keep track
>>>>>>>> of inode generation numbers would keep the code much simpler, because I
>>>>>>>> want to delete inode-rows from the SQL table when the last reference to
>>>>>>>> the inode is deleted (so I can't keep track of the generation no).
>>>>>>>
>>>>>>> You can use current time, or a counter, or something, as the generation
>>>>>>> number.
>>>>>>
>>>>>> With current time I'm screwed if the system clock doesn't have
>>>>>> sufficiently fine granularity. With a counter, I either have to remember
>>>>>> counter values per-inode even after the inode is deleted, or the global
>>>>>> counter will overflow at some point (in which case I may just as well
>>>>>> require unique inodes in the first place).
>>>>>
>>>>> The filehandle is between 32 (NFSv2) and 128(NFSv4) bytes long. How long
>>>>> do you expect it to take you to create+destroy between 2^256 and 2^1024
>>>>> inodes? I'm guessing that we'll all be long dead and the universe will
>>>>> have undergone heat death before that happens...
>>>>
>>>> Please stop assuming that I'm stupid or haven't thought about the
>>>> problem at all. The bottleneck is not the length of the NFS file handle,
>>>> but the length of the inode and generation number (both of which are
>>>> restricted to 32bit by FUSE) together with the requirement that not only
>>>> both of them together need to be unique forever, but the inode also
>>>> needs to be unique at any given instant (so they cannot be trivially
>>>> combined to form a 64bit value).
>>>
>>> No. The point is you don't need a generation number if you don't want to
>>> implement one...
>>>
>>> You can use any unique identifier + the inode number, and the unique
>>> identifier is only limited by the size of the filehandle.
>>
>> So how do you choose the unique identifier? It's limited by FUSE to
>> 32bit and therefore can't be a global counter, it can't be a timestamp
> 
> AFAICS fuse gives you a 64-bit inode number and a 32-bit generation
> counter. 

Yes, with 64bit inodes everything would be fine. But fuse uses 'long'
for inodes, so on 32bit systems you only have 32bit inodes even if ino_t
is 64bit.


> IOW: start allocating inode numbers incrementally from 0 - 2^64, then
> each time you overflow the 64-bit inode number counter, bump the
> generation number. You'll have to skip those inode numbers that are
> already allocated in the subsequent generations, but the total number of
> unique combinations is still likely to be more than large enough not to
> be a worry.

Yes, as I said earlier, it is possible to do with the available 32 + 32
bits, but it does introduce additional complexity.
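
To make the extra complexity concrete, here is a minimal Python sketch of
the wrap-around scheme quoted above, under stated assumptions. The class
name InodeAllocator, its methods, and the in-memory set standing in for the
inode table are all hypothetical and are not taken from S3QL or FUSE; inode
number 0 is left unused purely by choice.

```python
class InodeAllocator:
    """Hands out (inode, generation) pairs that never repeat.

    Hypothetical illustration only: `live` stands in for the table of
    inodes that currently exist.
    """

    def __init__(self, ino_bits=32, gen_bits=32):
        self.max_ino = (1 << ino_bits) - 1
        self.max_gen = (1 << gen_bits) - 1
        self.next_ino = 1       # next candidate inode number
        self.generation = 0     # bumped each time next_ino wraps around
        self.live = set()       # inode numbers currently in use

    def allocate(self):
        # Without this check the loop below could never terminate.
        if len(self.live) >= self.max_ino:
            raise RuntimeError("all inode numbers are in use")
        while True:
            if self.next_ino > self.max_ino:
                if self.generation == self.max_gen:
                    raise OverflowError("(inode, generation) space exhausted")
                self.next_ino = 1
                self.generation += 1
            ino = self.next_ino
            self.next_ino += 1
            # Skip numbers still held by inodes from earlier generations,
            # so the inode number alone stays unique at any instant.
            if ino not in self.live:
                self.live.add(ino)
                return ino, self.generation

    def free(self, ino):
        # Nothing per-inode has to be remembered after deletion.
        self.live.discard(ino)
```

In this sketch only the global pair (next_ino, generation) has to survive
across restarts; a deleted inode leaves no state behind. The price is the
skip loop and the two exhaustion cases.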


>> because the system clock may not have enough resolution, and it can't be
>> a per-inode counter because then I can't discard the counter after the
>> inode has been deleted.
> 
> If you need more unique values, then modify fuse to allow your
> filesystem to manage the exportfs interface. The fuse ABI is versioned,
> and can be extended to support new features.

FUSE 3 will have 64bit inodes, and I don't think this feature would make
it into 2.x.


Best,

   -Nikolaus

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C