Re: [patch 01/22] update ctime and mtime for mmaped write

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> >> While these entry points do not actually modify the file itself,
> >> as was pointed out, they are handy points at which the kernel gains
> >> control and could actually notice that the contents of the file are
> >> no longer the same as they were, ie. modified.
> >>
> >>  From the operating system viewpoint, this is where the semantics of
> >> modification to file contents via mmap differs from the semantics of
> >> modification to file contents via write(2).
> >>
> >> It is desirable for the file times to be updated as quickly as
> >> possible after the actual modification has occurred.
> >>     
> >
> > I disagree.
> >
> > You don't worry about the timestamp being updated _during_ a large
> > write() call, even though the file is constantly being modified.
> >
> >   
> 
> No, but you do worry about the timestamps being updated after
> every write() call, no matter how large or small.

Right.  All I'm saying is that just writing to a shared mapping
without calling msync() is similar to a write() which hasn't yet
finished.  In both cases, you have a modified file, without a modified
timestamp.

> > You think of write() as something instantaneous, while you think of
> > writing to a shared mapping, then doing msync() as something taking a
> > long time.  In actual fact both of these are basically equivalent
> > operations, the differences being, that you can easily modify
> > non-contiguous parts of a file with mmap, while you can't do that with
> > write.  The disadvantage from mmap comes from the cost of setting up
> > the page tables and handling the faults.
> >
> > Think of it this way:
> >
> >   shared mmap write + msync(MS_ASYNC)  ==  write()
> >   msync(MS_ASYNC) + fsync()  ==  msync(MS_SYNC)
> >
> >   
> 
> I don't believe that this is a valid characterization because the
> changes to the contents of the file, made through the mmap'd region,
> are immediately visible to any and all other applications accessing
> the file.  Since the contents of the file are changing, then so
> should the timestamps to reflect this.

Same case with a large write().  Nothing prevents you from reading a
file, while a huge write is taking place to it, and yet, the
modification time isn't updated.

> I think that we are going to have to agree to disagree because
> I don't agree either with your characterizations of the desirable
> semantics associated with shared mmap or that maintaining the
> correctness in the system is a waste of CPU.

I didn't quite say _that_ in so many words :).  I said that updating
the timestamp on a per-page first dirtying base, or per-inode first
dirtying base is a waste of effort.  Why?

What happens if the application overwrites what it had written some
time later?  Nothing.  The page is already read-write, the pte dirty,
so even though the file was clearly modified, there's absolutely no
way in which this can be used to force an update to the timestamp.

Is there anything special about the _first_ modification?  I don't
think so.  From an external application's point of view it doesn't
matter one whit, whether a modification was through write() or after a
page-fault, or on an already present read-write page.

So what exactly _are_ the semantics we are trying to achieve?

> I view mmap as a way for an application to treat the contents of a
> file as another segment in its address space.  This allows it to
> manipulate the contents of a file without incurring the overhead of
> the read and write system calls and the double buffering that
> naturally occurs with those system calls.  I think that:
> 
>     char *p = mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>     *p = 1;
>     *(p + 4096) = 2;
> 
> should have the same effect as:
> 
>     char c = 1;
>     pwrite(fd, &c, 1, 0);
>     c = 2;
>     pwrite(fd, &c, 1, 4096);

Not necessarily.  This is the equivalent _portable_ call sequence:

     char *p = mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
     *p = 1;
     *(p + 4096) = 2;
     msync(p, 4097, MS_ASYNC);

Yes, on linux the prior would work too, but there's really no point in
allowing applications to be lax and not do it properly.  But we've
been over this.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux