Bernd Schubert wrote on 5/6/22 8:45 PM:
On 5/6/22 19:07, Vivek Goyal wrote:On Fri, May 06, 2022 at 06:41:17PM +0200, Bernd Schubert wrote:On 5/6/22 16:12, Vivek Goyal wrote: [...]On Fri, May 06, 2022 at 11:04:05AM +0530, Dharmendra Hans wrote:Ok, looks like your fuse file server is talking to a another file server on network and that's why you are mentioning two network trips. Let us differentiate between two things first. A. FUSE protocol semantics B. Implementation of FUSE protocl by libfuse. I think I am stressing on A and you are stressing on B. I just want to see what's the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE from fuse protocol point of view. Again look at from kernel's point of view and don't worry about libfuse is going to implement it. Implementations can vary.Agreed, I don't think we need to bring in network for the kernel to libfuseAPI.From kernel's perspective FUSE_CREATE is supposed to create + open afile. It is possible file already exists. Look at include/fuse_lowlevel.hdescription for create(). /** * Create and open a file ** If the file does not exist, first create it with the specified* mode, and then open it. */ I notice that fuse is offering a high level API as well as low level API. I primarily know about low level API. To me these are just two different implementation but things don't change how kernel sends fuse messages and what it expects from server in return. Now with FUSE_ATOMIC_CREATE, from kernel's perspective, only difference is that in reply message file server will also indicate if file was actually created or not. Is that right? And I am focussing on this FUSE API apsect. I am least concerned atthis point of time who libfuse decides to actually implement FUSE_CREATEor FUSE_ATOMIC_CREATE etc. You might make a single call in libfuse server (instead of two) and that's performance optimization in libfuse. Kernel does not care how many calls did you make in file server to implement FUSE_CREATE or FUSE_ATOMIC_CREATE. All it cares is that create and open the file. So while you might do things in more atomic manner in file server andcut down on network traffic, kernel fuse API does not care. All it caresabout is create + open a file. Anyway, from kernel's perspective, I think you should be able to just use FUSE_CREATE and still be do "lookup + create + open". FUSE_ATOMIC_CREATE is just allows one additional optimization so that you know whether to invalidate parent dir's attrs or not. In fact kernel is not putting any atomicity requirements as well on file server. And that's why I think this new command should probably be called FUSE_CREATE_EXT because it just sends back additional info. All the atomicity stuff you have been describing is that you are trying to do some optimizations in libfuse implementation to implement FUSE_ATOMIC_CREATE so that you send less number of commands over network. That's a good idea but fuse kernel API does not require you do these atomically, AFAICS. Given I know little bit of fuse low level API, If I were to implement this in virtiofs/passthrough_ll.c, I probably will do following. A. Check if caller provided O_EXCL flag. B. openat(O_CREAT | O_EXCL) C. If success, we created the file. Set file_created = 1. D. If error and error != -EEXIST, send error back to client. E. If error and error == -EEXIST, if caller did provide O_EXCL flag, return error. F. openat() returned -EEXIST and caller did not provide O_EXCL flag, that means file already exists. Set file_created = 0. G. Do lookup() etc to create internal lo_inode and stat() of file. H. Send response back to client using fuse_reply_create(). This is one sample implementation for fuse lowlevel API. There could be other ways to implement. But all that is libfuse + filesystem specific and kernel does not care how many operations you use to complete and what's the atomicity etc. Of course less number of operations you do better it is. Anyway, I think I have said enough on this topic. IMHO, FUSE_CREATEdescritpion (fuse_lowlevel.h) already mentions that "If the file does notexist, first create it with the specified mode and then open it". That means intent of protocol is that file could already be there as well. So I think we probably should implement this optimization (in kernel)using FUSE_CREATE command and then add FUSE_CREATE_EXT to add optimizationabout knowing whether file was actually created or not.W.r.t libfuse optimizations, I am not sure why can't you do optimizationswith FUSE_CREATE and why do you need FUSE_CREATE_EXT necessarily. If are you worried that some existing filesystems will break, I think you can create an internal helper say fuse_create_atomic() and then use that if filesystem offers it. IOW, libfuse will have two ways to implement FUSE_CREATE. And if filesystem offers a new way which cuts down on network traffic, libfuse uses more efficient method. We should not have to change kernel FUSE API just because libfuse can do create + open operation more efficiently.Ah right, I like this. As I had written before, the first patch version was using FUSE_CREATE and I was worried to break something. Yes, it should bepossible split into lookup+create on the libfuse side. That being said,libfuse will need to know which version it is - there might be an old kernelsending the non-optimized version - libfuse should not do another lookup then.I am confused about one thing. For FUSE_CREATE command, how does it matter whether kernel has done lookup() before sending FUSE_CREATE. All FUSE_CREATE seems to say that crate a file (if it does not exist already) and then open it and return file handle as well as inode attributes. It does not say anything about whether a LOOKUP has already been done by kernel or not. It looks like you are assuming that if FUSE_CREATE is coming, that means client has already done FUSE_LOOKUP. So there is something we are not on same page about. I looked at fuse_lowlevel API and passthrough_ll.c and there is no assumption whether FUSE_LOOKUP has already been called by client before calling FUSE_CREATE. Similarly, I looked at virtiofs code and I can't see any such assumption there as well.The current linux kernel code does this right now, skipping the lookup just changes behavior. Personally I would see it as bug if the userspace implementation does not handle EEXIST for FUSE_CREATE. Implementation developer and especially users might see it differently if a kernel update breaks/changes things out of the sudden. passthrough_ll.c is not the issue here, it handles it correctly, but what about the XYZ other file systems out there - do you want to check them one by one, including closed source ones? And wouldn't even a single broken application count as regression?https://github.com/qemu/qemu/blob/master/tools/virtiofsd/passthrough_ll.c So I am sort of lost. May be you can help me understsand this.I guess it would be more interesting to look at different file systems that are not overlay based. Like ntfs-3g - I have not looked at the code yet, but especially disk based file system didn't have a reason so far to handle EEXIST. And
AFAIK ntfs-3g proper does not keep a context across calls and does not know what LOOKUP was preparing a CREATE. However this may have consequences in the high level of libfuse for managing the file tree. The kernel apparently issues a LOOKUP to decide whether issuing a CREATE (create+open) or an OPEN. If it sent blindly a CREATE, ntfs-3g would return EEXIST if the name was already present in the directory. For a test, can you suggest a way to force ignore of such lookup within libfuse, without applying your kernel patches ? Is there a way to detect the purpose of a lookup ? (A possible way is to hardcode a directory inode within which the lookups return ENOENT).
Now there is 'fi.flags = arg->flags', but these are already taken byopen/fcntl flags - I would not feel comfortable to overload these. At best, struct fuse_create_in currently had a padding field, we could convert these to something like 'ext_fuse_open_flags' and then use it for fuse internal things. Difficulty here is that I don't know if all kernel implementationszero the struct (BSD, MacOS), so I guess we would need to negotiate atstartup/init time and would need another main feature flag? And with thatI'm not be sure anymore if the result would be actually more simple than what we have right now for the first patch.If FUSE_CREATE indeed has a dependency on FUSE_LOOKUP have been called before that, then I agree that we can't implement new semantics with FUSE_CREATE and we will have to introduce a new op say FUSE_ATOMIC_CREATE/FUSE_LOOKUP_CREATE/FUSE_CREATE_EXT. But looking at fuse API, I don't see FUSE_CREATE ever guaranteeing that a FUSE_LOOKUP has been done before this.It is not in written document, but in the existing (linux) kernel behavior.You can look up "regressions due to 64-bit ext4 directory cookies" - patches from me had been once already the reason for accidental breakage and back that time I did not even have the slightest chance to predict it, as glusterfs was relying on max 32bit telldir results and using the other 32bit for its own purposes. Kernel behavior changed and application broke, even though the application was relying on non-posix behavior. In case of fuse there is no document like posix, but just API headers and current behavior...Thanks, Bernd