On Tue, Apr 16, 2019 at 3:48 AM Luis Henriques <lhenriques@xxxxxxxx> wrote:
>
> Dave Chinner <david@xxxxxxxxxxxxx> writes:
>
> > On Mon, Apr 15, 2019 at 10:16:18AM +0800, Yan, Zheng wrote:
> >> On 4/15/19 6:15 AM, Dave Chinner wrote:
> >> > On Fri, Apr 12, 2019 at 11:37:55AM +0800, Yan, Zheng wrote:
> >> > > On 4/12/19 9:15 AM, Dave Chinner wrote:
> >> > > > On Thu, Apr 04, 2019 at 11:18:22AM +0100, Luis Henriques wrote:
> >> > > > > Dave Chinner <david@xxxxxxxxxxxxx> writes:
> >> > > For DSYNC writes, the client has already written the data to the
> >> > > object store.  If the client crashes, the MDS will set the file to
> >> > > 'recovering' state and probe the file size by checking the object
> >> > > store.  Accessing the file is blocked during recovery.
> >> >
> >> > IOWs, ceph allows data integrity writes to the object store even
> >> > though those writes breach limits on that object store? i.e.
> >> > ceph quota essentially ignores O_SYNC/O_DSYNC metadata requirements?
> >> >
> >>
> >> The current cephfs quota implementation checks quota (compares i_size
> >> against the quota setting) at the very beginning of ceph_write_iter().
> >> It has nothing to do with O_SYNC or O_DSYNC.
> >
> > Hold on, if the quota is checked on the client at the start of every
> > write, then why is it not enforced /exactly/?  Where does this "we
> > didn't notice we'd run out of quota" overrun come from then?
>
> Ok, there's an extra piece of information that I don't think has been
> mentioned in the thread yet, which is the (recursive) directory
> statistics.  These stats are maintained by the MDS, and it's against
> these stats that the client actually checks if there's an overrun.  The
> checks can be done against the number of bytes ('max_bytes' quota)
> and/or the number of files ('max_files' quota), depending on which
> quotas are enabled.
>
> _Maybe_ there's room for some optimization on the client side, by
> locally updating the stats received from the MDS with local writes,
> file creation/deletion, etc.  But I'm not sure it's worth the effort,
> because:
>
> - for each operation (write, truncate, create, ...) we would also have
>   the extra overhead of walking up the directory tree (or at least the
>   quota realms tree) to update the stats for each directory; and
>
> - other clients may be writing to the same directories at the same
>   time, messing up the dir quotas anyway.  I.e. the quota overrun
>   problem would still be there, and we would still need to open/close
>   files in the test.
>
> Hopefully this helps clarify why the open/close loop is a necessary
> hack with the current quota design.

More broadly, and without reference to any specific design: Ceph is a
distributed storage system that allows multiple active clients to do
I/O independently against multiple servers, and it supports directory
quotas.  Strict enforcement of those quotas requires a single
centralized authority that knows how much data is in each directory and
decides, on every individual write, whether there is enough space left
to allow it.  That is easy on a local filesystem, but it fundamentally
cripples a multi-computer one, since it implies serializing every IO on
a single server.

(Yes, there are LOTS of tricks you can pull so that "normally" this
isn't a problem and IO is only impacted when you are actually
approaching the quota limit.  Yes, there are ways to try to proactively
reserve quota for a given writer.  But they all break down into
"contact a single server and serialize on it" as you actually approach
the quota, and nobody has yet expressed any interest in Ceph's quotas
being that precise anyway.)
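To make the failure mode concrete, here is a small stand-alone C sketch
(this is not CephFS code; quota_check(), cached_rbytes, QUOTA_MAX_BYTES
and the rest are made-up names) of the kind of best-effort, client-side
check being described: each client compares a cached, periodically
refreshed copy of the directory's recursive byte count against the
max_bytes limit before writing, so two clients writing concurrently can
both pass the check and together overrun the quota.

/*
 * Hypothetical illustration only: best-effort client-side quota checking
 * with stale cached stats.  Two simulated clients each keep a local copy
 * of the directory's recursive byte count and only refresh it from the
 * authoritative accounting every few writes, so both can pass the quota
 * check and together exceed the limit.
 */
#include <stdio.h>
#include <stdbool.h>

#define QUOTA_MAX_BYTES  (10 * 1024)   /* quota on the directory */
#define WRITE_SIZE       1024          /* bytes per simulated write */
#define REFRESH_INTERVAL 4             /* writes between stat refreshes */

static long authoritative_rbytes;      /* what the "server" actually accounts */

struct client {
    const char *name;
    long cached_rbytes;                /* stale local view of the stats */
    int writes_since_refresh;
};

/* Quota check as seen by one client: compare *cached* stats to the limit. */
static bool quota_check(struct client *c, long len)
{
    return c->cached_rbytes + len <= QUOTA_MAX_BYTES;
}

static void do_write(struct client *c, long len)
{
    if (!quota_check(c, len)) {
        printf("%s: write rejected (cached %ld bytes)\n",
               c->name, c->cached_rbytes);
        return;
    }
    authoritative_rbytes += len;       /* the write really lands */
    c->cached_rbytes += len;           /* client only sees its own writes */
    if (++c->writes_since_refresh >= REFRESH_INTERVAL) {
        c->cached_rbytes = authoritative_rbytes;   /* refresh from "server" */
        c->writes_since_refresh = 0;
    }
}

int main(void)
{
    struct client a = { "client A", 0, 0 };
    struct client b = { "client B", 0, 0 };

    /* Interleave writes from two clients into the same quota'd directory. */
    for (int i = 0; i < 8; i++) {
        do_write(&a, WRITE_SIZE);
        do_write(&b, WRITE_SIZE);
    }
    printf("quota: %d, actually written: %ld (overrun: %ld)\n",
           QUOTA_MAX_BYTES, authoritative_rbytes,
           authoritative_rbytes - QUOTA_MAX_BYTES);
    return 0;
}

As I understand it, the open/close loop in the test plays roughly the
role of the forced refresh in this sketch: it flushes the client's
state to the MDS and picks up fresher directory stats, so the next
quota check is made against numbers that are much closer to reality.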
>
> Cheers,
> --
> Luis
>
> >
> > i.e. the test changes are implying that quota is not accurately
> > checked and enforced on every write, and that there is something
> > less than exact about quotas on the ceph client.  Yet you say they
> > are checked on every write.
> >
> > Where does the need to open/close files and force flushing client
> > state to the MDS come from if quota is actually being checked on
> > every write as you say it is?
> >
> > i.e. I'm trying to work out whether this change is just working
> > around bugs in ceph quota accounting, and I'm being told conflicting
> > things about how the ceph client accounts for and enforces quota
> > limits.  Can you please clearly explain how the quota enforcement
> > works and why close/open between writes is necessary for accurate
> > quota enforcement, so that we have some clue as to why these rubbery
> > limit hacks are necessary?
> >
> > If we don't understand why a test does something and it's not
> > adequately documented, we can't really be expected to maintain it in
> > working order....
> >
> > Cheers,
> >
> > Dave.