Dmitry Monakhov <dmonakhov@xxxxxxxxxx> writes: > Jan Kara <jack@xxxxxxx> writes: > >>> Eric Sandeen <sandeen@xxxxxxxxxx> writes: >>> >>> > Dmitry Monakhov wrote: >>> >> Eric Sandeen <sandeen@xxxxxxxxxx> writes: >>> >> >>> >>> Because we can badly over-reserve metadata when we >>> >>> calculate worst-case, it complicates things for quota, since >>> >>> we must reserve and then claim later, retry on EDQUOT, etc. >>> >>> Quota is also a generally smaller pool than fs free blocks, >>> >>> so this over-reservation hurts more, and more often. >>> >>> >>> >>> I'm of the opinion that it's not the worst thing to allow >>> >>> metadata to push a user slightly over quota. This simplifies >>> >>> the code and avoids the false quota rejections that result >>> >>> from worst-case speculation. >>> >> Hm.. Totally agree with issue description. And seem there is no another >>> >> solution except yours. >>> >> ASAIU alloc_nofail is called from places where it is impossible to fail >>> >> an allocation even if something goes wrong. >>> >> I ask because currently i'm working on EIO handling in alloc/free calls. >>> >> I've found that it is useless to fail claim/free procedures because >>> >> caller is unable to handle it properly. >>> >> It is impossible to fail following operation >>> >> ->writepage >>> >> ->dquot_claim_space (what to do if EIO happens?) >>> > >>> > Hm, if these start returning EIO then maybe my patch should be modified >>> > to treat EDQUOT differently than EIO ... assuming callers can handle >>> > the return at all. >>> > >>> > In other words, make NOFAIL really just mean "don't fail for EDQUOT" >>> Yes. agree So we have two types of errors >>> 1) expected errors: EDQUOT >>> 2) fatal errors: (EIO/ENOSPC/ENOMEM) >>> So we need two types of flags: >>> 1)FORCE (IMHO it is better name than you proposed) to allow exceed a >>> quota limit >>> 2)NOFAIL to allow ignore fatal errors. >>> >>> We still need NOFAIL, because for example if something is happens in >>> ->write_page() >>> ->dquot_claim() >>> update_quota() -> EIO /* update disk quota */ >>> update_bytes() /* update i_bytes count */ >>> It is obvious that write_page should fail because it is too late to >>> return the error to userspace, so data will probably lost which >>> is much more dramatic bug than quota inconsistency. >>> So the only options we have is to: >>> 1) Do not modify inode->i_bytes and return error which caller will >>> probably ignore. IMHO this is not good because result in >>> incorrect stat() >>> >>> 2) do as much as we can (as it happens for now), modify inode->i_bytes >>> and return positive error code to caller.(which signal what error >>> result in quota inconsystency only) >> Yes, agreed that 2) is a better solution. >> >>> This fatal errors handling logic i'll post on top of your patch-set. >>> But please change flag name from NOFAIL to FORCE. >> Hmm, do we really need to distinguish between your NOFAIL and FORCE? >> I mean there are places where we can handle quota failures (both EDQUOT >> or others) and places where we cannot and then we just want to go on as >> seamlessly as possible. So NOFAIL flag seems to be enough... >> Now I agree that in theory there can be some caller which might wish >> to seamlessly continue on EDQUOT and bail out on EIO but I'm not aware >> of such callsite currently so there's no immediate need for the flag. >> So Eric's patches seem to be fine to me as they are. What do you think? > FORCE but without NOFAIL will be useful for example in > write to fallocated area > extent split/convert (uninitialized=>initialized conversion) > It is not good idea to return EDQUOT from write to reserved area > due to metadata overhead, but it is easy to handle EIO from that method. > So IMHO two flags is not a fancy option, but reasonable design solution. > This design confirms to right for openvz's userbean counters. > The only thing i ask is to rename NOFAIL => FORCE. > > BTW I'm too familiar with cross-devel-tree process OOps i meant to say i'm not familiar :) > If tytso@ will get the patchset will you get an quota-related patches > to linux-fs tree too? Otherwise everybody have to wait for ext4-tree > push to linus's tree and when to linux-fs. > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html