On Wednesday, March 6, 2013 at 11:07 AM, Jim Schutt wrote:
> On 03/05/2013 12:33 PM, Sage Weil wrote:
> > > > Running 'du' on each directory would be much faster with Ceph since it
> > > > tracks the subdirectories and shows their total size with an 'ls
> > > > -al'.
> > > >
> > > > Environments with 100k users also tend to be very dynamic with adding and
> > > > removing users all the time, so creating separate filesystems for them would
> > > > be very time consuming.
> > > >
> > > > Now, I'm not talking about enforcing soft or hard quotas, I'm just talking
> > > > about knowing how much space uid X and Y consume on the filesystem.
> > > >
> >
> > The part I'm most unclear on is what use cases people have where uid X and
> > Y are spread around the file system (not in a single or small set of sub
> > directories) and per-user (not, say, per-project) quotas are still
> > necessary. In most environments, users get their own home directory and
> > everything lives there...
> >
>
> Hmmm, is there a tool I should be using that will return the space
> used by a directory, and all its descendants?
>
> If it's 'du', that tool is definitely not fast for me.
>
> I'm doing an 'strace du -s <path>', where <path> has one
> subdirectory which contains ~600 files. I've got ~200 clients
> mounting the file system, and each client wrote 3 files in that
> directory.
>
> I'm doing the 'du' from one of those nodes, and the strace is showing
> me du is doing a 'newfstat' for each file. For each file that was
> written on a different client from where du is running, that 'newfstat'
> takes tens of seconds to return. Which means my 'du' has been running
> for quite some time and hasn't finished yet....
>
> I'm hoping there's another tool I'm supposed to be using that I
> don't know about yet.
> Our use case includes tens of millions
> of files written from thousands of clients, and whatever tool
> we use to do space accounting needs to not walk an entire directory
> tree, checking each file.

Check out the directory sizes with ls -l or whatever; those numbers are
semantically meaningful! :)

Unfortunately we can't (currently) use those "recursive statistics" to do
proper hard quotas on subdirectories, as they're lazily propagated
following client ops, not as part of the updates. (Lazily in the
technical sense; it's actually quite fast in general.) But they'd work
fine for soft quotas if somebody wrote the code, or to block writes on a
slight time lag.
-Greg
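
For anyone who wants to script this, here is a minimal Python sketch of the difference between reading the size reported for the directory itself (the number 'ls -l' shows) and a du-style walk that stats every file. The function names and the soft-limit helper are hypothetical, made up for illustration; nothing here is an existing Ceph tool. On a non-Ceph filesystem the first function just returns the size of the directory entry, so the comparison is only meaningful on a Ceph mount that reports recursive sizes.

```python
import os


def reported_dir_size(path):
    """Size the filesystem reports for the directory itself.

    This is the number 'ls -l' prints for the directory. It costs a single
    stat() call no matter how many files live underneath.
    """
    return os.stat(path).st_size


def walked_size(path):
    """du-style total: stat every file under path and sum the apparent sizes.

    This is the slow approach described above, one stat per file.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def over_soft_limit(path, limit_bytes):
    """Hypothetical soft-quota check based on the reported directory size.

    The recursive statistics are propagated lazily, so the answer can lag
    recent writes; this is advisory only, not a hard quota.
    """
    return reported_dir_size(path) > limit_bytes
```

A call such as over_soft_limit("/mnt/ceph/home/alice", 10 * 2**30) (the path is made up) would then be a single stat rather than a tree walk. As far as I know, current CephFS clients also expose the same counter as a virtual xattr named ceph.dir.rbytes, but that is not something discussed in this thread, so treat it as an assumption to verify on your own mount.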