On 07/07/2011 11:29 PM, Amir Goldstein wrote:
On Fri, Jul 8, 2011 at 4:55 AM, Allison Henderson
<achender@xxxxxxxxxxxxxxxxxx> wrote:
On 07/07/2011 05:09 PM, Amir Goldstein wrote:
On Thu, Jul 7, 2011 at 11:19 PM, Allison Henderson
<achender@xxxxxxxxxxxxxxxxxx> wrote:
On 07/07/2011 12:52 PM, Andreas Dilger wrote:
On 2011-07-07, at 1:05 AM, Amir Goldstein wrote:
On Thu, Jul 7, 2011 at 12:05 AM, Allison Henderson
<achender@xxxxxxxxxxxxxxxxxx> wrote:
On 07/02/2011 02:33 AM, Amir Goldstein wrote:
On Fri, Jul 1, 2011 at 12:22 AM, Allison Henderson
<achender@xxxxxxxxxxxxxxxxxx> wrote:
@@ -4485,6 +4485,14 @@ void ext4_free_blocks(handle_t *handle,
struct
inode *inode,
ext4_debug("freeing block %llu\n", block);
trace_ext4_free_blocks(inode, block, count, flags);
+ if (flags& EXT4_FREE_BLOCKS_ZERO) {
+ err = sb_issue_zeroout(inode->i_sb, block, count,
GFP_NOFS);
But the delete of these blocks in not yet committed,
so after reboot, you can end up with a non-deleted but zeroed file
data.
Is that acceptable? I should think not.
One way around this is a 2-phase unlink/truncate.
Phase 1: add to orphan list and register a callback on commit
Phase 2: issue zeroout and free the blocks
This won't work for punch hole, but then again, for punch hole
it's probably OK to end up with zeroed data, but non-deleted blocks.
Right?
Hi, I had a quick question about the orphan list. I notice that
ext4_ext_truncate and also ext4_ext_punch_hole already have a call to
ext4_orphan_add that happens really early before any calls to free
blocks.
Does this address your earlier concerns, or is there another reason I
missed? Thx!
It doesn't address the concerns of getting a non-deleted file with
zeroed
data
after crash, because the existence of the inode on the orphan list
after
crash
depends on the transaction that added it to the list being committed.
And your patch zeroes the blocks before that transaction is committed.
However, the orphan list gives you a very good framework to implement
deferred delete (by a kernel thread) as Andreas suggested.
Unlink should be simple, because freeing unlinked inode blocks it is
anyway
deferred till the inode refcount drops to zero.
Right. The patch that I referenced moved all of the blocks from unlink
and truncate-to-zero from the current inode to a new temporary inode on
the
orphan list (simply copying the i_blocks field + i_block and i_size,
IIRC,
and zeroing them on the original inode).
Truncate is more tricky, because of the truncate shrink/extend
requirement
(that all data is zeroes after extending the inode's size via truncate
system call), so a shrinking-deferred truncate would have to mark all
the
to-be-deleted extents uninitialized.
It would be possible to do this for partial truncate/punch as well, to
move whole blocks over to a new inode on the orphan list and zeroing
only
the 1 or 2 partial blocks inline.
It should even be possible to leverage the "block migrate" facility used
by defrag, so that we don't duplicate this code. That would mean just
allocating a temp "unlink" inode in the kernel and putting it on the
orphan
list (like an open-unlinked file), migrate the selected range of blocks,
and then zeroing the blocks in the background before unlinking the
inode.
I don't think that just marking the deleted extents as uninitialized is
enough, since it would still leave "private" data on disk that could be
read afterward. This would also only work for extent-mapped
filesystems.
There may need to be some work to enable the migrate code on
block-mapped
files, if you want to allow secure-delete on those files, but that is
good
IMHO since it also means that we could defrag block-mapped files.
Cheers, Andreas
Ah, ok then. Yes, part of the requirements was to make secure delete
work
for partial truncates, punch hole, and also indexed files. So that will
save me some time if I can get the migrate routines work. Thx for the
pointers all!
I realized that there is a basic flaw in the concept of
deferred-secure-delete.
From a security point of view, after a crash during a secure-delete,
if the file is not there, all its data should have been wiped.
Orphan cleanup on the next mount may be done on a system that
doesn't respect secure delete.
So for real security, the unlink/truncate command cannot return before
all data is wiped.
The unlink/truncate metadata changes must not even be committed
before all data is wiped (or at least part of the data with partial
truncate).
Amir.
I see, so then it sounds like using a background thread to do the zeroing
would not help us if we have to wait for it complete anyway. Going back to
the 2 phase approach, this means that we need to do the zero out and then
the free before we do the orphan list and commit? Just trying to make sure I
understand things correctly :)
Well, that really depends of the precise definition of "secure delete".
If you agree with the "100% secure" interpretation, then your current patch
is "almost" correct.
I can see 2 things that are missing:
1. ext4_unlink() will have to invoke ext4_truncate(0) directly just
like truncate system call does.
This is to prevent an attacker from keeping the protected file open
and preventing freeing
of it's data when the file is removed from the name space.
2. ext4_truncate() currently changes i_disksize first (and adds inode
to orphan list)
and then frees the blocks. for 100% secure delete, you cannot change i_disksize
before zeroing the blocks, so it has to be:
- zeroout range
- change i_disksize and add to orphan list
- free blocks
I don't see how it could be done any other way, but maybe someone else can...
Amir.
Ok, I seem to have a lot voices that are interested in the synchronous
implementation, and I think it is a reasonable design too. So I am
going to move forward with this, but I will keep an eye on this thread.
I suppose if we get more voices that are interested in the
asynchronous approach we could always add it as an optimization later.
Thx all for your feed back!! :)
Allison Henderson
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html