matthew patton wrote on 18-05-2016 6:57:
Just want to say your belligerent emails are ending up in the trash can.
Not automatically, but after scanning, mostly.
At the same time, perhaps it is worth noting that while all other
emails from this list land in my main inbox just fine, yours (and yours
alone) trigger the built-in spam filter of my email provider, even
though I have never trained it to treat your emails as spam.
Basically, each and every time I find your messages in my spam box.
Makes you think, eh? But then, just for good measure, let me concisely
respond to this one:
<quote>
For the FS to "know" which of its blocks can be scribbled
on and which can't means it has to constantly poll the block layer
(the next layer down may NOT necessarily be LVM) on every write.
Goodbye performance.
</quote>
Simply false, and I explained this already. The filesystem is already
being optimized for alignment with (possible) "thin" chunks (Zdenek has
mentioned this) so that it causes allocation on the underlying layer
more efficiently. If it already has knowledge of that alignment, and it
has knowledge of its own block usage, then it can easily discover which
of those aligned regions it has already written to itself. In other
words, it has all the data and all the knowledge it needs to know which
blocks (extents) are completely "free".
Suppose you had a block bitmap at 4KiB granularity (one bit per 4KiB
block).
Now suppose you have 4MiB extents.
Then every 1024 bits in the block bitmap correspond to one bit in the
extent map (4MiB / 4KiB = 1024 blocks per extent). You know this.
To condense the free block bitmap into a free extent map
(bit "0" is free, bit "1" is in use):
For every extent:
    blockmap_segment = the 1024 bits of the blockmap starting at bit (extent_number * 1024)
    extent_is_empty  = (blockmap_segment == 0)
So it knows clearly which extents are empty.
Then it can simply be told not to write to those extents anymore.
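Just to make that concrete, here is a minimal sketch in C of that
condensation (my own illustration; the block and extent sizes and the
names are only the example numbers from above, not anything a real
filesystem is required to use):

#include <stdint.h>
#include <stddef.h>

#define BLOCKS_PER_EXTENT 1024                     /* 4MiB / 4KiB */
#define WORDS_PER_EXTENT  (BLOCKS_PER_EXTENT / 64) /* 64-bit words per extent */

/* blockmap: one bit per block, bit set = block in use.
 * extent_empty: one byte per extent, set to 1 if the extent is untouched. */
static void condense(const uint64_t *blockmap, uint8_t *extent_empty,
                     size_t nr_extents)
{
    for (size_t e = 0; e < nr_extents; e++) {
        const uint64_t *seg = blockmap + e * WORDS_PER_EXTENT;
        uint64_t used = 0;

        for (size_t w = 0; w < WORDS_PER_EXTENT; w++)
            used |= seg[w];                        /* any block in use? */

        extent_empty[e] = (used == 0);             /* empty extent: safe to avoid */
    }
}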
If the filesystem is already issuing discards (the discard mount
option), then in practice those extents will also be deallocated by
thin LVM.
So the filesystem knows which blocks (extents) will cause allocation,
as long as it knows it is sitting on a thin device like that.
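And for completeness, the discard side of that is nothing exotic. The
device names here are only examples, and whether the pool actually gets
the chunks back also depends on its discard passdown setting:

    mount -o discard /dev/vg0/thinvol /mnt    # discard as files are deleted
    fstrim /mnt                               # or batched/periodic discard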
<quote>
However, it does mean the filesystem must know the 'hidden geometry'
beneath its own blocks, so that it can know about stuff that won't
work
anymore.
</quote>
I'm pretty sure this was explained to you a couple weeks ago: it's
called "integration".
You dumb faced idiot. You know full well this information is already
there. What are you trying to do here? Send me into the woods again?
For a long time hard disks have exposed their geometry data to us.
And filesystems can be created with geometry information (of a certain
kind) in mind. Yes, these are creation flags.
But extent alignment is also a creation flag. The extent alignment, or
block size, does not suddenly change over time. Not that it should
matter that much in principle. But this information can simply be had.
It is no different from knowing the size of the block device to begin
with.
If the creation tools were LVM-aware (they don't have to be), the
administrator could easily SET these parameters without any interaction
with the block layer itself. They can already do this for flags such as:
stride=stride-size
        Configure the filesystem for a RAID array with stride-size
        filesystem blocks. This is the number of blocks read or written
        to disk before moving to the next disk. This mostly affects
        placement of filesystem metadata like bitmaps at mke2fs(2) time
        to avoid placing them on a single disk, which can hurt the
        performance. It may also be used by the block allocator.

stripe_width=stripe-width
        Configure the filesystem for a RAID array with stripe-width
        filesystem blocks per stripe. This is typically stride-size * N,
        where N is the number of data disks in the RAID (e.g. RAID 5 N+1,
        RAID 6 N+2). This allows the block allocator to prevent
        read-modify-write of the parity in a RAID stripe if possible
        when the data is written.
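For example (the numbers and the device name are purely illustrative):
a RAID5 of 4 disks with 64KiB chunks and 4KiB filesystem blocks gives
stride = 16 and stripe_width = 3 * 16 = 48:

    mke2fs -t ext4 -b 4096 -E stride=16,stripe_width=48 /dev/md0

A thin-pool chunk size could be handed to mkfs in exactly the same
administrator-supplied way.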
And LVM extent size is not going to be any different. Zdenek explained
earlier:
<quote>
However what is being implemented is better 'allocation' logic for pool
chunk provisioning (for XFS ATM) - as rather 'dated' methods for
deciding where to store incoming data do not apply with provisioned
chunks efficiently.
i.e. it's inefficient to provision 1M thin-pool chunks and then
filesystem uses just 1/2 of this provisioned chunk and allocates next
one.
The smaller the chunk is the better space efficiency gets (and need
with snapshot), but may need lots of metadata and may cause
fragmentation troubles.
</quote>
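Just to put my own rough numbers on that: with 1MiB chunks, a
filesystem that scatters a single 4KiB write into each of 1000
different chunks makes the pool allocate 1000MiB to hold about 4MiB of
data, roughly 0.4% utilisation in the worst case; with 64KiB chunks the
same pattern would allocate only about 62.5MiB.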
Geometry data has always been part of block device drivers and I am
sorry I cannot do better at this point (finding the required information
on code interfaces is hard):
struct hd_geometry {
        unsigned char heads;
        unsigned char sectors;
        unsigned short cylinders;
        unsigned long start;
};
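For what it's worth, userspace can still read that legacy geometry
today through the HDIO_GETGEO ioctl; a minimal sketch (error handling
kept to the bare minimum):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/hdreg.h>

int main(int argc, char **argv)
{
        struct hd_geometry geo;
        int fd;

        if (argc < 2)
                return 1;
        fd = open(argv[1], O_RDONLY);            /* e.g. /dev/sda */
        if (fd < 0 || ioctl(fd, HDIO_GETGEO, &geo) != 0) {
                perror("HDIO_GETGEO");
                return 1;
        }
        printf("heads=%u sectors=%u cylinders=%u start=%lu\n",
               geo.heads, geo.sectors, geo.cylinders, geo.start);
        close(fd);
        return 0;
}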
Block devices also register block size, probably for buffers and write
queues:
static int bs = 512;
module_param(bs, int, S_IRUGO);
MODULE_PARM_DESC(bs, "Block size (in bytes)");
You know more about the system than I do, and yet you say these stupid
things.
For read/write alignment, the physical geometry is still the limiting
factor.
Extent alignment can be another such parameter, and I think Zdenek
explained that the ext and XFS people are already working on improving
efficiency based on it.
These are parameters supplied by the administrator (or his/her tools).
They are not dynamic communications from the block layer, but can be set
at creation time.
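In fact a creation tool does not even need to be LVM-aware to pick the
alignment up: it can read the I/O topology hints the device already
exports, once, at mkfs time. A minimal sketch; I believe dm-thin
advertises its chunk size through these hints, but treat that as my
assumption (the same values are also visible under
/sys/block/<dev>/queue/):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(int argc, char **argv)
{
        unsigned int io_min = 0, io_opt = 0;
        int fd;

        if (argc < 2)
                return 1;
        fd = open(argv[1], O_RDONLY);            /* e.g. /dev/vg0/thinvol */
        if (fd < 0) { perror("open"); return 1; }
        ioctl(fd, BLKIOMIN, &io_min);            /* minimum I/O size hint */
        ioctl(fd, BLKIOOPT, &io_opt);            /* optimal I/O size hint */
        printf("minimum_io_size=%u optimal_io_size=%u\n", io_min, io_opt);
        close(fd);
        return 0;
}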
However, the "partial read-only" mode I proposed is not even a
filesystem parameter, but something that would be communicated by a
kernel module to the required filesystem. (Driver!). NOT through its
block interface, but from the outside.
No different from a remount ro. Not even much different from a umount.
And I am saying these things now, I guess, because there was no support
for a more detailed, more fully functioning solution.
<quote>
For 50 years filesystems were DELIBERATELY
written to be agnostic if not outright ignorant of the underlying
block device's peculiarities. That's how modular software is written.
Sure, some optimizations have been made by peeking into attributes
exposed by the block layer but those attributes don't change over
time. They are probed at newfs() time and never consulted again.
</quote>
LVM extent size for an LV is also not going to change over time.
The only other thing that was mentioned was for a filesystem-aware
kernel module to send a message to a filesystem (driver) to change its
mode of operation. Not through the usual inter-layer communication, but
from the outside. Much like tune2fs perhaps could, or something
similar, but this time with a function call.
<quote>
Chafing at the inherent tradeoffs caused by "lack of knowledge" was
why BTRFS and ZFS were written. It is ignorant to keep pounding the
"but I want XFS/EXT+LVM to be feature parity with BTRFS". It's not
supposed to, it was never intended and it will never happen. So go use
the tool as it's designed or go use something else that tickles your
fancy.
</quote>
What is going to happen or not is not for you to decide. You have no
say in the matter whatsoever if all you do is bitch about what other
people do while doing nothing yourself.
Also you have no business ordering people around here, I believe, unless
you are some super powerful or important person, which I really doubt
you are.
People in Linux in general have this tendency to boss basically
everyone else around.
Mostly that bossing around takes exactly the form you use here: "do
this" or "don't do that". As if they had any say in the lives of other
people.
<quote>
Will mention that I still haven't tested --errorwhenfull yet.
</quote>
<quote>
But you conveniently overlook the fact that the FS is NOT remotely
full using any of the standard tools - all of a sudden the FS got
signaled that the block layer was denying write BIO calls. Maybe
there's a helpful kern.err in syslog that you wrote support for?
</quote>
Oh, how cynical we are again. You are so very lovely, I instantly want
to marry you.
You know full well I am still in the "designing" stages. And you are
trying to cut short design by saying or implying that only
implementation matters, thereby trying to destroy the design phase that
is happening now, ensuring that no implementation will ever arise.
So you are not sincere at all and your incessant remarks about needing
implementation and code are just vile attacks trying to prevent
implementation and code from ever arising in full.
And this you do constantly here. So why do you do it? Do you believe
that you cannot trust the maintainers of this product to make sane
choices in the face of something stupid? Or are you really afraid of
sane things because you know that if they get expressed, they might make
it to the program which you don't like?
I think it is one of the two, and either way it looks bad on you.
Either you have no confidence in the maintainers making the choices that
are right for them, or you are afraid of choices that would actually
improve things (but perhaps to your detriment, I don't know).
So what are you trying to fight here? Your own insanity? :P.
You conveniently overlook the fact that in current conditions, what you
say just above is ALREADY TRUE. THE FILE SYSTEM IS NOT FULL GIVEN
STANDARD TOOLS AND THE SYSTEM FREEZES DEAD. THAT DOES NOT CHANGE HERE
except the freezing part.
I mean, what gives? You are now criticising a solution that allows us
to live beyond death, when otherwise death would occur. But it is not
perfect enough for you, so you prefer a hard reboot over a system that
keeps functioning in the face of some numbers no longer adding up?
Or maybe I read you wrong here and you would like a solution, but you
don't think this is it.
I have heard very few solutions from your side, though, in those past
weeks.
The only thing you ever mentioned back then was some shell scripting
stuff, if I remember correctly.
<quote>
In principle if you had the means to acquire such a
flag/state/condition, and the filesystem would be able to block new
allocation wherever, whenever, you would already have a working system.
So what is then non-trivial?
...
It seems completely obvious to me at this point that if anything from
LVM (or e.g. dmeventd) could signal every filesystem on every affected
thin volume to enter a do-not-allocate state, and filesystems would be
able to fail writes based on that, you would already have a solution.
</quote>
<quote>
And so therefore in order to acquire this "signal" every write has to
be done in synchronous fashion and making sure strict data integrity
is maintained vis-a-vis filesystem data and metadata. Tweaking kernel
dirty block size and flush intervals are knobs that can be turned
to "signal" user-land that write errors are happening. There's no such
thing as "immediate" unless you use synchronous function calls from
userland.
</quote>
I'm sorry, you know a lot, but you have mentioned such "hints" before:
tweaking existing functionality for things it was not meant for.
Why are you trying to seek solutions within the bounds of the existing?
They can never work. You are basically trying to create that
"integration" you so despise without actively saying you are doing so;
instead, you seek hidden agendas, devious schemes, to communicate the
same thing without changing those interfaces. You are trying to do the
same thing, but you are just not owning up to it.
No, the signal would be a call to an existing (or new) function in the
filesystem driver from the presiding (LVM) module (or kernel part). In
fact, you would probably not call the filesystem driver directly; you
would call into the VFS, which would call the filesystem driver.
Just a function call.
I am talking about this thing:
struct super_operations {
        void (*write_super_lockfs) (struct super_block *);
        void (*unlockfs) (struct super_block *);
        int (*remount_fs) (struct super_block *, int *, char *);
        void (*umount_begin) (struct super_block *);
};
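Purely as a sketch of the direction I mean, and to be clear: NONE of
this exists today, the hook name and the call path are entirely my own
invention:

/* HYPOTHETICAL -- no such operation exists in the kernel today.
 * The idea: one extra superblock operation that a presiding kernel
 * module (the dm-thin / dmeventd side) could invoke from the outside,
 * telling the filesystem to stop allocating new blocks/extents while
 * still allowing overwrites of already-provisioned ones. */

/* would have to be added to struct super_operations in linux/fs.h:
 *      int (*set_no_alloc)(struct super_block *sb, int enable);
 */

/* and called roughly like this when the pool crosses a threshold: */
static void pool_nearly_full_notify(struct super_block *sb)
{
        if (sb->s_op && sb->s_op->set_no_alloc)
                sb->s_op->set_no_alloc(sb, 1);   /* deny new allocations */
}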
Something could be done around there, around those superblock
operations. I'm sorry I haven't found the relevant parts yet. My foot
is hurting and I put some cream on it, but it kind of disrupts my
concentration here.
I have an infected and swollen foot, every day now.
No bacterial infection. A failed operation.
Sowwy.
<quote>
If you want to write your application to handle "mis-behaved" block
layers, then use O-DIRECT+SYNC.
</quote>
You are trying to do the complete opposite of what I'm trying to do,
aren't you.
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/