Hi,

> > > Yeah, but even real disk could return bogus data (that is, silent
> > > data corruption). So this issue (returning bogus data) is about
> > > the possibility.
> >
> > The modern disks and HBAs can detect bogus data in most cases, but
>
> You are talking about SCSI DIF or high-end storage systems that use
> checksumming internally?
>
> I'm not sure the modern SATA disk can detect such failure.

I think modern SATA disks have this feature, while IDE disks don't.

> > there are still possibilities.

Yes.

> > > In addition, as you know, recent file systems can handle such a
> > > failure.
> >
> > Yes, I know some filesystems have such a feature. But there is no
> > point in returning bogus data instead of an EIO error.
>
> Yeah, but returning EIO in such cases makes an implementation more
> complicated.

The goal of VastSky is to emulate a regular block device. If there's no
valid data, it just returns EIO, and how to handle the EIO is left to
the filesystem. But that's just a policy; I won't blame storage systems
that choose other policies.

> > > > VastSky updates all the mirrors synchronously. And only after
> > > > all the I/O requests are completed, it tells the owner that the
> > > > request is done.
> > >
> > > Understood. Sheepdog works in the same way.
> > >
> > > How does Vastsky detect old data?
> > >
> > > If one of the mirror nodes is down (e.g. the node is too busy),
> > > Vastsky assigns a new node?
> >
> > Right.
> > VastSky removes the node that seems to be down from the group and
> > assigns a new one. Then no one can access the old one after that.
>
> How does Vastsky store the information of the group?

Actually, VastSky has a metadata node. When the structure of a logical
volume changes, the metadata related to the volume is updated on the
metadata node. The metadata is cached on the node that uses the volume.

> For example, Vastsky assigns a new node, updates the data on all the
> replica nodes, and returns success to the client; right after that,
> all nodes are down due to a power failure. After all the nodes boot
> up again, can Vastsky still detect the old data?

I guess VastSky can re-synchronize them again, but I'm not really sure
how it works. I have to ask the implementor on our team.

> > > Then if a client issues READ and all the mirrors are down but the
> > > old mirror node is alive, how does Vastsky prevent returning the
> > > old data?
> >
> > About this issue, we have a plan:
> > When a node is down and it's not because of a hardware error, we
> > will make VastSky try to re-synchronize the node again.
>
> Yeah, that's necessary, especially when each node has huge data.
> Sheepdog can do that.
>
> > This will be done in a few minutes because VastSky traces all write
> > I/O requests to know which sectors of the node aren't synchronized.
>
> How does Vastsky store the trace log safely? (I guess that the trace
> log is saved on multiple hosts.)

Your assumption is correct.

> Vastsky updates the log per WRITE request?

VastSky updates the log only when a request is the first write to a
certain section that includes the sectors to be written. The log is
cleared periodically once a section is synchronized between the nodes.
Actually, I have to say VastSky uses the Linux md driver to achieve
this, which already has this feature.

> > And you should know VastSky won't easily give up a node which seems
> > to be down. VastSky tries to reconnect the session and even tries to
> > use another path to access the node.
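(To make the last point a bit more concrete, here is a rough sketch in
Python of the kind of retry-and-failover logic I mean. It is only an
illustration: the transport calls, the retry limit and the interval are
made up, not actual VastSky code.)

import time

RETRY_LIMIT = 3
RETRY_INTERVAL = 5.0      # seconds to wait between reconnect attempts


class NodeUnreachable(Exception):
    """Raised only after every retry on every path has failed."""


def submit_io(node, request, paths):
    """Send one I/O request, reconnecting and failing over between paths."""
    for path in paths:                         # primary path first, then alternates
        for _attempt in range(RETRY_LIMIT):
            try:
                session = node.connect(path)   # hypothetical transport call
                return session.send(request)   # success: hand the reply back
            except ConnectionError:
                time.sleep(RETRY_INTERVAL)     # the node may just be busy
    # Only at this point would the node be dropped from the group and a
    # replacement assigned, so a transient slowdown doesn't force a resync.
    raise NodeUnreachable("no working path to node %r" % (node,))

The idea behind this trade-off is that dropping a node is expensive
(it triggers a resync), so VastSky prefers to spend some time retrying
before it gives up.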
> Hmm, but it just means that a client I/O request takes long. Even if
> VastSky doesn't give up, a client (i.e. application) doesn't want to
> wait for long.

Some applications may not like this behavior, but it is VastSky's
policy. I think it is the same behavior you would get with FC-SAN
storage.

Thank you,
Hirokazu Takahashi.
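P.S. Since the per-section write log came up above, here is a rough
sketch in Python of the idea, which is the same approach as the md
write-intent bitmap. The names and the section size are made up for
illustration; this is not the actual md or VastSky implementation.

SECTION_SECTORS = 2048            # e.g. 1 MiB sections of 512-byte sectors


class WriteIntentLog:
    def __init__(self):
        self.dirty = set()        # sections that may differ between mirrors

    def _sections(self, sector, nr_sectors):
        first = sector // SECTION_SECTORS
        last = (sector + nr_sectors - 1) // SECTION_SECTORS
        return range(first, last + 1)

    def before_write(self, sector, nr_sectors):
        """Log the first write into a section before the data goes out."""
        new = [s for s in self._sections(sector, nr_sectors)
               if s not in self.dirty]
        if new:
            self.dirty.update(new)
            # a real system would persist this update (on multiple hosts)
            # before letting the write proceed

    def clear_synced(self, sections):
        """Periodically clear sections once the mirrors are synchronized."""
        self.dirty.difference_update(sections)

    def resync_plan(self):
        """Sections a rejoining node has to copy; the rest can be skipped."""
        return sorted(self.dirty)

With something like this, a node that was only temporarily unreachable
can be resynchronized quickly because only the dirty sections have to
be copied.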