Re: Building a new working group for public information retrieval protocols, asking for advice.

Graham Klyne <GK@xxxxxxxxxxxxxx> · Thu, 08 Jan 2004 09:33:13 +0000

At 15:20 08/01/04 +0800, wang liang wrote:
After publishing the message "propose some information retrieval protocols
for Internet", we received a lot of advice. Now we want to build a new
working group for this issue, and we are asking for more advice.

I am not convinced that the proposed working group is an appropriate one
for the IETF:

(a) it seems to be primarily a Web service to find information about Web 
resources, rather than a more general Internet issue, and there's a debate 
to be had about whether the IETF is the right forum for this.

(b) the proposal seems to be about defining and retrieving metadata about 
Web resources, and as such seems to have significant overlap with existing 
and planned work in W3C [1].  Specifically, the W3C have recently advanced 
the revised metadata format standard (RDF) and the companion ontology 
language (OWL) to Proposed Recommendation [2] [3].  They are also in the 
process of forming a group to define a metadata access service/protocol 
[4], whose likely function would considerably overlap what I understand 
your proposal to be doing.

You mention use of Dublin Core as part of your metadata set:  I note that 
there are already RDF specifications for the Dublin Core metadata set [5], 
and that much existing work with Dublin Core uses RDF.
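For illustration, you can generate a simple Dublin Core description in RDF/XML with nothing more than Python's standard library. This is a minimal sketch. The namespace URIs are the usual RDF syntax and Dublin Core element-set ones. The resource URL and field values are placeholders.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for RDF syntax and the Dublin Core element set (1.1).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Serialize with the conventional "rdf" and "dc" prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dc_description(resource_uri, fields):
    """Return RDF/XML describing resource_uri with simple Dublin Core elements.

    fields maps element names such as "title" or "creator" to string values.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": resource_uri})
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dc_description("http://example.org/report.html",
                     {"title": "Example report", "date": "2004-01-08"}))
```

The DCMI documents cited in [5] give the authoritative encoding. This sketch shows only the general shape.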

If you have strong reasons for not following the RDF route, there is also a 
body of XML Query work that you seem to have overlooked [6].

I also note that I have been unable to access the web page you mention 
(http://www.lib.hust.edu.cn/dl-lib/English/main.htm).

#g
--

[1] http://www.w3.org/

[2] http://www.w3.org/RDF/

[3] http://www.w3.org/2001/sw/WebOnt/

[4] http://www.w3.org/2003/10/RDF-Query-Charter

[5] http://dublincore.org/documents/2002/07/31/dcmes-xml/
    http://dublincore.org/documents/2002/04/14/dcq-rdf-xml/

[6] http://www.w3.org/XML/Query

At 15:20 08/01/04 +0800, wang liang wrote:
Information retrieval may overtake e-mail to become the most important
service on the Internet, so we cannot neglect it.

The case for a working group on public information retrieval protocols
rests on the shortcomings of current commercial search engines and on the
improvements that a future public search system could bring.

The faults of commercial search engines:

1 In technology. Today no search engine covers even 60% of the pages on the
Internet, and the average update interval of their page databases is
almost one month. This is mainly because none of them can keep up with the
explosive growth of web pages. Yet web pages are only one kind of
information resource; there are many others, such as video, specialized
databases and BBS. Can you imagine a single search engine company
efficiently administering all of these resources?

2 In business model. Many search engine companies are now concerned with
how to profit from corporate clients through advertising and paid ranking
prominence, without considering how their real users will feel. Search
engines were originally tools for the convenience of Internet users, but
search engine companies have to sell advertising or ranking prominence,
which makes information retrieval less convenient, simply to sustain
themselves. In other words, search engines make money at the cost of
inconveniencing most Internet users, not through the high quality of their
search service.

3 Apart from search, all the services of the Internet, such as e-mail, BBS
and FTP, are based on public protocols. There is no secret technology in
these services. But information retrieval, perhaps the most important
service on the Internet, is still dominated by a few search engine
companies. Many experts know the basic "PageRank" algorithm, but no one
knows its details, which are commercial secrets. Without public oversight,
there can be no truly impartial ranking algorithm. But we all know another
world-famous algorithm very well: "money can elevate the ranking score".
This may not comply with the basic principle of the Internet as a public
and free world.

4 In any free market, the customers, not a few companies, should always
come first.

The improvements offered by the new public search system, DRIS (Domain
Resource Integrated System):

1 In technology. DRIS will build the information retrieval infrastructure
of the Internet. Like DNS, it uses a hierarchical distributed architecture
to manage all the information on the Internet. Its main principle is a
three-level structure:

- the organization level runs a conventional database system;
- the sub-country Internet level runs a metadata harvest system;
- the country level runs a distributed search system.

Put simply, take web pages as an example. Every bottom-level DRIS server,
such as one at a university, downloads and indexes all the web pages in
its local network. It then sends the metadata to the layer above. All
other resources are integrated in the same way. DRIS will therefore
improve Internet search in freshness, coverage and so on.
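The bottom-up flow described above can be sketched in a few lines of Python. This is a minimal illustration only. The class names OrgNode, RegionalHarvester and CountrySearch are hypothetical, and they do not come from any DRIS specification. For simplicity, each metadata record carries the page's full term list. A real harvest layer would probably send much smaller records.

```python
from collections import defaultdict


class OrgNode:
    """Bottom layer (hypothetical): holds the pages of one organization's network."""

    def __init__(self, name):
        self.name = name
        self.pages = {}  # url -> page text

    def add_page(self, url, text):
        self.pages[url] = text

    def export_metadata(self):
        # One record per page: its URL, the node it came from, and its distinct terms.
        return [{"url": url, "source": self.name,
                 "terms": sorted(set(text.lower().split()))}
                for url, text in self.pages.items()]


class RegionalHarvester:
    """Middle layer (hypothetical): harvests records from org nodes into one index."""

    def __init__(self):
        self.index = defaultdict(set)  # term -> set of URLs

    def harvest(self, org_node):
        for record in org_node.export_metadata():
            for term in record["terms"]:
                self.index[term].add(record["url"])

    def search(self, term):
        return self.index.get(term.lower(), set())


class CountrySearch:
    """Top layer (hypothetical): sends a query to every regional node and merges answers."""

    def __init__(self, regions):
        self.regions = regions

    def search(self, term):
        results = set()
        for region in self.regions:
            results |= region.search(term)
        return sorted(results)


uni = OrgNode("university-a")
uni.add_page("http://a.example.edu/lib", "Library catalogue and databases")
lab = OrgNode("lab-b")
lab.add_page("http://b.example.org/ftp", "FTP archive of library software")
region = RegionalHarvester()
region.harvest(uni)
region.harvest(lab)
print(CountrySearch([region]).search("library"))
# ['http://a.example.edu/lib', 'http://b.example.org/ftp']
```

The middle layer answers queries from its own harvested index, while the top layer only merges answers from regional nodes. That split mirrors the harvest-then-federate division in the principle above.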

2 Management. Who will control DRIS? It is administered by none of us and
by all of us. DRIS is managed by its users and coordinated by a public
organization, much as DNS is managed. Every organization is both its
customer and its builder, which reflects the true nature of the Internet.
DRIS is an open system. It needs no profit from its users and, of course,
no advertising.

3 The basic idea of DRIS is that "search should be an internal function of
the Internet, and everyone should have his own search engine". DRIS only
provides raw search results, like the results of current search engines.
Many intelligent search systems can use DRIS as their data source and
provide high-quality personal or commercial search services. So commercial
search engines can still survive, in the way they should.
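As an illustration of that division of work, the sketch below shows how a personal search system might re-rank raw DRIS results. It assumes, purely for illustration, that DRIS returns each hit as a (url, base_score, terms) tuple. That result format, the profile weights and the function personalize are hypothetical and are not part of any DRIS specification.

```python
def personalize(raw_results, profile, top_k=5):
    """Re-rank raw results using one user's term weights (hypothetical format).

    raw_results: list of (url, base_score, terms) tuples, as an assumed DRIS
    result format. profile: dict mapping a term to the user's interest weight.
    """
    def score(result):
        url, base, terms = result
        # Add the user's weight for each distinct term the result mentions.
        return base + sum(profile.get(t, 0.0) for t in set(terms))

    # Highest personalized score first; ties broken by URL for stable output.
    ranked = sorted(raw_results, key=lambda r: (-score(r), r[0]))
    return [url for url, _, _ in ranked[:top_k]]


raw = [("u1", 1.0, ["sports"]),
       ("u2", 0.8, ["library", "ir"]),
       ("u3", 0.5, ["ir"])]
print(personalize(raw, {"ir": 1.0, "library": 0.5}))  # ['u2', 'u3', 'u1']
```

The raw ranking stays a shared public service, and any personalization happens in the client or third-party system that consumes it.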

4 DRIS offers an excellent and promising solution for a new public
Internet search system, but this alone cannot ensure that DRIS will be
established. One important principle in technology is that the best
technology is the one that meets an urgent social demand. This is the real
secret of DRIS.

In our testbed, at the organization level, only a few universities have a
web search engine for their campus network. Even fewer have a union search
system that can efficiently integrate all their information resources,
such as FTP, BBS and the library's specialized databases. That is the
demand for the third layer. Sharing information resources between
different organizations is also attractive, and this is the demand for the
second-layer DRIS. At the top layer, integrating all the information
resources on the Internet may be everyone's dream.

5 Practice is the only criterion by which to judge a theory. We have now
built some experimental third-layer DRIS servers in Hubei Province. I can
only say that this is how things should work.

                    Protocol Series of DRIS (for the working group)
Description of Working Group:

With the rapid growth in the number of web pages, search engine coverage
will get worse and update intervals will grow much longer. If the current
search engine architecture remains in use, finding precise and
comprehensive information will become an impossible mission. This problem
will become more serious as IPv6 is widely deployed in communication
networks. The problem of "too much information means no information" may
become a disaster as information explodes. To solve this problem, the
Internet needs an efficient information management system.

In this group, the Domain Resource Integrated System (DRIS) will be
proposed. DRIS is a distributed information retrieval system that will
build the information retrieval infrastructure for the Internet. It can
also be regarded as a kind of Internet information management system.

DRIS is a hierarchical distributed search system composed of three kinds of
information retrieval system: conventional database systems, distributed
search systems and metadata harvest systems. We will first define the basic
search systems and then define DRIS as a whole.

Specific work items are:

1 Standard distributed search system. This item defines a
platform-independent search interface and a collection description
standard for heterogeneous information resources. An Internet-Draft,
"information retrieval protocol for digital resources", has been proposed
(http://www.ietf.org/internet-drafts/draft-liang-irpdl-03.txt).
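This sketch does not reproduce the draft's actual interface. Instead, it defines a hypothetical, platform-independent SearchService interface as a loose illustration of the idea. The interface has a describe() call that returns a collection description and a search() call. A small broker then queries only the collections whose description covers a given subject. All class and field names here are assumptions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class CollectionDescription:
    name: str
    resource_type: str    # e.g. "web", "ftp", "bbs", "database"
    subjects: frozenset   # subject keywords the collection covers


@dataclass
class Hit:
    collection: str
    identifier: str
    score: float


class SearchService(ABC):
    """Hypothetical interface that every heterogeneous resource would expose."""

    @abstractmethod
    def describe(self) -> CollectionDescription: ...

    @abstractmethod
    def search(self, query: str, max_results: int) -> list: ...


class KeywordCollection(SearchService):
    """Toy backend: scores each document by counting query-term occurrences."""

    def __init__(self, description, documents):
        self.description = description
        self.documents = documents  # identifier -> text

    def describe(self):
        return self.description

    def search(self, query, max_results):
        terms = query.lower().split()
        hits = []
        for ident, text in self.documents.items():
            words = text.lower().split()
            score = sum(words.count(t) for t in terms)
            if score:
                hits.append(Hit(self.description.name, ident, float(score)))
        hits.sort(key=lambda h: (-h.score, h.identifier))
        return hits[:max_results]


def broker_search(services, query, subject, max_results=10):
    """Query only collections whose description lists the subject; merge by score."""
    selected = [s for s in services if subject in s.describe().subjects]
    merged = [hit for s in selected for hit in s.search(query, max_results)]
    merged.sort(key=lambda h: (-h.score, h.collection, h.identifier))
    return merged[:max_results]
```

This toy broker merges raw scores from heterogeneous collections, which is a simplification. Real federated search usually needs some form of score normalization across collections.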

2 Standard metadata harvest system. A protocol based on available open
standards such as OAI will be proposed. It will define a standard metadata
format that is compatible with most database systems.
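The sketch below assumes that the OAI work meant here is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). In that protocol, a harvester issues HTTP requests such as ListRecords and pages through the results with a resumptionToken. Here is a minimal sketch of the harvester side in Python. The base URL is a placeholder. The two functions are illustrative and are not part of the proposed DRIS protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, token=None):
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH, so when a token is
    given only the verb and the token are sent.
    """
    if token is not None:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date  # selective harvest: changes since this date
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return ([(identifier, [dc titles])], resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{OAI}record"):
        identifier = record.findtext(f"{OAI}header/{OAI}identifier")
        titles = [t.text for t in record.iter(f"{DC}title")]
        records.append((identifier, titles))
    token_el = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    # A missing or empty resumptionToken marks the last page of results.
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token
```

Passing from_date lets a regional node fetch only the records that changed since its last harvest. This is one way a harvest layer could keep its metadata fresher than a monthly re-crawl.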

3 Standard public web pages search system. There are many kinds of
database systems. Any of them can be brought into DRIS at the appropriate
layer, provided it offers the standard distributed search interface or
complies with the metadata harvest format. But web pages are special,
because they are highly distributed and astronomical in number. To
integrate web pages on the Internet efficiently, DRIS will build a public,
open web page database. This database will strictly follow the
three-level principle: a conventional database system at the organization
level, a metadata harvest system at the sub-country Internet level, and a
distributed search system at the country level.
(More information: Make search become the internal function of Internet.
http://arxiv.org/abs/cs.IR/0311015)
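A bottom-level node indexes only the pages from its own network. The sketch below uses only Python's standard library. It shows one way such a node could decide which URLs are local and reduce each local page to a small metadata record for the harvest layer. The domain name and the record fields (url, title, size) are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def is_local(url, local_domain):
    """True if the URL's host is local_domain itself or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    local_domain = local_domain.lower()
    return host == local_domain or host.endswith("." + local_domain)


def page_metadata(url, html, local_domain):
    """Return a small metadata record for a local page, or None for a foreign one."""
    if not is_local(url, local_domain):
        return None  # foreign pages are left to the node responsible for them
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title.strip(), "size": len(html)}
```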

4 DRIS. This item will define DRIS as a whole, including its overall
architecture, the relations between different nodes, and so on.
(More information: Evolution: Google vs. GRIS.
http://arxiv.org/abs/cs.DL/0312024)
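DRIS is modeled on DNS, so one plausible way to express the relations between nodes is DNS-style delegation by domain-name suffix. The sketch below assumes a hypothetical delegation table that maps suffixes to DRIS nodes, with "" as the root. For a given host, it returns the chain of nodes, from most specific to root, through which that host's metadata would flow.

```python
def delegation_chain(host, delegations):
    """Return the DRIS nodes responsible for host, most specific first, root last.

    delegations maps domain suffixes (e.g. "edu.cn") to node names; "" is the root.
    Matching is done on whole labels, as in DNS, so "notexample.edu.cn" does not
    match "example.edu.cn".
    """
    labels = [label for label in host.lower().rstrip(".").split(".") if label]
    chain = []
    for i in range(len(labels) + 1):
        suffix = ".".join(labels[i:])  # i == len(labels) gives "" (the root)
        if suffix in delegations:
            chain.append(delegations[suffix])
    return chain


def responsible_node(host, delegations):
    """The most specific node, i.e. the one that would index this host directly."""
    chain = delegation_chain(host, delegations)
    return chain[0] if chain else None


# Hypothetical delegation table for illustration only.
DELEGATIONS = {
    "": "country-node",
    "edu.cn": "edu-region-node",
    "example.edu.cn": "example-univ-node",
}
print(delegation_chain("www.lib.example.edu.cn", DELEGATIONS))
# ['example-univ-node', 'edu-region-node', 'country-node']
```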

5 DRIS and IPv6. We will propose cooperation with the IPv6 WG. IPv6 will
be the most distinctive feature of the next-generation Internet. IPv6 is
still being improved, and any technology that benefits the Internet can be
added to it. Searching is the main activity of most Internet users, and
search is not very satisfactory on the current Internet. So why not take
this need into account when building the new Internet? For example, IPv6
lets every data flow be assigned a priority, so the network could give
high priority to DRIS traffic. The relation between DRIS and IPv6
therefore needs some consideration.
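For reference, the IPv6 header (RFC 2460) begins with a 4-bit Version field, an 8-bit Traffic Class field and a 20-bit Flow Label. Traffic Class replaced the older Priority field of RFC 1883. Its upper six bits carry a Differentiated Services code point (RFC 2474), and its lower two bits carry ECN. The sketch below packs that first header word. The code point used for DRIS traffic is a hypothetical choice. Marking packets does not by itself guarantee priority, because routers must also be configured to honour the marking.

```python
import struct

IPV6_VERSION = 6


def ipv6_first_word(dscp, ecn=0, flow_label=0):
    """Pack Version (4 bits), Traffic Class (8 bits) and Flow Label (20 bits).

    The Traffic Class is built from a 6-bit DSCP value and a 2-bit ECN field.
    """
    if not (0 <= dscp < 64 and 0 <= ecn < 4 and 0 <= flow_label < 1 << 20):
        raise ValueError("field out of range")
    traffic_class = (dscp << 2) | ecn
    return (IPV6_VERSION << 28) | (traffic_class << 20) | flow_label


def unpack_first_word(word):
    """Split the first 32-bit header word into (version, dscp, ecn, flow_label)."""
    traffic_class = (word >> 20) & 0xFF
    return word >> 28, traffic_class >> 2, traffic_class & 0x3, word & 0xFFFFF


DRIS_DSCP = 34  # hypothetical choice: the AF41 code point
word = ipv6_first_word(DRIS_DSCP, flow_label=0x12345)
print(hex(word))                      # 0x68812345
print(struct.pack("!I", word).hex())  # 68812345 (network byte order)
```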

Detailed information about DRIS can be found at
http://www.lib.hust.edu.cn/dl-lib/English/main.htm

We would welcome more advice. Thanks.


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact