Re: Propose some information retrieval protocols for Internet

"wang liang" <wangliang_f@xxxxxxx> · Fri, 26 Dec 2003 11:17:26 +0800

> I believe you are talking about information *indexing* service,
> not "information *retrieval* service"

DRIS will build the information retrieval infrastructure for the Internet,
not the final search engines. Many intelligent search systems can use DRIS
as their data source and provide high-quality personalized search services.

> DRIS is only a design.
> When you say "build", who will do it? How does it get paid?
> (i.e. What is the incentive for anyone to do this?)

As I mentioned in our paper, DRIS will improve Internet search engines in
recency, coverage and so on, but that alone cannot ensure that DRIS gets
built.

The architecture of DRIS has four levels: organization, sub-country
Internet, country, and the whole Internet. DRIS will first solve some urgent
problems at the bottom level and then work up to the top level. Even in our
testbed, CERNET (China Education and Research Network), few universities
have a web search engine for their campus network. Furthermore, most
universities have many distinctive information resources such as FTP sites,
BBSes and special library databases, but almost no university has a unified
search system that can efficiently integrate all these resources. To find
comprehensive information, we always have to go through many search
interfaces one by one. That is the problem at the organization level. Beyond
that, there is still no efficient way to share these resources among
different universities. That is the problem at the sub-Internet level. All
of this creates demand for the underlying structure of DRIS. Each
organization solving its own urgent problems first, and thereby benefiting
others, may be the real guarantee of DRIS's success.
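
To make the organization-level problem concrete, here is a minimal sketch of
a union search that sends one query to several campus resources and merges
the hits. It is only an illustration, not the DRIS design: the names Hit,
make_backend and union_search, and the sample titles, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    source: str  # which resource produced the hit, e.g. "ftp"
    title: str   # human-readable title of the matching item


def make_backend(source: str, titles: List[str]) -> Callable[[str], List[Hit]]:
    """Build a toy backend that does case-insensitive substring matching.

    A real backend would wrap an FTP listing, a BBS archive or a library
    catalogue; this stand-in is an assumption made for illustration.
    """
    def search(query: str) -> List[Hit]:
        q = query.lower()
        return [Hit(source, t) for t in titles if q in t.lower()]
    return search


def union_search(backends: Dict[str, Callable[[str], List[Hit]]],
                 query: str) -> List[Hit]:
    """Query every backend once and concatenate the hits, ordered by source name."""
    hits: List[Hit] = []
    for name in sorted(backends):
        hits.extend(backends[name](query))
    return hits


backends = {
    "ftp": make_backend("ftp", ["linux-kernel.tar.gz", "Thesis template"]),
    "bbs": make_backend("bbs", ["Linux install help", "Campus news"]),
    "library": make_backend("library", ["Operating Systems: Linux internals"]),
}

results = union_search(backends, "linux")
for hit in results:
    print(hit.source, "|", hit.title)
```

The user types the query once instead of visiting each search interface in
turn. Ranking and de-duplication across sources are left out on purpose.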

Who controls DRIS? It is administered by none of us and by every one of us.
DRIS is managed by its users and coordinated by a public organization, just
like the way DNS is managed. Every organization is its customer and also its
builder and manager. That is simply the nature of the Internet. DRIS is a
public, open system that needs no profit from its users and, of course, no
advertisements or corporate spam.

> I had a quick look of your first paper.
> - It seems to suggest that each DNS domain has a central authority,
>   which may not be the case
> - It is unclear to me DNS domains are the right unit for indexing
>   webs, as opposed to topical areas.

Current search engines are all managed by their respective companies. This
is centralized management, and it is not suitable for managing the
information on the Internet. There are now billions of web pages, millions
of databases and many other kinds of information resources on the Internet.
A search engine will run into many bottlenecks when the size of its database
reaches certain critical values. In fact, even as a web page search engine
alone, it can't continue to index close to the entire Web as the Web grows.
At present the update interval of most page databases is almost one month.
We can also obtain information from various specialized sources such as
IEEE's digital library, FTP, P2P, etc. Can you imagine a single private
company efficiently administering all these information resources?

Every search engine tries to provide comprehensive and fresh information to
its users, but none of them would build a database system that mirrors the
whole Internet.

So a distributed management framework may be more appropriate for the
Internet. In our experience, decentralized management is much more effective
than fully centralized administration in a large-scale system. With this
approach, the key issue is how to divide the Internet correctly. We found
that a division method is already available on the Internet: the domain name
system (DNS). DNS is a hierarchical distributed system. All the web sites on
the Internet are efficiently managed in this system. The basic architecture
of DNS is likewise organization level, sub-country Internet level, country
level. We just apply its basic idea to DRIS, without strictly complying with
it.
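
As a rough illustration of borrowing the DNS hierarchy, the sketch below
derives the three lower levels from a host name. It is an assumption for the
sake of example: it supposes names of the form host.organization.sector.cc
(such as the hypothetical www.example.edu.cn), and the function name
dris_levels is made up here. DRIS does not have to follow DNS this strictly.

```python
from typing import Dict


def dris_levels(hostname: str) -> Dict[str, str]:
    """Map a host name to hypothetical DRIS levels using its DNS labels.

    Assumes the last label is a country code, the last two labels name a
    sub-country network, and the last three name the organization.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 3:
        raise ValueError("need at least organization.sector.country labels")
    return {
        "country": labels[-1],
        "sub_country": ".".join(labels[-2:]),
        "organization": ".".join(labels[-3:]),
    }


levels = dris_levels("www.example.edu.cn")
print(levels)
```

Names under generic top-level domains such as .com have no country label, so
a real division would need extra rules for them; this is one reason the DNS
idea is applied only loosely.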

>
> bottomline: yes the current google domination is not sustainable, however
> the basis of DRIS design raises its own problems.

Although I can't say DRIS is better than Google right now, it can surely
meet some demands that Google can't. In fact, DRIS will build a system that
can integrate all kinds of resources, not only web pages. The first DRIS
testbed on CERNET will be finished in fall 2004. Practice is the only
criterion for judging a theory. More discussion is also very important for a
new system.