Javed Mostafa, the Victor Yngve Associate Professor of information science and director of the Laboratory of Applied Informatics at Indiana University, Bloomington, explains.

It has been estimated that the amount of textual information accessible via search engines is at least 40 times larger than the digitized content of all the books in the Library of Congress, the world's largest library. Providing access to such a large volume of information is a challenge, yet current search engines do remarkably well at sifting through the content and identifying links relevant to a query.

There is a multitude of information providers on the web. These include the commonly known and publicly available sources such as Google, InfoSeek, NorthernLight and AltaVista, to name a few. A second group of sources--sometimes referred to as the "hidden web"--is much larger than the public web in terms of the amount of information it provides. This latter group includes sources such as Lexis-Nexis, Dialog, Ingenta and LoC. These sources remain hidden for various reasons: they may not allow other information providers access to their content; they may require a subscription; or they may demand payment for access. This article is concerned with the former group, the publicly available web search services, collectively referred to here as search engines.

Search engines employ various techniques to speed up searches. Some of the common techniques are briefly described below.
