Review of programs for searching documents and data. Professional search for information on the Internet

By mid-2015 the global Internet already connected 3.2 billion users, almost 43.8% of the world's population. For comparison: 15 years earlier only 6.5% of the population were Internet users, so their number has grown more than sixfold. Even more impressive than the quantitative indicators, though, are the qualitative ones: Internet technologies have spread into every area of human activity, from the global communications of social networks to the household Internet of Things. Mobile Internet lets users stay online outside the office and home: on the road, out of town, in the countryside.
Currently there are hundreds of systems for finding information on the Internet. The most popular of them are available to the vast majority of users because they are free and easy to use: Google, Yandex, Nigma, Yahoo!, Bing and others. More advanced users are offered "advanced search" interfaces and specialized searches over social networks, news streams, and classified ads. But all these wonderful search engines have a significant drawback, which I already noted above as a virtue: they are free.
If investors pour billions of dollars into the development of search engines, a fair question arises: where do they earn their money?
They earn it, in particular, by serving in response to user queries not so much the information that would be useful from the user's point of view as the information the owners of the search engines consider useful to show. This is done by manipulating the order of the answer lists returned for search queries. It includes both open advertising of certain Internet resources and covert manipulation of the relevance of answers, driven by the commercial, political and ideological interests of the search engines' owners.
That is why, among professionals who search for information on the Internet, the pertinence of search engine results is such a pressing problem.
Pertinence is the degree to which the documents found by an information retrieval system correspond to the user's information need, regardless of how fully and precisely that need is expressed in the text of the query itself. It is the ratio of the amount of useful information to the total amount of information received: roughly speaking, search efficiency.
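As a toy illustration of the ratio (the function name is my own, not a term from any product):

    def pertinence(useful_links: int, total_links: int) -> float:
        """Share of the returned results that were actually useful to the user."""
        return useful_links / total_links if total_links else 0.0

    # 40 genuinely useful links out of 100 returned gives a pertinence of 0.4
    print(pertinence(40, 100))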
Specialists who perform qualified information searches on the Internet have to make an effort to filter search results and screen out the unnecessary information "noise". For this, professional-level search tools are used.
One such professional system is the Russian program FileForFiles & SiteSputnik (SiteSputnik), developed by Alexey Mylnikov from Volgograd.

"The FileForFiles & SiteSputnik program (SiteSputnik) is designed to organize and automate professional search, collection and monitoring of information posted on the Internet. Particular attention is paid to receiving incoming new information on topics of interest. Several functions of information analysis have been implemented."


1. Monitoring and categorization of information flows


First, a few words about monitoring information flows, of which media and social network monitoring is a special case:

  • the user specifies the Sources that may contain the required information, and the Rules for selecting that information;

  • the program downloads fresh links from the Sources, frees their content from garbage and repetitions, and files them into Rubrics according to the Rules (a minimal sketch of this pipeline appears after this list).

  • To watch a simple but real monitoring process live, involving 6 sources and 4 rubrics:
  • open the Demo version of the program;


  • then, in the window that appears, click the Together button;

  • and as SiteSputnik executes this Project in real time, you will see:
    - in the "Pure Stream" list, all the new information from the Sources,
    - in the "Post-request" section, only the economic and financial news that satisfy the rule,
    - in the rubrics "About the President", "About the Prime Minister" and "Central Bank", information related to the corresponding objects.

  • Real Projects can use almost any number of Sources and Rubrics.
    Your first working Projects can be created in a few hours and refined in the course of operation.
    The processing described here is available in the SiteSputnik Pro+News package and above.
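A minimal sketch of such a pipeline, assuming the Sources are ordinary RSS feeds (the feed URL, rubric names and keywords below are invented for illustration; SiteSputnik itself is not limited to RSS):

    import feedparser  # third-party: pip install feedparser

    SOURCES = ["https://example.com/news.rss"]                    # hypothetical feeds
    RUBRICS = {"Central Bank": ["central bank", "key rate"],
               "About the President": ["president"]}              # rubric -> keywords

    def run_monitoring(seen_links: set) -> dict:
        """Fetch fresh entries, drop repetitions, file the rest under Rubrics."""
        filed = {name: [] for name in RUBRICS}
        for source in SOURCES:
            for entry in feedparser.parse(source).entries:
                if entry.link in seen_links:
                    continue                  # repetition: processed on an earlier run
                seen_links.add(entry.link)
                text = (entry.title + " " + entry.get("summary", "")).lower()
                for name, keywords in RUBRICS.items():
                    if any(k in text for k in keywords):
                        filed[name].append(entry.link)
        return filed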

2. Simple and batch search, collection of information

To familiarize yourself with the capabilities of SiteSputnik Pro (the basic edition of the program):

  • open the Demo version of the program;

  • enter your first query, for example your full name, as I did,

    and click the Search button.


  • Within a few seconds the program will query 7 sources, open 24 search pages in them, find 227 relevant links, remove the re-encountered ones, and build a combined list ("Union") of the remaining 156 unique links (see the table built by SiteSputnik).

    Source         Ordered   Downloaded   Found   Search    Search       New     New
                   pages     pages        links   time      efficiency   links   efficiency
    Yandex         5         5            50      0:00:05   32%          0       0
    Google         5         5            44      0:00:03   28%          0       0
    Yahoo          5         5            50      0:00:05   32%          0       0
    Rambler        5         4            56      0:00:07   36%          0       0
    MSN (Bing)     5         3            23      0:00:04   15%          0       0
    Yandex.Blogs   5         1            1       0:00:01   1%           0       0
    Google Blogs   5         1            3       0:00:01   2%           0       0
    Total:         35        24           227     0:00:26                0       0
    Total: unique links - 156; duplicate links - 46%.

  • (!) Repeat your query in a few hours or days, and you will see, in a separate list, only the new links that have appeared in the search results over that period. The last two columns of the table show how many new links each Source contributed and its "novelty" efficiency. When a query is executed repeatedly, the list of new links is built relative to all previous executions of that query. It would seem an elementary and necessary function, yet the author knows of no other program that implements it (a minimal sketch of the idea appears after this list).

  • (!!) The described capabilities are supported not only for individual queries but for whole query packages:

    The package you see consists of seven different queries collecting information about Vasily Shukshin from several Sources, including search engines, Wikipedia, exact search in Yandex news, metasearch, and a search for mentions on TV and radio stations. The TV and Radio script includes Channel One, Russia TV, NTV, RBC TV, Ekho Moskvy, the Mayak radio company and other sources. Each Source has its own search or viewing depth in pages, listed in the third column.

    Batch search enables comprehensive, one-click collection of information on a given topic.
    On repeated executions of the package, a separate list of new links will contain only links not found before.
    There is no need to remember what you asked the Internet and what it answered: everything is saved automatically in the program's libraries and databases.
    I repeat: the capabilities described in this paragraph are fully included in the SiteSputnik Pro package.


  • Read more in the instructions: SiteSputnik Pro for beginners.
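The idea behind the "new links only" list is plain set arithmetic against everything seen in previous runs. A minimal sketch (the file name and storage format are my own choices, not the program's):

    import json, pathlib

    def new_links(found: set, history_file: str = "seen_links.json") -> set:
        """Return only links never seen in any previous run, then update the history."""
        path = pathlib.Path(history_file)
        seen = set(json.loads(path.read_text())) if path.exists() else set()
        fresh = found - seen
        path.write_text(json.dumps(sorted(seen | found)))
        return fresh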

3. Objects and search monitoring

Quite often the user faces the following task: find out what the Internet holds about a specific object, a person or a company. For example, when hiring a new employee or taking on a new counterparty, you always know the full name or company name and phone numbers, usually the TIN, OGRN or OGRNIP registration numbers, and possibly ICQ, Skype and some other details. Then, calling the special SiteSputnik function "Collecting information about the object" (the SiteSputnik Pro+Objects edition):

You enter the data you know and, with one click of the mouse, perform an accurate and complete search for links containing the specified information. The search runs on several search engines at once, over all the details at once, and over several possible ways of writing each detail: recall how many different ways a phone number can be written down. After a while, without doing any boring routine work, you receive a list of links cleared of repetitions and, most importantly, sorted by relevance to the desired object. Relevance (significance) is achieved because the first links in SiteSputnik's output are those containing the largest number of the requisites you specified, not the pages that webmasters have pushed up the search engine rankings.
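The ranking principle can be sketched simply: count how many of the known requisites each page mentions and sort by that count (the sample requisites below are invented):

    def rank_by_requisites(pages: dict, requisites: list) -> list:
        """Order (url, text) pairs by how many known details each page mentions."""
        def hits(text: str) -> int:
            return sum(1 for r in requisites if r.lower() in text.lower())
        return sorted(pages.items(), key=lambda item: hits(item[1]), reverse=True)

    # Several spellings of the same phone number count as separate requisites:
    requisites = ["Vasily Terekhin", "+7 902 123-45-67", "79021234567"]  # hypothetical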

Important.
SiteSputnik is better than other programs at extracting real, rather than official, information about an object. For example, a mobile operator's official database may say that a phone number belongs to Vasily Terekhin, while on the Web the same number appears in a 2013 ad in which a certain Alexander was selling a Ford Focus: additional information for thought.

Search monitoring.
Search monitoring means the following: if you want to track the appearance of new links for a given object or an arbitrary package of queries, you simply repeat the corresponding search periodically. Just as for a simple query, SiteSputnik will create a "New" list containing only those links that were not found in any of the previous searches.

Search monitoring is interesting not only in itself. It can be used for monitoring media, social networks and other news sources, mentioned above in paragraph 1. Unlike other programs, which can pick up new information only from RSS feeds, SiteSputnik can use its built-in searches and search engines for this. It can also emulate (create by itself) RSS feeds from arbitrary pages, and even emulate an RSS stream for a query or a whole package of queries.


  • To get the most out of the program, use its main features, namely:

    • query packages, packages with parameters, the Assembler (collector), and the "Analytic union" operation over the results of several tasks; if necessary, apply the basic search functions for the invisible Internet;

    • connect your own sources to the information sources built into the program: other search engines and embedded searches, and existing RSS feeds; create your own RSS feeds from arbitrary pages; use the function for finding new sources;

    • use the available kinds of monitoring: media, social networks and other sources; monitoring of comments to news items and posts; tracking the appearance of new information on existing pages;

    • engage Rubrics, external functions, the Task Scheduler, mailing lists, multiple computers and the Project Instructor; set up alarms to be notified of significant events; and use the other functions listed below.



4. The SiteSputnik program: editions and functions

The SiteSputnik program is constantly being improved in one direction: "I need to find everything, and with a guarantee".
"A program for interrogating the Internet" is another user's definition of its purpose.

A. Search and collection functions.

Query package: execution of several queries at once, with the search results combined or kept separate. When the combined result is formed, links encountered repeatedly are removed. More about packages in the introduction to SiteSputnik; see it live in the video on joint and separate execution of queries. There are no analogues among domestic or foreign products.

Parameter packages. Any queries and query packages designed for standard search tasks (for example, search by phone number, full name or e-mail) can be parameterized, saved, and executed from a library of ready-made queries with the actual parameter values substituted in. Each parameter package has its own extended search form, which can use not one but several search engines. Forms of very complex functional purpose can be created, and, crucially, users can create the forms themselves, without the participation of the program's author or any programmer. This is described simply in the instructions, in more detail in a separate publication on search parameterization and on the forum, and clearly in the videos: searching at once over all the ways of writing a mobile phone number, and over several ways of writing an e-mail address. There are no analogues.

Assembler (NEW): assembly of a search task from several ready-made parts: queries, query packages and parameter packages. Packages can include other packages in their text; the nesting depth is unlimited. You can create several search tasks, for example about several legal entities and individuals, and execute them at the same time. More details on the forum and in a separate publication about the Assembler; see also the video. There are no analogues.

Metasearch: execution of a specific query on several search engines simultaneously, with its own search "depth" for each of them. Metasearch works over the built-in search engines (Yandex, Rambler, Google, Yahoo, MSN (Bing), Mail, and Yandex and Google blogs) and over connected search tools. Working with multiple search engines looks just like working with one; re-encountered links are removed. A clear metasearch over three connected social networks (VKontakte, Twitter and YouTube) is shown in the video.
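Conceptually, a metasearch with duplicate removal is a merge that preserves the first occurrence of every link. A sketch, where each "engine" is a stand-in callable returning a list of URLs (no real search engine API is implied):

    def meta_search(query: str, engines: list) -> list:
        """Query every engine, concatenate the results, drop links seen earlier."""
        seen, merged = set(), []
        for engine in engines:               # engine: callable(query) -> list of URLs
            for url in engine(query):
                if url not in seen:
                    seen.add(url)
                    merged.append(url)
        return merged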

Site metasearch: combined site search in Google, Yahoo, Yandex and MSN (Bing). See the video.

Metasearch in office documents: combined search over PDF, XLS, DOC, RTF, PPT and FLASH files in Google, Yahoo, Yandex and MSN (Bing). Any combination of file formats can be chosen.

Metasearch over cached copies of links in Yandex, Google, Yahoo and MSN (Bing). A list is compiled in which each item gathers all the snippets that each search engine found for a given link. There are no analogues.

Deep search for Yandex, Google and Rambler combines into one list all links from the regular search together with all links from the "More from the site", "Additional results from the site" and "Search on the site (Total ...)" lists, respectively. Learn more about deep search on the forum. There are no analogues.

Exact and complete search. This means the following: on the one hand, each query is executed on that and only that source in whose query language it is written; this is exact search. On the other hand, there can be any number of such queries and sources; this ensures complete search. More details in a separate publication on procedural search. There are no analogues.

Search on the invisible Internet.

    It includes the following basic functions:

    - a special query package, which the user can improve,
    - search for invisible links using a spider,
    - search for invisible links in the vicinity of a visible link or folder, "by image and likeness",
    - special searches for open folders,
    - search for invisible links and folders with standard names using special dictionaries,
    - use of the program's own built-in searches.

    More details in a separate publication on SiteSputnik Invisible. The basic functions are "well known in narrow circles", but the way they are applied here has no analogues. The essence of the method is to build a map of the site as visible from the Internet (in other words, a materialization of the visible Internet) and then to search for invisible links only on the basis of the visible ones and relative to them. Links that are already visible are not searched for again by "invisible" methods.
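One of the listed techniques, probing for folders with standard names in the vicinity of a visible link, can be sketched as follows (the candidate names form an invented mini-dictionary; real dictionaries are far larger, and probing other people's servers should respect their rules):

    import requests  # third-party: pip install requests

    CANDIDATES = ["admin/", "backup/", "docs/", "old/"]   # standard folder names

    def probe_folders(base_url: str) -> list:
        """HEAD-check candidate folder names next to a visible link."""
        found = []
        for name in CANDIDATES:
            url = base_url.rstrip("/") + "/" + name
            try:
                status = requests.head(url, timeout=5, allow_redirects=True).status_code
                if status == 200:
                    found.append(url)
            except requests.RequestException:
                pass                                       # unreachable: skip
        return found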

B. Information monitoring functions.

Monitoring the appearance of new links on the Internet on a given topic. New links are tracked using whole query packages that employ any of the search methods mentioned above, not just the first result pages of individual search engines. Union and intersection of the new links from several separate searches are implemented. More details in the publication on monitoring (see § 1) and on the forum. There are no analogues.

Collective information processing: creation of a corporate or professional network for the collective collection, monitoring and analysis of information. The participants and creators of such a network are corporate employees, members of a professional community, or interest groups; the geographic location of the participants does not matter. More details in a separate publication on organizing a network for collective collection, monitoring and analysis of information.

Monitoring links (web pages) to detect changes in their content (beta version). Detected changes are highlighted with color and special characters. More details in a separate publication on monitoring (see § 2 and 3).
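Change detection of this kind usually boils down to comparing a fingerprint of the page across runs; the highlighting is then a diff of the stored and current texts. A minimal sketch of the fingerprint part:

    import hashlib
    import requests  # third-party: pip install requests

    def page_fingerprint(url: str) -> str:
        """SHA-256 of the page body; a changed digest on the next run means new content."""
        body = requests.get(url, timeout=10).text
        return hashlib.sha256(body.encode("utf-8")).hexdigest()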

C. Information analysis functions.

Categorization of materials has already been described above. More details in a separate publication on Rubrics. The rules for filing material into Rubrics let you specify keywords and the distance between them, use logical AND, OR and NOT, and apply a multilevel parenthesis structure and dictionaries (insert files), to which logical operations can also be applied.
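A toy version of such a rule, assuming plain keyword matching (SiteSputnik's actual rule syntax is richer): logical AND/OR/NOT plus a word-distance condition:

    def in_rubric(text: str, all_of=(), any_of=(), none_of=()) -> bool:
        """AND over all_of, OR over any_of, NOT over none_of."""
        t = text.lower()
        return (all(w in t for w in all_of)
                and (not any_of or any(w in t for w in any_of))
                and not any(w in t for w in none_of))

    def within_distance(text: str, w1: str, w2: str, k: int) -> bool:
        """True if w1 and w2 occur at most k words apart."""
        words = text.lower().split()
        pos1 = [i for i, w in enumerate(words) if w == w1]
        pos2 = [i for i, w in enumerate(words) if w == w2]
        return any(abs(i - j) <= k for i in pos1 for j in pos2)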

VF technology: an almost arbitrary extension of the categorization capability through external functions, which plug organically into the Rules for filing into Rubrics and can be implemented by a programmer independently, without the participation of the program's author.

Numerical analysis of Rubric occupancy, with alarms and notification of significant events by highlighting Rubrics in color and/or sending an alarm report by e-mail.

Actual relevance. Links can be arranged in an order close to their real significance for the task at hand, bypassing the tricks of webmasters who use various methods to boost their sites' rankings in search engines. This is achieved by analyzing the results of several "diverse" queries on a given topic. The links containing the maximum of the required information are computed, in the literal sense of the word. More details in the description of the method for finding the optimal supplier and on the forum. There are no analogues.

Calculating object relationships: search for links, resources (sites), folders and domains on which objects are mentioned together. The most common objects are people and companies. All the SiteSputnik tools mentioned on this page can be used for finding connections, which significantly increases the efficiency of your work. The operation can be performed on any number of objects. More details in the introduction to the program and in the description of the new "objects and their connections" function. There are no analogues.

Formation, union and intersection of information flows on a variety of topics, and stream matching. More details in a separate publication on flows.

Building web maps of sites, resources, folders and searched objects from links found on the Internet using Google, Yahoo, Yandex, MSN (Bing) and AltaVista. Experts can find out whether "superfluous" information on their own sites is visible from the Internet, and can research competitors' sites in the same way. A web sitemap is a materialization of the visible Internet. More details in a separate publication on building web maps; see also the video. There are no analogues.

Search for new sources of information on a given topic; these sources can then be used to track the appearance of new needed information. More details at.

D. Service functions.

The Scheduler provides scheduled operation: it performs the specified program functions at the specified times. More details in a separate publication on the Scheduler.

The Project Instructor (NEW) is an assistant for creating and maintaining Projects for searching, collecting, monitoring and analyzing information (categorization and alarms). More details on the forum.

Automatic archiving. All the results of your work are automatically stored in databases: queries, query packages, search and monitoring protocols, and the runs and results of any of the other functions listed above. Work can be structured by topics and subtopics.

The database supports sorting, simple search, and arbitrary SQL search; for the latter there is a wizard for writing SQL queries. Using these tools you can find and review the work you did yesterday, last month or a year ago, using a topic or any other criterion over the database content as the search condition.
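The convenience of an automatically kept archive is easy to appreciate with a toy example. The schema below is my own invention for illustration, not the program's actual database layout:

    import sqlite3

    con = sqlite3.connect("search_archive.db")
    con.execute("""CREATE TABLE IF NOT EXISTS results
                   (ran_at TEXT, topic TEXT, query TEXT, url TEXT)""")
    con.execute("INSERT INTO results VALUES (datetime('now'), ?, ?, ?)",
                ("suppliers", "ford focus spare parts", "http://example.com/ad"))
    con.commit()

    # Arbitrary SQL over the archive: everything stored on a topic in the last month.
    rows = con.execute("""SELECT query, url FROM results
                          WHERE topic = ? AND ran_at >= datetime('now', '-1 month')""",
                       ("suppliers",)).fetchall()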

Technical constraints of search engines. Some restrictions, such as the length of the query string, can be overcome: the program executes not one but several queries, combining the search results or keeping them separate. You can read about a way to overcome the violation of the additivity law in the major search engines. For a single word or a quoted phrase, case-sensitive search has been implemented over the search engines, in particular search for abbreviations.

Built-in browser, with a page navigator, a multicolor marker for highlighting keywords and arbitrary words, and bilisting and N-listing of the generated documents.

Unloading of news feeds into a tabular view designed for import into Excel, MySQL, Access, Kronos and other applications.


5. Installing and running the program: computer requirements

To install and run the program:

  • download the file and copy the FileForFiles folder from it to your hard drive, for example to D:\;

  • the Demo version of the program will be installed and will open.

  • The program works on any computer with any version of Windows installed.

    1. DuckDuckGo

    What is it

    DuckDuckGo is a fairly well-known open-source search engine. Its servers are located in the USA. In addition to its own robot, the search engine uses results from other sources: Yahoo, Bing, Wikipedia.

    How it is better

    DuckDuckGo positions itself as a search engine that provides maximum privacy and confidentiality. The system does not collect any user data, does not store logs (there is no search history), and the use of cookies is limited as much as possible.

    DuckDuckGo does not collect or share personal information from users. This is our privacy policy.

    Gabriel Weinberg, founder of DuckDuckGo

    Why do you need it

    All major search engines try to personalize search results based on data about the person in front of the monitor. This phenomenon is called the "filter bubble": the user sees only those results that agree with his preferences, or that the system considers to be such.

    DuckDuckGo forms an objective picture that does not depend on your past behavior on the Web, and it spares you the targeted advertising that Google and Yandex serve based on your queries. DuckDuckGo makes it easy to search for information in foreign languages, while Google and Yandex by default give preference to Russian-language sites even if the query is entered in another language.


    2. not Evil

    What is it

    not Evil is a search engine for the anonymous Tor network. To use it, you need to enter that network, for example by launching a specialized browser.

    not Evil is not the only search engine of its kind. There are LOOK (the default search in the Tor browser, accessible from the regular Internet), TORCH (one of the oldest search engines on the Tor network) and others. We settled on not Evil because of its unambiguous allusion to Google (just look at its start page).

    How it is better

    It searches where Google, Yandex and other search engines are shut out as a matter of principle.

    Why do you need it

    The Tor network holds many resources that cannot be found on the law-abiding Internet, and their number will grow as government control over the content of the Web tightens. Tor is a kind of network within the Network, with its own social networks, torrent trackers, media, marketplaces, blogs, libraries and so on.

    3. YaCy

    What is it

    YaCy is a decentralized search engine based on P2P networks. Each computer with the main software module installed scans the Internet independently, acting as an analogue of a search robot. The results are collected in a common database used by all YaCy participants.

    How it is better

    It is difficult to say whether it is better or worse, because YaCy is a completely different approach to organizing search. The absence of a single server and owning company makes the results completely independent of anyone's preferences. The autonomy of each node excludes censorship. YaCy is capable of searching the deep web and non-indexed public networks.

    Why do you need it

    If you are a supporter of open source and of a free Internet uninfluenced by government agencies and large corporations, YaCy is your choice. It can also be used to organize search inside a corporate or other autonomous network. And while YaCy is not yet very useful in everyday life, it is a worthy alternative to Google in terms of how the search process is organized.

    4. Pipl

    What is it

    Pipl is a system designed to search for information about a specific person.

    How it is better

    The authors of Pipl claim that their specialized algorithms search more effectively than "regular" search engines. In particular, priority is given to social media profiles, comments, membership lists and various databases where information about people is published, such as databases of court decisions. Pipl's leadership in this area has been acknowledged by Lifehacker.com, TechCrunch and others.

    Why do you need it

    If you need to find information about a person living in the United States, Pipl will be much more effective than Google. The databases of Russian courts are apparently inaccessible to the search engine, so it does not cope with citizens of Russia nearly as well.

    5. FindSounds

    What is it

    FindSounds is another specialized search engine, one that searches open sources for various sounds: household, nature, cars, people and so on. The service does not support queries in Russian, but there is an impressive list of Russian-language tags you can search by.

    How it is better

    The results are only sounds and nothing more. In the settings you can set the desired format and sound quality. All found sounds are available for download. Search by sample is also available.

    Why do you need it

    If you need to quickly find the sound of a musket shot, the drumming of a sapsucker or the scream of Homer Simpson, this service is for you. And that is a selection from the available Russian-language tags alone; in English the spectrum is even wider.

    Seriously, a specialized service implies a specialized audience. But what if it comes in handy?

    6. Wolfram|Alpha

    What is it

    Wolfram|Alpha is a computational search engine. Instead of links to articles containing keywords, it provides a ready-made answer to the user's request. For example, if you enter "compare the populations of New York and San Francisco" in English into the search form, Wolfram|Alpha immediately displays tables and graphs with the comparison.

    How it is better

    This service is better than others at finding facts and calculating data. Wolfram|Alpha collects and organizes knowledge available on the Web from a variety of fields, including science, culture and entertainment. If its database contains a ready-made answer to a query, the system shows it; if not, it calculates and displays the result. In either case the user sees only the answer and nothing superfluous.

    Why do you need it

    If you are, for example, a student, analyst, journalist or research scientist, you can use Wolfram|Alpha to find and calculate data related to your work. The service does not understand every request, but it is constantly evolving and becoming smarter.

    7. Dogpile

    What is it

    The Dogpile metasearch engine displays a combined list of results from Google, Yahoo and other popular search engines.

    How it is better

    First, Dogpile displays fewer ads. Second, the service uses a special algorithm to find and show the best results from the different search engines. According to Dogpile's developers, their system generates the most complete search results on the entire Internet.

    Why do you need it

    If you cannot find information in Google or another standard search engine, search for it in several search engines at once using Dogpile.

    8. BoardReader

    What is it

    BoardReader is a system for text search in forums, Q&A services and other communities.

    How it is better

    The service allows you to narrow the search field down to social platforms. Special filters let you quickly find posts and comments that match your criteria: language, publication date and site name.

    Why do you need it

    BoardReader can be useful for PR specialists and other media professionals interested in public opinion on particular issues.

    Finally

    The life of alternative search engines is often fleeting. Lifehacker asked Sergey Petrenko, former general director of Yandex's Ukrainian office, about the long-term prospects of such projects.


    Sergey Petrenko

    Former CEO of Yandex.Ukraine.

    The fate of alternative search engines is simple: to remain very niche projects with a small audience and therefore without clear commercial prospects, or, conversely, with complete clarity about the absence of such prospects.

    If you look at the examples in the article, you can see that such search engines either specialize in a narrow but in-demand niche, one that, perhaps only for now, has not grown enough to show up on Google's or Yandex's radar, or they are testing an original ranking hypothesis that is not yet applicable in regular search.

    For example, if search on Tor suddenly turns out to be in demand, that is, if results from there are needed by even one percent of Google's audience, ordinary search engines will of course start solving the problem of how to find them and show them to the user. And if audience behavior shows that, for a noticeable share of users on a noticeable number of queries, results computed without user-dependent factors seem more relevant, then Yandex and Google will begin to serve such results.

    "To be better" in the context of this article does not mean "to be better at everything". Yes, in many respects our heroes are far from Yandex (they are far even from Bing). But each of these services gives the user something that the giants of the search industry cannot offer. Surely you know similar projects too. Share them with us, and we will discuss.

    Introduction

    Currently the Internet unites hundreds of millions of servers hosting billions of different sites and individual files containing various kinds of information. It is a gigantic repository of information, and there are various methods of searching it.

    Search by a known address. The necessary addresses are taken from directories. Knowing the address, you simply enter it into the browser's address bar.

    Example 1. www.gov.ru is a server of Russian state authorities.

    Constructing an address. Knowing how Internet addresses are formed, you can construct addresses yourself when searching for Web sites: take a keyword (the name of a company, enterprise or organization, or a simple English noun), add a thematic or geographic domain, and engage your intuition.

    Example 2. Addresses of commercial web pages:

    www.samsung.com (SAMSUNG company),

    www.mtv.com (MTV Music News).

    Example 3. Addresses of educational institutions:

    www.ntu.edu (US National University).

    Internet search engines

    Special information retrieval systems have been developed for searching the Internet. Search engines have ordinary addresses and are displayed as Web pages containing special search tools (a search box, a subject directory, links). To call up a search engine, just enter its address in the browser's address bar.

    According to the LiveInternet.ru statistics service, the distribution of search engines in Russia is approximately as follows:

    2) Google - 35.0%

    3) Search Mail.ru - 8.3%

    4) Rambler - 0.9%

    By the way information is organized, information retrieval systems divide into two types: classification systems (rubricators) and dictionary systems.

    Rubricators (classifiers) are search engines that use a hierarchical (tree-like) organization of information. When searching, the user browses thematic headings, gradually narrowing the search field (for example, to find the meaning of a word, first find a dictionary in the classifier, then look the word up in it).



    Dictionary search engines are powerful automatic hardware-and-software systems. With their help, information on the Internet is scanned, and data on the location of particular pieces of information is entered into special index files. In response to a query, a search is performed over the query string, and the user is offered the addresses (URLs) at which the queried word or group of words was found at the time of scanning. Choosing any of the suggested URL links takes you to the found document. Most modern search engines are of a mixed type.

    The most famous and popular search engines:

    There are systems specializing in the search for information resources in various areas.

    https://my.mail.ru

    https://ru-ru.facebook.com

    https://twitter.com

    https://www.tumblr.com

    https://www.instagram.com etc.

    Subject search engines:

    Search for software:

    Directories (thematic collections of links with annotations):

    http://www.atrus.ru

    Rules for executing queries

    In each search engine, the Help section explains how to search and how to compose a query string. Below is information about a typical, "average" query language.

    Simple Query

    Enter one word defining the topic of your search. For example, in the Rambler.ru search engine it is enough to enter: automatic.

    Documents containing the words specified in the query are found. As a rule, all word forms of the Russian language are recognized and letter case is ignored.

    You can use the "*" and "?" characters in a query. The "?" sign stands for a single character in a keyword, which may be any letter, while the "*" sign stands for any sequence of characters.

    For example, the query automat* will find documents that include the words automatic, automation and so on.
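    Under the hood, such wildcards map naturally onto regular expressions. A sketch of the translation ("?" as one letter, "*" as any run of word characters):

        import re

        def wildcard_to_regex(term: str):
            """Compile a search term with '*' and '?' wildcards into a regex."""
            escaped = re.escape(term)
            escaped = escaped.replace(r"\*", r"\w*").replace(r"\?", r"\w")
            return re.compile(rf"\b{escaped}\b", re.IGNORECASE)

        print(bool(wildcard_to_regex("automat*").search("Fully automatic mode")))  # True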

    Complex query

    It is often necessary to combine keywords to obtain more specific information. In this case additional linking words, functions, operators, symbols and parenthesized combinations of operators are used.

    For example, the query music & (beatles | "the beatles") means that the user is looking for documents that contain the words music and beatles, or music and the beatles.
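    The logic of such a query maps directly onto set operations over the documents in which each term occurs, which is how it is evaluated below (the three sample documents are invented):

        docs = {1: "music news about the beatles",
                2: "music charts this week",
                3: "beatles biography"}

        def having(phrase: str) -> set:
            """Set of document ids whose text contains the phrase."""
            return {i for i, text in docs.items() if phrase in text}

        # music & (beatles | "the beatles"):
        print(having("music") & (having("beatles") | having("the beatles")))  # {1}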

    List of search engines and directories

    Address / Description
    www.excite.com Search engine with site reviews and guides
    www.alta-vista.com Search server, advanced search capabilities available
    www.hotbot.com Search Server
    www.ifoseek.com Search server (easy to use)
    www.ipl.org Internet Public Library, a public library operated by the World Village project
    www.wisewire.com WiseWire - Artificial Intelligence Search
    www.webcrawler.com WebCrawler - search engine, easy to use
    www.yahoo.com Web directory and interface for accessing full-text search on the AltaVista server
    www.aport.ru Aport - Russian-language search server
    www.yandex.ru Yandex - Russian-language search server
    www.rambler.ru Rambler - Russian-language search server
    Online Help Resources
    www.yellow.com Internet Yellow Pages
    monk.newmail.ru Search engines of various profiles
    www.top200.ru Top 200 Web Sites
    www.allru.net
    www.ru Catalog of Russian Internet resources
    www.allru.net/z09.htm Educational resources
    www.students.ru Server of Russian students
    www.cdo.ru/index_new.asp Distance Learning Center
    www.open.ac.uk Open University of Great Britain
    www.ntu.edu US National University
    www.translate.ru Electronic text translator
    www.pomorsu.ru/guide.library.html List of links to network libraries
    www.elibrary.ru Scientific electronic library
    www.citforum.ru E-library
    www.infamed.com/psy Psychological tests
    www.pokoleniye.ru Internet Education Federation website
    www.metod.narod.ru Educational resources
    www.spb.osi.ru/ic/distant Distance learning on the Internet
    www.examen.ru Exams and tests
    www.kbsu.ru/~book/ Computer science textbook
    Mega.km.ru Encyclopedias and dictionaries

    Professional search for information on the Internet

    Searching for information is one of the most common and at the same time most difficult tasks any Internet user faces. However, while for an ordinary member of the network community a command of effective retrieval methods is a desirable but by no means obligatory quality, for information professionals the ability to navigate Internet resources quickly and find the required sources is one of the basic qualifications.

    The difficulty of information retrieval on the Internet comes from two main factors. First, the number of sources on the Web is extremely large: at the end of 2001, the roughest estimates put about 7.5 billion documents on servers around the world. Second, the Web's information array is not only colossal in volume but also extremely dynamic. In the half-minute you spent reading the first lines of this section, about a hundred new or changed documents appeared in the virtual universe, dozens moved to new addresses, and a few ceased to exist forever. The Internet never "sleeps", just as our planet never "sleeps": the wave of human business activity rolls across it continuously, following the time zones.

    Unlike the stable and controlled collection of documents in a library, on the Web we deal with a gigantic and continuously changing information array in which finding data is a very, very difficult process. The situation is often reminiscent of the proverbial needle in a haystack, and sometimes information of great value goes unclaimed solely because it is so hard to find.

    Most users of global computer networks have some information retrieval skills. Amateurs and professionals often use the very same tools, yet the results of their searches, and the time spent obtaining them, differ enormously.

    The objective of this section is to present the tools and methods of information retrieval in detail and to develop solid skills of professional Web search for all types of data: from texts in any format to video and animation.

    PROFESSIONAL SEARCH FOR INFORMATION ON THE INTERNET

    Internet search is an important element of the Web. Hardly anyone knows the exact number of web resources on the modern Internet; in any case, the count runs into the billions. To use the information you need at a given moment, whether for work or for entertainment, you first have to find it in this constantly replenished ocean of resources.

    For Internet searches to be successful, two conditions must be met: queries must be well formulated, and they must be asked in the right places. In other words, the user needs, on the one hand, the ability to translate his search interests into the language of a search query and, on the other, a good knowledge of search engines and the available search tools, their advantages and disadvantages, so as to choose the most suitable means in each case.

    Currently there is no single resource that meets all the requirements of Internet search. A serious approach to search therefore inevitably means using different tools, each in the case it suits best.

    The main Internet search tools can be divided into the following groups:

    Search engines;

    Web directories;

    Help resources;

    Local programs for searching the Internet.

    The most popular search tools are search engines, the so-called Internet search machines (Search Engines). The three global leaders are quite stable: Google, Yahoo! and Bing. Many countries add their own local search engines to this list, optimized for local content. With their help you can, in theory, find any specific word on the pages of many millions of sites. From the user's point of view, the main disadvantage of search engines is the inevitable presence of information noise in the results: results that, for one reason or another, end up in the list without actually corresponding to the query.

    Despite many differences, all Internet search engines work on similar principles and, from a technical point of view, consist of similar subsystems. The first structural part of a search engine is the special software used for automatically finding and then indexing web pages. Such programs are commonly called spiders, or bots. They scan the code of web pages and find the links on them, thereby discovering new web pages. There is also an alternative way of getting a site into the index: many search engines let resource owners add a site to the database themselves. Either way, the web pages are then downloaded, analyzed and indexed: structural elements are identified, keywords are extracted, and their connections with other sites and web pages are determined. Other operations are performed as well, and their combined result is the search engine's index base. That base is the second main element of any search engine. There is currently no absolutely complete index base containing information about all the content of the Internet. Because different search engines use different programs to find web pages and build their indexes with different algorithms, their index bases can vary significantly. Some sites are indexed by several search engines, but there is always a certain share of resources included in the base of only one. The fact that each search engine has such an original, non-overlapping part of its index supports an important practical conclusion: if you use only one search engine, even the largest, you will certainly lose a certain percentage of useful links.
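    The index base described here is, at its core, an inverted index: a map from every word to the pages containing it. A toy version, assuming the pages have already been downloaded as plain text:

        from collections import defaultdict

        def build_index(pages: dict) -> dict:
            """Map every word to the set of page URLs that contain it."""
            index = defaultdict(set)
            for url, text in pages.items():
                for word in text.lower().split():
                    index[word].add(url)
            return index

        def search(index: dict, query: str) -> set:
            """Conjunctive query: a page must contain every query word."""
            hits = [index.get(w, set()) for w in query.lower().split()]
            return set.intersection(*hits) if hits else set()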

    The next part of an Internet search engine is the search and sorting programs proper. These solve two main tasks: first they find the pages and files in the database that match the received query, and then they sort the resulting data array according to various criteria. Success in achieving search goals depends largely on how effectively they work.

    The final element of an Internet search engine is the user interface. In addition to the usual requirements of aesthetics and convenience, search engine interfaces have one more important requirement: they must offer various tools for composing and refining queries, and for sorting and filtering results. The advantages of search engines are excellent coverage of sources, relatively fast updating of the database, and a good selection of additional functions.

    The main tool for working with search engines is a query.

    Special applications installed on the local computer are also used for Internet search. These range from simple programs to rather complex data search and analysis suites. The most common are search plugins for browsers, browser toolbars designed to work with a specific search service, and metasearch packages with result analysis capabilities.

    Web directories are resources in which sites are grouped into thematic categories. While the user works with a search engine only through queries, a directory can be browsed section by section in its entirety. The second fundamental difference from automatic search engines is that, as a rule, people are directly involved in filling directories: they review resources and assign each site to one category or another. Web directories are customarily divided into universal and thematic. Universal directories try to cover as many topics as possible, containing everything from poetry websites to computer resources; in other words, they offer the maximum search breadth. Thematic directories, on the other hand, specialize in a specific topic, providing the maximum search depth at the cost of breadth of coverage.

    The advantage of directories is the relatively high quality of the resources, since every site in them has been reviewed and selected by a person. Thematic grouping conveniently arranges sites with related subjects. This mode of operation is good for discovering new sites on a topic of interest; it is more precise than using a search engine. Web directories are recommended for a first acquaintance with a subject area and for fuzzy queries: you can "wander" through the sections of the directory and pin down more exactly what you need.

    The disadvantages of web directories are well known. First of all, the database grows slowly, since including a site in a directory requires human participation; in responsiveness, a web directory is no competitor to a search engine. In addition, web directories are significantly inferior to search engines in database size.

    Speaking of Internet search, one cannot ignore a number of terms closely related to this area and often used to describe and evaluate search engines, for instance the breadth and depth of a search. A search is called broad if it captures as many information sources as possible; in that case even a mere mention of the queried subject on a site is considered sufficient. Search depth refers to how thoroughly each specific resource is indexed and subsequently searched. Many search engines, for example, approach different sites differently: large, popular sites are indexed to the fullest, with robots trying not to miss a single page, while on other sites only the title page and a couple of content pages may be indexed. These circumstances naturally affect subsequent searches. Deep search works on the principle that it is better to include unnecessary information in the results than to omit any data relevant to the search.

    Quite often you will also meet the concepts of global and local Internet search. Local search takes the user's geographic location into account and gives preference to results related to a specific country or area. Global search ignores this information and searches all available resources.

    Various search modes operate when composing a query. The typical modes found on most Internet search engines are simple and advanced search. Simple search allows only one search condition per query; advanced search makes it possible to compose a query from several conditions linked by logical operators.

    Various filters are used to refine search queries. Filters are query-composition aids that do not concern the content side of the query conditions but restrict the search results by some formal feature. For example, when applying a file type filter, the user tells the system nothing about the subject of his request; he simply limits the results to the specific file type named in the query condition.

    For most users, universal search engines are the main and often the only means of Internet search. They offer good source coverage, as well as a set of tools sufficient for basic search tasks.

    The market for universal search engines is quite large. We have tried to analyze the best-known ones; the results are presented in Table 1.

    When choosing a universal search engine, the quality of the resources it finds plays an important role. You can determine the preferred search engine for specific tasks using the "marker method". Its essence is this: first a thematic search query is composed; then a group of experts in the field is surveyed to identify what they consider the best Internet resources on the chosen topic. From the survey data a list of marker sites is formed that are guaranteed to be relevant to the query and to contain high-quality information. The query is then submitted to the search engines under test. The evaluation logic is simple: the higher the marker sites appear in the results, the better that engine suits searches on the test topic.
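    Scoring by the marker method can be automated once the result lists are collected. A sketch (a lower score means the marker sites rank higher, i.e. the engine suits the topic better):

        def marker_score(results: list, markers: set) -> float:
            """Average rank of marker sites in a result list; markers missing
            from the results are penalised with a rank one past the list end."""
            penalty = len(results) + 1
            ranks = [results.index(m) + 1 if m in results else penalty
                     for m in markers]
            return sum(ranks) / len(ranks)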



    To say that in our age of information technology, with the endless growth in the amount of data available both to individuals and to society, there are many problems with processing and finding information, is to state the obvious; everyone raises this topic. So, rather than burden you with subjective (and partly objective) judgments gleaned from various sources about the problem, I will go straight to its solution. Today we will talk about search: that is, about the programs and serious information systems that find the documents and data we need.

    Direct Search Upgrade

    Not so long ago, when the trees were big and there was not much information even on an enterprise's local network, any search was a banal walk through the available files with a sequential check of their names and contents. Such a search is called direct search, and utilities built on it are traditionally present in all operating systems and toolkits. But even the power of modern computers is not enough for fast, adequate direct search over huge amounts of data: going through a couple of hundred documents on disk and combing a huge library plus several dozen mailboxes are two different things. So today direct search programs are clearly fading into the background, at least as universal tools.

    Of course, in the corporate sector this type of search has long been out of demand; the volumes are simply different. For many years now, and lately unambiguously so, technologies capable of quickly and accurately searching documents of various formats from various sources have been more than relevant. Not so long ago Microsoft's "father" Bill Gates, apparently envious of the phenomenal success of the Google search engine, announced at a press conference the intention to promote, develop and deepen search engines and technologies in every possible way. But a phenomenally working program from Microsoft, or an Internet server competitive with Google, has yet to appear (MSN still falls short of Google). So let us turn to the developments that already exist.

    Index, query, relevance

    Modern search technologies rest on two fundamental processes: indexing the available information, and processing a query with subsequent output of the results. As to the first, any program (be it a desktop search engine, a corporate information system or an Internet search engine) creates its own search domain: it processes documents and builds an index of them (an organized structure containing information about the processed data). From then on it is this index that is used to quickly obtain the list of documents matching a query. The rest, while by no means simple technologically, is clear enough to the average user: the program processes the query (a key phrase) and displays the list of documents containing it. Since the information sits in a structured index, processing a query is much faster (tens or even hundreds of times!) than direct search: documents are selected not by enumerating files but by analyzing the text information in the index.

    The program orders the found documents in the result list by relevance, the degree to which a document corresponds to the query text. Different technologies naturally use different methods of searching and of determining a document's relevance (the number of occurrences of a word and its frequency in the document, the ratio of these figures to the total number of words, the distance between the query words within the searched files, and so on). From these parameters the document's "weight" is computed, and on it depends the position a particular file takes in the result list. With Internet search the situation is more complicated still, since many other factors must be taken into account (Google's PageRank is one example). But that is a topic for a separate article, so we will not touch the Internet here.
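    A caricature of such weighting, term frequency normalised by document length, shows the principle (real engines combine many more signals):

        def score(doc: str, query: str) -> float:
            """Crude relevance: summed query-term frequency over document length."""
            words = doc.lower().split()
            if not words:
                return 0.0
            hits = sum(words.count(term) for term in query.lower().split())
            return hits / len(words)

        docs = ["the index maps words to documents", "recipes for fish soup"]
        print(sorted(docs, key=lambda d: score(d, "index documents"), reverse=True))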

    This article examines the capabilities of several popular search programs that boast both decent speed and good functionality. But boasting in advertising brochures is one thing; withstanding an expert's gaze is quite another. And the experts turned out to be no less than a whole office of people who like to dig into software for its usability. The test computer (Athlon 2.2 GHz, 1 GB of RAM, a 160 GB Seagate 7200 rpm IDE hard drive, and Windows XP) was equipped with a set of programs: dtSearch Desktop, Snoop Prof Deluxe, Google Desktop Search, SearchInform, Copernic Desktop Search and ISYS Desktop. For the tests, a text base of documents in doc, txt and html formats was compiled, totaling no less than 20 gigabytes. A group of comrades under the guidance of yours truly tested, compared and shared their subjective impressions of each product. A summary of the findings follows.

    dtSearch Desktop

    A program that claims, according to its developers, to be the fastest, most convenient and best search engine; as, in general, do all the others in this review. dtSearch's interface is quite simple, but some windows and tabs are somewhat overloaded with elements, which gives an impression of complexity; in reality there are no particular difficulties. The only truly unpleasant point is the lack of Russian-language support: although the program can search documents in several languages, its interface is English only.

    On the other hand, dtSearch is one of the few programs that can index web pages to a user-specified "depth" (albeit via the separately purchased dtSearch Spider add-on). This comes on top of support for files on disk in various text formats and for e-mail from an Outlook mailbox. At the same time, the program cannot work with databases, which are such a tasty morsel for search engines, given the large amounts of information they hold and their widespread use in companies and hence in corporate networks. The indexing speed of dtSearch turned out to be at a proper level. Looking ahead, I will say that this program handled the given amount of information on a par with another contender, ISYS, sharing second place on the list of the fastest systems. dtSearch indexed the 20 test gigabytes of information in 6 hours 13 minutes, creating a 7.9 GB index for subsequent searches.

    As for the search capabilities, they are up to standard. First, dtSearch has morphological search (finding a word in all of its morphological forms). With it you free yourself from thoughts like "in which grammatical case was that word used in the document I need?". Morphological search is almost always worth using, so it should be present in any professional search engine.
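    None of the reviewed programs publishes its morphology engine, but the principle can be sketched as follows: both the indexed words and the query are reduced to a common stem, so different grammatical forms match. The toy suffix-stripping stemmer here is purely illustrative; real engines use full morphological dictionaries.

SUFFIXES = ("ing", "ed", "es", "s")  # toy English endings

def stem(word):
    """Crude suffix stripping; stands in for a real morphology dictionary."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def morphological_match(query, text):
    target = stem(query)
    return any(stem(w) == target for w in text.split())

print(morphological_match("searching", "we searched the archive"))  # True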

    Sound-alike search is a non-standard feature even for professional search engines. Its essence is that the program looks for words that sound the same as the word you entered, and, best of all, this function works for Russian too. In English, for example, a query for "Smith" would also turn up documents containing "Smyth".
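    The textbook way to implement such a "sounds like" search for English is the Soundex algorithm: words are reduced to a short letter-plus-digits code, and words with equal codes are treated as matches. Whether dtSearch uses Soundex internally is not stated anywhere; the sketch below only illustrates the general idea.

# Digit groups of classic Soundex: similar-sounding consonants share a digit.
CODES = {c: str(d) for d, group in enumerate(
    ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1) for c in group}

def soundex(word):
    word = word.lower()
    code = word[0].upper()
    prev = CODES.get(word[0])
    for ch in word[1:]:
        digit = CODES.get(ch)
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not break a run of equal codes
            prev = digit
    return (code + "000")[:4]

print(soundex("Smith"), soundex("Smyth"))  # S530 S530: treated as a match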

    Error-correcting search is a very important feature. It is used to find words containing spelling errors, whether typos or mistakes in documents produced by character-recognition (OCR) systems. A simple example: you are searching for the word "keyboard", and some document contains "keybaord"; it is obviously the same word, just mistyped. A search with error correction will detect this and include that document in the results. dtSearch also has a setting that controls how many erroneous characters are tolerated.
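    A common way to implement error-correcting search is Levenshtein edit distance: a word matches the query if it can be transformed into it with at most a set number of single-character edits. A minimal sketch follows; the threshold parameter is illustrative, not dtSearch's actual setting.

def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def fuzzy_match(query, word, max_errors=2):
    return edit_distance(query.lower(), word.lower()) <= max_errors

print(fuzzy_match("keyboard", "keybaord"))  # True: a transposition is 2 edits here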

    Search using synonyms. This feature relies on a list of synonyms for different words: entering "fast", for instance, will also find "quick", "rapid" and whatever other synonyms are present in the list. dtSearch does not ship with a ready-made synonym list, but you can use lists from the Internet (a connection is then required, which is not always convenient) or build your own.
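    The mechanics are straightforward: the query word is expanded into its synonym group before the lookup. A minimal sketch with a hand-made dictionary (mirroring the fact that dtSearch ships without one):

SYNONYMS = {"fast": {"fast", "quick", "rapid", "speedy"}}  # hand-made list

def synonym_search(query_word, text):
    terms = SYNONYMS.get(query_word.lower(), {query_word.lower()})
    return any(word in terms for word in text.lower().split())

print(synonym_search("fast", "a rapid indexing pass"))  # True, via the synonym group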

    Beyond the features listed, dtSearch can search with phrases whose words are connected by logical operators, and each word in the query can be assigned its own "weight", that is, significance. A useful option is a dictionary of insignificant words to be ignored during the search, but this dictionary also ships empty and has to be filled in by hand.
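    A sketch of how such a query might be evaluated: AND semantics over the query words, per-word weights summed into a document score, and a stopword list of insignificant words ignored throughout. The scoring rule and word lists are invented for illustration.

STOPWORDS = {"the", "a", "an", "of", "and"}  # "insignificant" words

def score(text, weighted_terms):
    """weighted_terms: {word: weight}. All terms must occur (implicit AND)."""
    words = set(text.lower().split()) - STOPWORDS
    terms = {w: wt for w, wt in weighted_terms.items() if w not in STOPWORDS}
    if not terms or not all(w in words for w in terms):
        return 0
    return sum(terms.values())

doc = "the desktop search engine builds an index of the documents"
print(score(doc, {"search": 5, "index": 2}))  # 7: both terms found, weights summed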

    Next, the program's networking capabilities. Strictly speaking, dtSearch offers no special network features, yet it can quite well be used on a network. One option is to create an index and put it in a public (shared) folder. The program itself can be installed on each user's computer, or it too can be placed in a shared folder, with per-user shortcuts created via the command-line parameters described in the bundled help file. It is also possible to install the program across the network automatically from an MSI file, which takes each connected user's settings into account.

    Overall, this is a good program in the professional search category. It could claim a high mark, but earning users' trust may prove difficult for dtSearch for several reasons: the interface is not entirely smooth, Russian-speaking users are left out, and there are no standout networking features. As for document search proper, the program had no hiccups with Russian text, nor with the advertised morphology or fuzzy search. The system found the right documents perfectly adequately whether the query was a single word, a couple of paragraphs or an entire document used as the key phrase.

    Official site:
    Distribution size: 23 MB

Snoop Prof Deluxe

    Judging by the name, you can guess that this program supports Russian, which is already nice. The interface, while somewhat unusual, looks very attractive. Convenience is another matter: it is a highly debatable criterion, but a multi-window layout is probably not the best option (the query is typed in one window, the results appear in another, and so on).

    Snoop uses the same index-based approach for fast searching, but its indexing is far slower than the other programs'. That is very strange, especially since its query-processing capabilities are very weak, meaning the index structure cannot be complicated; most likely the algorithms are simply unoptimized. The program turned out to be the clear outsider in both indexing and search speed: building the index took six times longer than with dtSearch or iSYS. Indexing 20 gigabytes of text took the bloodhound 38 hours and 46 minutes, and the resulting "search area" occupied almost as much disk space as the source data itself: 19 gigabytes.

    The Snooper can be presented as an alternative to the standard Windows search; it is hardly capable of more. That simple file search is its primary task is indicated not only by the small number of functions for analyzing query text alongside an advanced search by file attributes, but even by the results window, which shows direct links to the found files and to the folders containing them. The results window is not very informative in the sense that the only way to read a found file in full is to open it: there is no built-in viewer. It does display an excerpt from the file around the found word, a presentation scheme very similar to Internet search engines.

    As for actual query processing, there is no such thing as "search by text": the most you can search for is a phrase, if only because there is no multi-line input field. The phrase is analyzed, though, and the Snooper offers the standard set here: logical operators, wildcard (mask) search and quotation search. Not a lot. The program contains some rudiments of morphological search, but they are apparently so raw that they hinder correct operation more than they help (during the tests we noticed many glitches caused by incorrect handling of morphology).

    On the other hand, the program lets you constrain a search by file attributes (document date, file name, folder name), and the same query operators can be used in those queries too. You can also search mail by specifying message fields (From, Subject and so on).

    So much for the search itself. What else is interesting about a program that, according to the official website, has collected so many awards? It is hard to say what is so special about it; most likely it is the Snooper's interface that wins people over (purely visually, usability aside).

    Index operations are fairly standard, though the ability to update indexes on a schedule is a nice touch. Indexes can also be used over a network, and this deserves a closer look.

    Despite the primitiveness of its search queries, the program can find files, so its use on a network can be justified, albeit with a big stretch: on a large network the priority is fast search over huge volumes of data using complex queries, and both speed and query power are clearly this program's weak points. That said, the Bloodhound's networking is well thought out. A separate application, Snoop Server, is dedicated to it. It works just like the plain Snooper (they share one search engine), only on documents located on a central server or on shared resources in the corporate network. Snoop Server creates new indexes on shared resources or uses previously created ones. Any user on the corporate network can connect to Snoop Server and reach any document (present in the current index) through a web browser. Agree, the scheme is extremely convenient: files on your own network can be searched the same way as information on the Internet through, say, Google.

    Weighing all the advantages and disadvantages, the conclusion suggests itself that for corporate networks its capabilities will most likely not suffice (even despite the well-organized network mode), but for a home computer, or even a home network, it might in principle do. Although neither its speed nor its search capabilities are encouraging...

    Official website in Russian:
    Distribution size: 6 MB

Google Desktop Search + GDS Enterprise

    Naturally, we could not pass over such an eminent developer. The name Google alone says a lot: people who have used the most powerful Internet search engine for years will surely decide, without a second thought, to install this particular search engine on their computer. Think of it: Google on your home machine! But let us not succumb to the pull of a heavily promoted brand and instead try to assess the capabilities of Google's "desktop" search engine soberly and, above all, objectively.

    The first thing that catches the eye is the absence of a shell of its own: Google Desktop Search lives in the browser window, so the desktop version inherited its entire interface from its older Internet sibling. Whether that is good or bad is debatable: some like the minimalism of this search engine's design, while others want a full-fledged application filled with buttons and the like.

    What catches the eye right after the design? The fact that Google Desktop Search starts indexing everything on your computer without so much as asking! Most interestingly, indexing paths cannot be selected from within Google Desktop Search itself. You have to download a separate program, TweakGDS, which slightly extends the Google Desktop settings, including letting you specify what to index. By the time you figure this out, the program will already have indexed the default drive, so this setting matters mostly when working with large volumes of data, which is exactly what counts in corporate networks (the Enterprise version). Even then it is not certain that downloading TweakGDS will solve your problems: it requires the Microsoft .NET Framework and Microsoft Scripting Runtime. The installation, like access to the settings, could have been made simpler, although one can perhaps understand the developers: why write something new when a ready-made search engine exists? Port it to the local computer, let the user "enjoy" it, and the famous name will make "this" another masterpiece. But enough of the lyrical digression; on to the search.

    As for query analysis and the presentation of results, everything is absolutely identical to Google on the Internet: the same results layout, the same standard set of logical operators in queries. Like the previous program, Google Desktop Search is intended purely for finding files; naturally, it has no internal viewer for them. The number of supported file formats is quite sufficient, and it is also nice that it searches the web pages you have visited, taking the data from the cache. Search and indexing speeds are quite acceptable, though only for home use: Google Desktop Search got through the 20 gigabytes of text in 8 hours and 17 minutes, and spending days processing the data of a large enterprise network would please no system administrator. On the plus side, the resulting index was compact (4.5 GB), on a par with another engine in this review, SearchInform.

    A big advantage (or oversight?) of Google Desktop Search is its support for plugins, which can change the picture considerably. The trouble is that finding, connecting and configuring plugins complicates the installation so much that you start to wonder whether it is all worth it when you could install a normal, full-featured program with everything already built in. Each extra feature needs a new plugin; even full support for archives requires a separate add-on. At least all these extra modules are free. As for the Enterprise side, competently setting up GDS Enterprise may well be beyond you: it is not for nothing that Google's experts offer to configure their own software for your network for a mere $10,000.

    If you do master the installation and setup procedure (or pay the $10,000 to the rapid-response team from the Google office), you will find that the complexity is more than compensated by very flexible settings for corporate networks. An important aspect of Google Desktop's operation in a corporate network is its use of group policies, which makes it possible to set the configuration for each user.

    To summarize: the most sensible home for this program is an ordinary home or office computer. There it is enough to simply install it, and it will do the rest itself (it will not even ask you about anything).

    Google Desktop Search Enterprise, meanwhile, will be acceptable where flexible network-policy configuration of a search engine is the pressing need, query-processing power takes second place, and you are prepared to put the time (or money) spent on setup first.

    Official site:
    Distribution kit size with TweakGDS: 1.2 MB

Copernic Desktop Search


    The program's interface evokes thoroughly positive emotions: everything follows generally accepted conventions, nothing is superfluous, in a word a pleasant design. A beginner will find Copernic Desktop Search very easy to pick up. It is a little awkward, though, that the designers clearly built the interface around the standard Windows XP theme; with the classic theme the program no longer looks as pretty. But that is more a matter of taste.

    At first start the program offers to create indexes for searching. It seemed odd that after selecting folders for indexing there is no button like "Start Indexing" to press, nor does indexing start automatically; only later did we notice that Copernic tries to index while the computer is idle. You will have to dig around in the options a little to set everything up properly. To be fair, the options for automatic index creation are quite rich: a built-in scheduler, indexing during idle time, background indexing at low priority. Indexing itself was not very fast, at 10 hours 51 minutes, slower than the other engines in the review except the Snooper (Copernic is still several times faster than the iSleuthHound Technologies product).

    Now, the structure of the index. In general there is nothing special about it. File types can be chosen both in broad categories and in detail: you first pick what to index (Documents, Images, Videos, Music), while another tab of the options window lets you select specific file types by extension. Additionally, you can tune the index so that, for example, images smaller than 16x16 pixels or sound files shorter than 10 seconds are skipped. Besides files in folders, Copernic can work with mail and address-book contacts from Microsoft Outlook and Microsoft Outlook Express, and it can index Favorites and History from Internet Explorer.
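    A sketch of what such indexing filters amount to: a file is admitted only if its extension is enabled and it passes per-type thresholds (the 16x16-pixel and 10-second limits from the text). The function and parameter names are illustrative, not Copernic's actual API.

def should_index(path, image_px=None, sound_seconds=None,
                 enabled_ext=(".doc", ".txt", ".html", ".jpg", ".mp3")):
    """Accept a file for indexing only if it passes the configured filters."""
    if not path.lower().endswith(enabled_ext):
        return False                      # extension not selected for indexing
    if image_px is not None and min(image_px) < 16:
        return False                      # skip images smaller than 16x16
    if sound_seconds is not None and sound_seconds < 10:
        return False                      # skip sounds shorter than 10 seconds
    return True

print(should_index("icon.jpg", image_px=(8, 8)))  # False
print(should_index("report.doc"))                 # True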

    The search capabilities here are weak. The tests even revealed that the program does not search Russian-language documents in txt and html formats by content, finding them by their titles only. The only tool provided to improve search efficiency is the standard set of logical operators, and even that was discovered experimentally, since it is not documented. The help, by the way, is also problematic: it is available only over the Internet, which is hardly convenient, and there is not much of it online either. Apparently the developers decided that such a simple interface needs no proper manual.

    Continuing on search capabilities: despite the weak query analysis, the program offers an interesting search scheme. The user selects a file type (images, video, music and so on), enters a query and sets attributes specific to that type: for sound files these can be values from mp3 tags (artist, album, year and so on), for images, say, the resolution; each type has its own settings. After a search for a specific file type, the program shows a very informative list in the results window, and if the query also matched files of other types, they can be opened via the corresponding link.

    The results window deserves a separate mention. The contents of the selected file are displayed below the list of found files (a scheme often used in mail clients). True, the text can be viewed only in its native format; there is no plain-text display mode, which is not always convenient, since opening a document that way takes longer. On the other hand, since Copernic can search images and music, those multimedia files can be previewed as well.

    The basic workings of the program have been described; now let us see what Copernic Desktop Search offers for working on a network... In principle you can look for a very long time and still see nothing: this program was simply not meant for networking. Copernic Desktop Search is strictly a home search engine.

    Obviously, the only (and most logical) place for this program is a home computer. There it will cope perfectly well with users' simple one- or two-word queries and find the necessary information, while the per-type search, multimedia support and low-priority background indexing, coupled with a pleasant interface, give the program every chance of winning the trust of inexperienced users.

    Official site
    Distribution size: 2.6 MB

ISYS Desktop


    A very powerful program. In terms of sheer breadth of functions it sits somewhere near the next engine on the list, SearchInform. Yet its installation file exceeds 40 MB; it is hard to say what could take up so much space, given that SearchInform, with similar functionality, fits in 15 MB.

    The installation experience is not very pleasant either, or rather what precedes it: before downloading the program you are required to register, otherwise no download. Next, the interface. It is done very nicely and nothing superfluous catches the eye, but those are the impressions of someone already somewhat used to it; a beginner will not find it easy to work out what is where, what to click and where, finally, to search. Reading the help before starting is strongly recommended and will save a lot of time and nerves. On top of everything, the program has no Russian-language support at all. Not good. The windows here are not overloaded with controls, but the price for that is multi-modularity and extra windows: queries are entered by launching one program, indexes are managed in another, and queries themselves are typed into separate pop-up windows. Which is better, a crowded interface or ubiquitous multiple windows, is hard to say; rather, it is a matter of taste.

    For index creation the program provides features that simplify setting the options of a new index: several ready-made templates ("My Documents", "Mail", "Mail and Documents", "Specific folder", "Folder with a choice of file types" and so on) that ease the first steps. The index-management utility has a somewhat off-putting interface (a very subjective assessment, to be honest), but once you look into it, it offers many useful options and is not hard to use. ISYS Desktop can index data from a variety of sources and offers many flexible settings for doing so; additional indexing features include support for SQL, FTP, TRIM Context, WORLDOX 2002 and scripts. If you chose "Folder with a choice of file types", you can select file types for indexing manually, by extension. The number of supported file types is simply enormous, though you cannot add your own extension to the list. There is also an indexing scheduler. ISYS Desktop took 6 hours and 13 minutes to index the 20 gigabytes of test data, a good time, producing a 7.9 GB index.

    The search capabilities of this program are quite good: the query language used in ISYS is much more powerful than ordinary boolean support. Among the advanced capabilities are synonym search and a sort filter (by path, name and file creation date). The set of logical operators is wider than the standard one, and beyond boolean operations the program offers many other operators which can effectively replace whole search modes; word-part search, for example, can be fully expressed through special operators. I was very surprised that the program has no morphological search: a serious omission, since morphological analysis greatly improves search efficiency. There is likewise no list of significant words, though there is an extensive list of insignificant ones. "Approximate search" and "heuristic analysis" functions are also announced.
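    One typical example of an operator that goes beyond plain boolean logic is proximity search: two words match only if they occur within a given number of positions of each other. The sketch below shows the principle; the operator's actual name and syntax in ISYS may differ.

def near(text, word_a, word_b, window=5):
    """True if word_a and word_b occur within `window` positions of each other."""
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == word_a]
    positions_b = [i for i, w in enumerate(words) if w == word_b]
    return any(abs(i - j) <= window for i in positions_a for j in positions_b)

doc = "the index stores every word position for fast proximity search"
print(near(doc, "index", "position"))  # True: 4 words apart
print(near(doc, "index", "search"))    # False: 8 words apart, window is 5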

    ISYS offers a choice of several types of search query, but the types are purely visual: different windows for entering queries. In fact, none of the windows enables any technology beyond those listed above.

    The search results are very informative, shown as a list of documents sorted by relevance, with a preview of the selected document below. Unlike in Copernic Desktop Search, the preview is plain text only; documents could not be displayed in their native format, be it Word, HTML or PDF, though in principle this is not too critical. The program can split the found documents into groups by various criteria (by default they are grouped by relevance), and you can browse the found documents folder by folder, which is convenient when the result set is very large.

    Use of the program in a corporate network is also well justified, since it provides good facilities for organizing network search. The scheme rests on creating a public index that contains data indexed from shared network resources.

    All in all, the ISYS program deserves attention, at the very least a closer look. It is a mature project with a huge number of functions (not always and not for everyone necessary, of course, but still). Whether its query processing will improve is anyone's guess, but even now it can be recommended for almost universal use. And since it is rather heavy for home systems, its main habitat will be corporate networks.

    Official site:
    Distribution size: 40 MB

SearchInform


    With SearchInform one should probably not start with the interface. First, a word about installation, or rather one detail of it: the program cannot be installed without an Internet connection. Before the first launch it requires (free) user registration and sends the entered data to the server. Apparently the developers were forced into such measures by the fight against piracy, but ease of installation did not benefit.

    The interface follows all the generally accepted rules, yet at first sight it is somewhat cumbersome. On first use the program seems too complicated, and it is not always easy to remember which menu or tab holds the desired option; with longer use, however, the interface stops seeming so terribly complex. The main thing is to read the help first.

    Once you have come to grips with the interface, you can create an index. The process itself is very simple, and the indexing speed is noticeably higher than that of every other engine in the review, even by eye. The test figures make it plain: SearchInform outpaced dtSearch and iSYS twofold, indexing the 20 gigabytes of data in a record 3 hours and 17 minutes. The resulting index was also the smallest, at 4.4 GB, 100 MB less than Google Desktop Search's.

    Besides regular files and folders, the program supports indexing of e-mail, connection to and indexing of databases (!) and other external sources (DMS, CRM); a dictionary for morphological search can be specified right at indexing time, and all file attributes can be indexed. After creating the index, the first test search can be a little confusing: there are two types of search, so which one do you need? As said before, the main thing is to read the help, and then all becomes clear. The program really does support two kinds of search: phrase search, and search for documents similar in content to the query text.

    The main query-analysis functions were all described above, so here we will simply list what this program provides for phrase search: morphological search, of course; quotation search; logical operators; word-part search (match at the beginning, end or middle of a word, or a complete match); mixed quotation search (all query words must be present in the document, though not necessarily in the entered order); error-correcting search; synonyms; "almost citation" search (the entered phrase is sought as a quotation, but other words may stand between the query words); and so on. Some of these options have settings of their own. In addition, a dictionary of insignificant words can be used, and here the program comes with a ready-made list; there is also a dictionary of priority search words (which, naturally, you fill in yourself).
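    Of the listed modes, "almost citation" search is the least standard, so here is a sketch of the principle: the query words must occur in the document in the same order, but any number of other words may stand between them. This is a generic illustration, not SearchInform's implementation.

def almost_citation(query, text):
    """Query words must appear in order, with any words allowed in between."""
    remaining = iter(text.lower().split())
    return all(word in remaining for word in query.lower().split())

doc = "the search engine quickly builds a compact index"
print(almost_citation("search index", doc))  # True: right order, gaps allowed
print(almost_citation("index search", doc))  # False: wrong order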

    That, in brief, covers the basic possibilities of phrase search.

    Now to this program's signature feature: the search for similar documents. The developers insist this is by no means a simple text search but precisely a "search for similar" (that is how they describe it everywhere); fine, call it what you like, the point is what it does. A quick look around the Internet shows that such "similarity search" is a recent development in text analysis: a system of this kind finds texts that are close in semantic content. The most pleasant part is that the test searches showed theory matching practice: the program really does find documents similar in content and lists them sorted by similarity percentage.
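    SearchInform does not disclose how its similarity search works internally; a common textbook approach to the same task is to represent each text as a bag-of-words vector and rank documents by cosine similarity to the query text, which can then be reported as a percentage:

import math
from collections import Counter

def similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two texts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) \
         * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

sample = "indexing makes full text search fast"
docs = {"x.txt": "full text search is fast thanks to indexing",
        "y.txt": "recipe for a layered honey cake"}
for name in sorted(docs, key=lambda n: similarity(sample, docs[n]), reverse=True):
    print(name, f"{similarity(sample, docs[name]):.0%}")  # x.txt scores far higher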

    Finally, what does SearchInform (specifically its corporate version, SearchInform Corporate) offer for work in a corporate network? There are two kinds of application: server-side and user-side. The server side maintains the specified indexes on its own, and users query them according to the access rights assigned to them. Users can be set up automatically from Windows accounts (in professional terms, SearchInform uses Windows NTFS authentication) or manually (adding each user separately). Each user can be allowed or denied access to particular indexes, and users can be combined into groups. Overall, SearchInform's network settings are ahead of Google's in flexibility and of Snoop Server's in convenience and simplicity.

    Official site:
    Distribution size: 14.7 MB

Indexing speed comparison

    Search engine                    | Indexing time       | Index size
    Snoop Prof Deluxe 4.5            | 38 hours 46 minutes | 19 GB
    ISYS Desktop 7.0                 | 6 hours 13 minutes  | 7.9 GB
    dtSearch 7.0                     | 6 hours 3 minutes   | 8.6 GB
    Google Desktop Search Enterprise | 8 hours 17 minutes  | 4.5 GB
    Copernic Desktop Search *        | 10 hours 51 minutes | 7 GB
    SearchInform 1.5.0               | 3 hours 17 minutes  | 4.4 GB

    * Most of the .html and .txt documents containing Russian text were indexed, but could not be found by anything except their names.

    All programs are worthy of attention.

    The tests and a careful examination of each program in the review allow some definite conclusions. Google Desktop Search and Copernic Desktop Search are quite suitable for the inexperienced user as home search systems: they handle simple queries well, do not overload the user with settings and, moreover, are completely free. Google's attempt to enter the corporate search market does not yet look convincing: for full operation the program has to be hung with extra modules, and configuring it is far from easy. So the telling name "Desktop Search", whether Copernic's or Google's, marks out their niche: "desktop" search engines.

    True, the heavier hitters, dtSearch, iSYS and SearchInform, are not standing still either and offer "desktop" versions of their own, though at a decent price, unlike the free offerings from Google and Copernic. Power, speed and functionality have to be paid for, of course. But the main target of the dtSearch, iSYS and SearchInform developers is clearly the corporate sector: networking, functionality, and indexing and search speed are what set these products apart from their "competitors". By the test results a favorite emerged: SearchInform. It offers similar-document search, posted the highest indexing and search speed, and has a good set of functions.
