Prohibiting indexing robots txt. Hiding links with scripts

03.05.2019 Safety

The purpose of this guide is to help webmasters and administrators use robots.txt.

Introduction

The Robot Exception Standard is inherently very simple. In short, it works like this:

When a compliant robot visits a site, it first requests a file called “/robots.txt”. If such a file is found, the Robot looks for instructions in it that prohibit indexing some parts of the site.

Where to put your robots.txt file

The robot simply requests the URL "/robots.txt" on your site, the site in in this case Is a specific host on a specific port.

Site URL	Url robots file.txt
http://www.w3.org/	http://www.w3.org/robots.txt
http://www.w3.org:80/	http://www.w3.org:80/robots.txt
http://www.w3.org:1234/	http://www.w3.org:1234/robots.txt
http://w3.org/	http://w3.org/robots.txt

There can be only one "/robots.txt" file on a site. For example, you shouldn't put your robots.txt file in custom subdirectories - anyway, robots won't look for them there. If you want to be able to create robots.txt files in subdirectories, then you need a way to programmatically collect them into a single robots.txt file located at the root of your site. You can use instead.

Remember that URLs are case sensitive and the file name “/robots.txt” must be written entirely in lowercase.

Incorrect robots.txt location
http://www.w3.org/admin/robots.txt
http://www.w3.org/~timbl/robots.txt	The file is not in the root of the site
ftp://ftp.w3.com/robots.txt	Robots don't index ftp
http://www.w3.org/Robots.txt	File name is not lowercase

As you can see, the robots.txt file should be placed exclusively in the site root.

What to write to your robots.txt file

The robots.txt file is usually written something like:

User-agent: *
Disallow: / cgi-bin /
Disallow: / tmp /
Disallow: / ~ joe /

In this example, indexing of three directories is prohibited.

Note that each directory is listed on a separate line - you cannot write "Disallow: / cgi-bin / / tmp /". You also cannot split one Disallow or User-agent statement into several lines, because line break is used to separate instructions from each other.

Regular expressions and wildcards cannot be used either. An asterisk (*) in a User-agent statement means any robot. Instructions like "Disallow: * .gif" or "User-agent: Ya *" are not supported.

The specific instructions in robots.txt depend on your site and what you want to block from indexing. Here are some examples:

Deny the entire site from being indexed by all robots

User-agent: *
Disallow: /

Allow all robots to index the entire site

User-agent: *
Disallow:

Or you can just create empty file"/Robots.txt".

Close only a few directories from indexing

User-agent: *
Disallow: / cgi-bin /
Disallow: / tmp /
Disallow: / private /

Disallow site indexing for only one robot

User-agent: BadBot
Disallow: /

Allow site indexing for one robot and prohibit all others

User-agent: Yandex
Disallow:

User-agent: *
Disallow: /

Deny indexing all files except one

This is not easy because there is no “Allow” statement. Instead, you can move all files except for the one you want to allow for indexing into a subdirectory and prohibit its indexing:

User-agent: *
Disallow: / docs /

Or you can deny all files prohibited for indexing:

User-agent: *
Disallow: /private.html
Disallow: /foo.html
Disallow: /bar.html

Any page on the site can be opened or closed for indexing by search engines. If the page is open, the search engine adds it to its index, if it is closed, then the robot does not visit it and does not take into account search results.

When creating a site, it is important at the programmatic level to close all pages from indexing that, for whatever reason, should not be seen by users and search engines.

These pages include the administrative part of the site (admin panel), pages with various service information (for example, with personal data of registered users), pages with multi-level forms (for example, complex shapes registration), forms feedback etc.

Example:
User profile on the forum about search engines ah Searchengines.

It is also mandatory to close the pages from indexing, the content of which is already used on other pages. Such pages are called duplicate. Full or partial duplicates greatly pessimize the site, as they increase the amount of non-unique content on the site.

As you can see, the content on both pages overlaps. Therefore, category pages on WordPress sites are closed from indexing, or only the name of the posts is displayed on them.

The same goes for tag pages - such pages are often found in the structure of WordPress blogs. The tag cloud makes it easier to navigate the site and allows users to quickly find information of interest. However, they are partial duplicates of other pages, which means they must be closed from indexing.

Another example is a store on CMS OpenCart.

Product category page http://www.masternet-instrument.ru/Lampy-energosberegajuschie-c-906_910_947.html.

The page of products covered by the discount http://www.masternet-instrument.ru/specials.php.

These pages have similar content, as they contain many of the same products.

Especially critical for duplicate content on different pages site belongs to Google. Per a large number of duplicates in Google, you can earn certain sanctions up to the temporary exclusion of the site from the search results.

Another case when the content of pages should not be "shown" to a search engine is pages with non-unique content. A typical example is the instructions for medicines in an online pharmacy. The content on the page with the description of the drug http://www.piluli.ru/product271593/product_info.html is not unique and has been published on hundreds of other sites.

It is almost impossible to make it unique, since rewriting such specific texts is a thankless and forbidden task. The best solution in this case, the page will be closed from indexing, or a letter will be written to search engines with a request to be loyal to the non-unique content of the content, which cannot be made unique for one reason or another.

How to block pages from indexing

The classic tool for closing pages from indexing is the robots.txt file. It is located in the root directory of your site and is created specifically to show search robots which pages they should not visit. This is normal text file, which you can edit at any time. If you don't have a robots.txt file, or if it's empty, search engines will index all pages they find by default.

The structure of the robots.txt file is pretty simple. It can consist of one or several blocks (instructions). Each instruction, in turn, consists of two lines. The first line is called User-agent and defines which search engine should follow this instruction. If you want to disable indexing for all search engines, the first line should look like this:

If you want to prohibit page indexing for only one search engine, for example, for Yandex, the first line looks like this:

The second line of the instruction is called Disallow. To ban all pages on the site, write the following on this line:

To enable indexing of all pages, the second line should look like this:

In the Disallow line, you can specify specific folders and files to be closed from indexing.

For example, to prohibit indexing of the images folder and all of its contents, write:

To "hide" specific files from search engines, we list them:

User-agent: *
Disallow: /myfile1.htm
Disallow: /myfile2.htm
Disallow: /myfile3.htm

These are the basic principles of robots.txt file structure. They will help you close individual pages and folders on your site from indexing.

Another, less common way of prohibiting indexing is the Robots meta tag. If you want to close a page from indexing or prevent search engines from indexing links placed on it, you need to write this tag in its HTML code. It must be placed in the HEAD area, before the tag .</p> <p>The Robots meta tag has two parameters. INDEX is a parameter responsible for indexing the page itself, and FOLLOW is a parameter that allows or prohibits indexing of links located on this page.</p> <p>To prohibit indexing, instead of INDEX and FOLLOW, write NOINDEX and NOFOLLOW, respectively.</p> <p>Thus, if you want to close the page from indexing and prevent search engines from considering links on it, you need to add the following line to your code:</p> <blockquote><p><meta name=“robots” content=“noindex,nofollow”></p> </blockquote> <p>If you do not want to hide the page from indexing, but you need to “hide” links on it, the Robots meta tag will look like this:</p> <blockquote><p><metaname=“robots” content=“index,nofollow”></p> </blockquote> <p>If, on the contrary, you need to hide the page from the search engine, but at the same time take into account the links, this tag will look like this:</p> <blockquote><p><meta name=“robots” content=“noindex,follow”></p> </blockquote> <p>Most modern CMS provide the ability to close some pages from indexing directly from the site's admin panel. This avoids the need to understand the code and manually configure these parameters. However, the methods listed above have been and remain the most universal and most reliable tools for prohibiting indexing.</p> <p>The technical aspects of the created site play no less <a href="https://bumotors.ru/en/v-obektno-orientirovannyh-yazykah-programmirovaniya-peremennye-igrayut.html">important role</a> for website promotion in search engines than its content. One of the most important technical aspects is site indexing, that is, determining the areas of the site (files and directories) that may or may not be indexed by search engine robots. For these purposes, robots.txt is used - this is <a href="https://bumotors.ru/en/kak-udalit-zablokirovannye-faily-s-kompyutera-programmy-dlya-udaleniya-ne.html">special file</a> which contains commands for search engine robots. <a href="https://bumotors.ru/en/pravilnaya-nastroika-faila-podkachki-dlya-windows-7-fail-podkachki-kakoi-razmer.html">Correct file</a> robots.txt for Yandex and Google will help to avoid many unpleasant consequences associated with site indexing.</p><h3><b>2. The concept of the robots.txt file and the requirements for it</b></h3><p>The /robots.txt file is intended to instruct all spiders to index <a href="https://bumotors.ru/en/spisok-informacionnyh-baz-pust-chto-delat-oshibki-pri-sozdanii-bazy.html">information servers</a> as defined in this file, i.e. only those directories and server files that are not described in /robots.txt. This file must contain 0 or more records that are associated with one or another robot (as determined by the value of the agent_id field) and indicate for each robot or for all at once what exactly they do not need to be indexed.</p><p>The syntax of the file allows you to set forbidden indexing areas, both for all and for certain robots.</p><p>There are special requirements for the robots.txt file, failure to comply with which may lead to incorrect reading of the search engine by the robot or even to incapacity. <a href="https://bumotors.ru/en/otkrytie-faila-vvod-dannyh-iz-faila-i-vyvod-v-fail.html">of this file</a>.</p><p>Primary requirements:</p><ul><li>all letters in the file name must be uppercase, that is, must be lowercase:</li><li>robots.txt - correct,</li><li>Robots.txt or ROBOTS.TXT is wrong;</li><li>the robots.txt file must be generated in <a href="https://bumotors.ru/en/csv-fail-konvertirovat-v-xls-import-i-eksport-tekstovyh-failov-v.html">text format</a> Unix. When copying this file to the site, the ftp client must be configured to <a href="https://bumotors.ru/en/kak-ubrat-tekstovyi-rezhim-na-windows-7-vklyuchenie-i-vyklyuchenie-testovogo.html">text mode</a> file sharing;</li><li>the robots.txt file must be located in the root directory of the site.</li> </ul><h3><b>3. Content of the robots.txt file</b></h3><p>The robots.txt file includes two entries: "User-agent" and "Disallow". The names of these records are not case sensitive.</p><p>Some search engines also support <a href="https://bumotors.ru/en/kak-zapisat-razgovor-na-honor-7-video-zapis-razgovorov-na-androide.html">additional entries</a>... For example, the Yandex search engine uses the Host record to determine the main mirror of the site (the main mirror of the site is the site that is in the index of search engines).</p><p>Each entry has its own purpose and can be encountered several times, depending on the number of pages and / or directories to be closed from indexing and the number of robots you are accessing.</p><p>Supposed <a href="https://bumotors.ru/en/faily-sozdannye-v-prilozhenii-access-imeyut-rasshirenie-kakoi-format-faila.html">following format</a> robots.txt file lines:</p><p><b>entry_name</b>[optional</p><p>spaces] <b>: </b>[optional</p><p>spaces] <b>meaning</b>[optional spaces]</p><p>For a robots.txt file to be considered valid, at least one "Disallow" directive must be present after each "User-agent" entry.</p><p>A completely empty robots.txt file is equivalent to no robots.txt, which assumes that the entire site is allowed to be indexed.</p><h4><b>User-agent entry</b></h4><p>The "User-agent" record must contain the name of the search robot. In this entry, you can tell each specific robot which pages of the site to index and which not.</p><p>An example of a "User-agent" record, where the call is made to all search engines without exceptions and the "*" symbol is used:</p><p>An example of a "User-agent" record, where the call is made only to the robot of the Rambler search engine:</p><p>User-agent: StackRambler</p><p>Each search engine's robot has its own name. There are two main ways to recognize it (name):</p><p>on the sites of many search engines there is a specialized section "help to the webmaster", in which the name of the search robot is often indicated;</p><p>When looking at the logs of a web server, in particular when looking at hits to the § robots.txt file, you can see a lot of names in which the names of search engines or part of them are present. Therefore, you just have to choose the desired name and enter it into the robots.txt file.</p><h4><b>Disallow recording</b></h4><p>The "Disallow" record must contain instructions that indicate to the search robot from the "User-agent" record which files and / or directories are prohibited from indexing.</p><p>Consider <a href="https://bumotors.ru/en/chto-takoe-border-v-css-primery-s-razlichnymi-granicami-ramok-css-border.html">various examples</a> Disallow records.</p><p>An example of a robots.txt entry (allow everything for indexing):</p><p><b>Disallow:</b></p><p>Example (the site is completely prohibited to. For this, use the "/" symbol): Disallow: /</p><p>Example (the file "page.htm" located in the root directory and the file "page2.htm" located in the directory "dir" are prohibited for indexing):</p><p><b>Disallow: /page.htm</b></p><p><b>Disallow: /dir/page2.htm</b></p><p>Example (the directories "cgi-bin" and "forum" and, therefore, the entire contents of this directory are prohibited for indexing):</p><p><b>Disallow: / cgi-bin /</b></p><p><b>Disallow: / forum /</b></p><p>It is possible to close from indexing a number of documents and (or) directories starting with the same characters using only one "Disallow" entry. To do this, you need to write the initial identical characters without a closing slash.</p><p>Example (the directory "dir" is prohibited for indexing, as well as all files and directories starting with the letters "dir", that is, files: "dir.htm", "direct.htm", directories: "dir", "directory1 "," Directory2 ", etc.):</p><h4><b>Allow entry</b></h4><p>The "Allow" option is used to indicate exclusions from non-indexed directories and pages that are specified by the "Disallow" entry.</p><p>For example, there is an entry that looks like this:</p><p>Disallow: / forum /</p><p>But at the same time, page1 needs to be indexed in the / forum / directory. Then you need the following lines in your robots.txt file:</p><p>Disallow: / forum /</p><p>Allow: / forum / page1</p><h4><b>Sitemap record</b></h4><p>This entry points to the location of the sitemap in <a href="https://bumotors.ru/en/xml-format-chem-otkryt-i-redaktirovat-kak-izmenit-fail.html">xml format</a> which is used by search robots. This entry indicates the path to this file.</p><p>Sitemap: http://site.ru/sitemap.xml</p><h4><b>Host record</b></h4><p>The "host" entry is used by the Yandex search engine. It is necessary to determine the main mirror of the site, that is, if the site has mirrors (a mirror is a partial or <a href="https://bumotors.ru/en/kak-obezopasit-svoi-smartfon-i-sdelat-bekap-proshivki.html">full copy</a> site. The presence of duplicate resources is sometimes necessary for the owners of highly visited sites to increase the reliability and availability of their service), then using the "Host" directive, you can select the name under which you want to be indexed. Otherwise, Yandex will choose the main mirror on its own, and the rest of the names will be prohibited from indexing.</p><p>For compatibility with crawlers that do not accept the Host directive when processing a robots.txt file, add the "Host" entry immediately after the Disallow entries.</p><p>Example: www.site.ru - main mirror:</p><p><b>Host: www.site.ru</b></p><h4><b>Crawl-delay entry</b></h4><p>This entry is perceived by Yandex. It is a command for the robot to make intervals of a specified time (in seconds) between indexing pages. Sometimes it is necessary to protect the site from overloads.</p><p>So, the following entry means that the Yandex robot needs to go from one page to another no earlier than 3 seconds later:</p><h4><b>Comments (1)</b></h4><p>Any line in robots.txt that starts with a "#" character is considered a comment. It is allowed to use comments at the end of lines with directives, but some robots may not recognize this line correctly.</p><p>Example (the comment is on the same line along with the directive):</p><p><b>Disallow: / cgi-bin / # comment</b></p><p>It is advisable to place the comment on a separate line. White space at the beginning of a line is permitted but not recommended.</p><h3><b>4. Sample robots.txt files</b></h3><p>Example (comment is on a separate line): <br><b>Disallow: / cgi-bin / # comment</b></p><p>An example of a robots.txt file that allows all robots to index the entire site:</p><p>Host: www.site.ru</p><p>An example of a robots.txt file that prohibits all robots from indexing a site:</p><p>Host: www.site.ru</p><p>An example of a robots.txt file that prohibits all robots from indexing the "abc" directory, as well as all directories and files starting with the "abc" characters.</p><p>Host: www.site.ru</p><p>An example of a robots.txt file that prohibits indexing of the page "page.htm" located in the root directory of the site by the search robot "googlebot":</p><p>User-agent: googlebot</p><p>Disallow: /page.htm</p><p>Host: www.site.ru</p><p>An example of a robots.txt file that disallows indexing:</p><p>- for the robot "googlebot" - the page "page1.htm" located in the directory "directory";</p><p>- for the Yandex robot - all directories and pages starting with the symbols "dir" (/ dir /, / direct /, dir.htm, direction.htm, etc.) and located in the root directory of the site.</p><p>User-agent: googlebot</p><p>Disallow: /directory/page1.htm</p><p>User-agent: Yandex</p><h3>5. Errors related to the robots.txt file</h3><p>One of the most common mistakes is inverted syntax.</p><p><b>Not properly:</b></p><p>Disallow: Yandex</p><p><b>Right:</b></p><p>User-agent: Yandex</p><p><b>Not properly:</b></p><p>Disallow: / dir / / cgi-bin / / forum /</p><p><b>Right:</b></p><p>Disallow: / cgi-bin /</p><p>Disallow: / forum /</p><p>If, when processing a 404 error (document not found), the web server issues a special page, and the robots.txt file is missing, then a situation is possible when the search robot when requesting a robots.txt file is given the same <a href="https://bumotors.ru/en/ne-mogu-zaiti-na-stranicu-ispolzuem-specialnyi-skript-chto-delat-esli-voiti.html">special page</a> which is not an indexing control file in any way.</p><p>Robots.txt case related error. For example, if you need to close the "cgi-bin" directory, then in the "Disallow" entry you cannot write the name of the directory in uppercase "cgi-bin".</p><p><b>Not properly:</b></p><p>Disallow: / CGI-BIN /</p><p><b>Right:</b></p><p>Disallow: / cgi-bin /</p><p>An error related to the absence of an opening slash when closing a directory from indexing.</p><p><b>Not properly:</b></p><p>Disallow: page.HTML</p><p><b>Right:</b></p><p>Disallow: /page.html</p><p>To avoid the most common errors, you can check the robots.txt file using Yandex.Webmaster tools or Tools for <a href="https://bumotors.ru/en/kak-otkryt-gostevoi-dostup-k-yandeks-metrike-poshagovo-kak-otkryt-gostevoi.html">Google webmasters</a>... The check is carried out after downloading the file.</p><h3>6. Conclusion</h3><p>Thus, the presence of a robots.txt file, as well as its compilation, can affect the promotion of a site in search engines. Without knowing the syntax of the robots.txt file, you can prohibit indexing of possible promoted pages, as well as the entire site. And, conversely, the competent compilation of this file can greatly help in promoting the resource, for example, you can close documents from indexing that interfere with the promotion of the desired pages.</p> <p>Want to know how to prevent your site from being indexed in robots.txt and other tools? Then the presented material is just for you.</p> <p>Of course, site owners are fighting to get their resource indexed by search engines as quickly as possible. But there are times when it is necessary to prohibit the indexing of the site so that the search bot does not visit the resource for some time. Such cases may be:</p> <ul><li>recent creation of a site, when there is still no useful information on it;</li> <li>the need for updates (for example, a change in the design of the site);</li> <li>the presence of hidden or secret sections or <a href="https://bumotors.ru/en/chitat-poleznoe-v-kontakte-poleznye-sovety-dlya-doma.html">useful links</a> which I would not like to be passed on to search bots.</li> </ul><p><i><b>You can close the entire site or its individual parts:</b> </i></p> <ul><li>a separate paragraph or link;</li> <li>forms for entering information;</li> <li>admin part;</li> <li>pages <a href="https://bumotors.ru/en/vosstanovlenie-uchetnoi-zapisi-gugl-vyvody-po-vosstanovleniyu-akkaunta-google.html">user profiles</a> and registrations;</li> <li>duplicate pages;</li> <li>tag cloud, etc.</li> </ul><blockquote><p>There are many ways in which you can block a site from indexing. Editing your robots.txt file is one of them. We will consider this method and two more of the most popular and simple ones.</p> </blockquote> <h3>How to close a site from indexing for Yandex, Google and all search engines in robots.txt</h3> <p><i>Editing your robots.txt file is one of the safest and fastest ways</i> set this ban for search engines for a while or forever. What do I need to do:</p> <ol><li>Create a robots.txt file. To do this, you need to create a regular <a href="https://bumotors.ru/en/skachat-programmu-dlya-redaktirovaniya-tekstovyh-dokumentov-luchshie-tekstovye.html">Text Document</a> with the extension .txt, name it "robots".</li> <li>Load the created file into <a href="https://bumotors.ru/en/chto-znachit-koren-papki-kornevaya-direktoriya.html">root folder</a> your blog. If the site is created on <a href="https://bumotors.ru/en/kak-sozdat-svoi-blog-i-zarabotat-ustanovka-i-nastroika-wordpress-kakimi-dvizhkami.html">WordPress engine</a>, then this folder can be found where the wp-includes, wp-content, etc. folders are located.</li> <li>Directly setting the prohibition of indexing for search engines.</li> </ol><p>The prohibition of site indexing can be set both for specific search engines and for all search engines. We'll look at different options.</p> <p><br><img src='https://i1.wp.com/masterproseo.ru/wp-content/uploads/2016/08/robots.jpg' align="center" width="100%" loading=lazy loading=lazy></p><p>To block a site from indexing by Google search bots, you need to write the following in your robots.txt file:</p> <ol><li>User-agent: Googlebot</li> <li>Disallow: /</li> </ol><p><i>In order to check if the site is closed from indexing, an account is created and the desired site is added to Google Webmaster</i>... A check function is provided here.</p> <p>The results will then be shown. If the site is prohibited from indexing, it will say "Blocked by row" and indicate which row is blocking indexing. If any actions to prohibit indexing by Google search bots were performed incorrectly, then the sign "Allowed" will be signed.</p> <blockquote><p>Please note that it is not possible with search engine robots.txt <a href="https://bumotors.ru/en/soobshchenie-o-poiskovoi-sisteme-google-istoriya-google-gugl-kompanii-s-mirovym.html">google system</a> prohibit indexing 100%. This is a kind of recommendation for Google, since it will decide for itself whether to index a separate document or not.</p> </blockquote> <p><b>To block site materials from indexing by Yandex, enter the following in the robots.txt file:</b></p> <ol><li>User-agent: Yandex</li> <li>Disallow: /</li> </ol><p>To check the status of a resource, you need to add it to Yandex Webmaster, where you then need to enter several pages from your site and click the "Check" button. If everything worked out, the line will display the inscription "Forbidden by the rule."</p> <p><i>You can also set a ban on indexing your site for all search engines at the same time. To do this, again, open the robots.txt file and write the following line in it</i>:</p> <ol><li>User-agent: *</li> <li>Disallow: /</li> </ol><blockquote><p>Checking the prohibition of indexing for Google and Yandex is performed according to the above scheme in Google Webmaster and Yandex Webmaster, respectively.</p> </blockquote> <p>To see your robots.txt file, you need to go to yourdomain.com/robots.txt. Everything that has been written will be displayed here. It so happens that a 404 error appears. This indicates that something was done wrong when uploading your file.</p> <p><br><img src='https://i0.wp.com/masterproseo.ru/wp-content/uploads/2016/08/robots-allow-disallow.jpg' align="center" width="100%" loading=lazy loading=lazy></p><h3>Prevent indexing a resource using the toolbar</h3> <p>The method of closing the site from indexing using the toolbar is suitable only for those resources that were made on WordPress.</p> <p>The procedure is simple and quick:</p> <ul><li>Open the "Control Panel", go to "Settings" - "Reading";</li> <li>Check the box "Recommend search engines not to index the site."</li> <li>Save changes.</li> </ul><p><b>Performing these actions is just a recommendation for search engines</b>... And they decide on their own whether or not the resource materials will be indexed. The site's visibility settings even include <a href="https://bumotors.ru/en/sravnitelnye-testy-kompyuterov-specialnoe-testirovanie.html">special string</a>: « <a href="https://bumotors.ru/en/poiskovye-mashiny-poisk-informacii-v-web.html">Search engines</a> they decide for themselves whether to follow your request. " Note that Yandex usually "obeys", and Google can act at its own discretion and still index the site in some cases, despite the recommendation.</p> <h2>Closing a site from indexing manually</h2> <p>In the source code, when the page or the entire resource is closed from indexing, the line appears <meta name=»robots» content=»noindex,follow» /></p> <p>It is this line that tells search engines that there is no need to index individual materials or resources. You can manually write this line anywhere on the site. The main thing is that it is displayed on all pages.</p> <p><i><b>This method is also suitable in order to close any unnecessary document from indexing.</b> </i></p> <p>After completing the updates, you need to check if everything worked out. To do this, you need to open <a href="https://bumotors.ru/en/pochemu-kompyuter-ne-mozhet-prosmotret-kod-elementa-kak-posmotret.html">source</a> via <a href="https://bumotors.ru/en/deistvie-primenyaemoe-po-hokkeyu-ctrl-x-goryachie-klavishi-na-klaviature.html">CTRL keys</a>+ U and see if it contains the correct line. Its presence confirms the successful operation of closing from indexing. Additionally, you can check in Yandex Webmaster and Google Webmaster.</p> <p>So, we have considered the simplest and <a href="https://bumotors.ru/en/pyat-sposobov-bystree-zaryadit-smartfon-vklyuchit-bystruyu-zaryadku-ili.html">quick ways</a>, allowing you to close the entire site or individual materials of the resource from indexing <a href="https://bumotors.ru/en/kak-funkcioniruyut-poiskovye-mashiny-i-kak-pravilno-stroit-zaprosy.html">search engines</a>... And, as it turned out, robots.txt is one of the easy and relatively reliable methods.</p> <p>Recently he shared with me the observation that many sites that come to us for audit often have the same errors. Moreover, these mistakes can not always be called trivial - even advanced webmasters make them. This is how the idea came about to write a series of articles with instructions for tracking and fixing <a href="https://bumotors.ru/en/ne-udaetsya-aktivirovat-windows-8-kod-oshibki-0x8007007b-standartnoi-sposob.html">similar mistakes</a>... The first in line is a guide to setting up site indexing. I give the floor to the author.</p> <p>For good site indexing and better page ranking, it is necessary that the search engine crawls the key promoted pages of the site, and on the pages themselves can accurately highlight the main content, without getting confused in the abundance of service and auxiliary information. <br>Websites that come to us for analysis have two types of errors:</p> <p>1. When promoting a site, their owners do not think about what the search bot sees and adds to the index. In this case, a situation may arise when the index contains more garbage pages than promoted pages, and the pages themselves are overloaded.</p> <p>2. On the contrary, the owners were too zealous to clean up the site. Together with <a href="https://bumotors.ru/en/gde-knopka-option-otobrazhenie-podrobnoi-informacii-o-seti-zakrytie-nenuzhnyh.html">unnecessary information</a> data important for the promotion and evaluation of pages can also be hidden.</p> <p>Today we want to consider what is really worth hiding from search robots and how best to do it. Let's start with the content of the pages.</p> <h2>Content</h2> <h3>Problems related to closing content on the site:</h3> <p>The page is evaluated by search robots comprehensively, and not only by textual indicators. Carried away by closing various blocks, information that is important for evaluating the usefulness and ranking is often deleted.</p> <p><b>Let's give an example of the most <a href="https://bumotors.ru/en/oshibka-internal-server-error-chto-eto-i-kak-s-nei-borotsya-obzor-samyh-chastyh.html">frequent mistakes</a>: </b><br>- the site header is hidden. It usually houses <a href="https://bumotors.ru/en/kak-skryt-kontaktnuyu-informaciyu-vk-kak-skryt-stranicu-vkontakte-ot.html">Contact Information</a>, links. If the site header is closed, search engines may not know that you have taken care of visitors and placed <a href="https://bumotors.ru/en/kak-podobrat-k-materinskoi-plate-blok-pitaniya-kak-vybrat-blok-pitaniya.html">important information</a> on a prominent place;</p> <p>- filters, search form, sorting are hidden from indexing. The presence of such opportunities in an online store is an important commercial indicator that is better shown, not hidden. <br>- information about payment and delivery is hidden. This is done to enhance the uniqueness of the product cards. But this is also information that should be on a high-quality product card. <br>- the menu is "cut" from the pages, impairing the assessment of the ease of navigation on the site.</p> <p><b>Why is part of the content closed on the site?</b><br>There are usually several goals: <br>- to focus on the main content on the page by removing auxiliary information, service blocks, menus from the index; <br>- to make the page more unique and useful by removing duplicate blocks on the site; <br>- remove "extra" text, increase the text relevance of the page.</p> <p><b>All of this can be achieved without hiding some of the content!</b><br><i>Do you have a very large menu?</i><br>Display on the pages only those items that are directly related to the section.</p> <p><i>Many choices in filters?</i><br>Print only popular ones in the main code. Load the rest of the options only if the user clicks the "show all" button. Yes, scripts are used here, but there is no deception - the script is triggered at the user's request. The search engine will be able to find all the items, but when evaluated, they will not receive the same value as the main content of the page.</p> <p><i>On the page <a href="https://bumotors.ru/en/samyi-bolshoi-blok-pitaniya-kak-pravilno-vybrat-bloki-pitaniya.html">big block</a> with the news?</i><br>Reduce their number, display only headlines, or simply remove the news block if users rarely follow links in it or there is little main content on the page.</p> <p>Search robots, although far from ideal, are constantly improving. Google already shows hiding scripts from indexing as an error in <a href="https://bumotors.ru/en/kak-otklyuchit-stroku-poiska-gugl-android-7-kak-ubrat-poisk-gugl-na-androide.html">google panels</a> <a href="https://bumotors.ru/en/proverka-optimizacii-dlya-mobilnyh-ustroistv-proverka.html">Search Console</a>("Blocked Resources" tab). Not showing part of the content to robots can really be useful, but this is not an optimization method, but rather temporary "crutches" that should be used only when absolutely necessary.</p> <p><b>We recommend:</b><br>- treat content hiding as a "crutch" and resort to it only in extreme situations, trying to modify the page itself; <br>- removing part of the content from the page, focusing not only on text indicators, but also assessing the convenience and information that affects; <br>- before hiding content, conduct an experiment on several test pages. Search bots know how to parse pages and your fears about a decrease in relevance may be in vain.</p> <p><b>Let's take a look at the methods used to hide content:</b></p> <h3>Noindex tag</h3> <p>This method has several disadvantages. First of all, this tag is only taken into account by Yandex, so it is useless for hiding text from Google. In addition, it is important to understand that the tag prohibits indexing and displaying only text in search results. The rest of the content, such as links, is not covered.</p> <p>Yandex support doesn't really cover how noindex works. A little <a href="https://bumotors.ru/en/cennaya-informaciya-kriterii-otbora-bolshaya-enciklopediya.html">more information</a> is in one of the discussions on the official blog.</p> <p><b>User question:</b></p> <blockquote><p>“The mechanics of action and the influence on the ranking of the tag are not fully understood. <noindex>text</noindex>... Next, I will explain why they are so puzzled. And now - there are 2 hypotheses, I would like to find the truth.</p> <p># 1 Noindex does not affect the ranking / relevance of the page at all</p> <p>Under this assumption: the only thing it does is to block some of the content from appearing in search results. In this case, the entire page is considered as a whole, including closed blocks, relevance and related parameters (uniqueness; compliance, etc.) for it is calculated according to all content in the code, even closed.</p> <p># 2 Noindex affects ranking and relevance, since content closed in the tag is not rated at all. Accordingly, the opposite is true. The page will be ranked according to the content open to robots. "</p> </blockquote> <p><b>When the tag might be useful:</b><br>- if there is a suspicion that the page is downgraded in Yandex search results due to over-optimization, but at the same time occupies TOP positions for important phrases in Google. You need to understand that this is a quick and temporary solution. If the entire site falls under "Baden-Baden", noindex, as Yandex representatives have repeatedly confirmed, will not help; <br>- to hide the general <a href="https://bumotors.ru/en/informacionnaya-bezopasnost-organov-vnutrennih-del-zashchita-informacii-v.html">service information</a> which you are required to list on the page due to corporate or legal regulations; <br>- to correct snippets in Yandex if they contain unwanted content.</p> <h3>Hiding content with AJAX</h3> <p>This <a href="https://bumotors.ru/en/nex-chem-otkryt-est-li-universalnyi-metod-otkrytiya-neizvestnyh-failov.html">universal method</a>... It allows you to hide content from both Yandex and Google. If you want to cleanse the page of content that dilutes relevance, it is better to use it. Representatives of the PS, of course, do not welcome this method and recommend that <a href="https://bumotors.ru/en/kak-rabotayut-poiskovye-roboty-poiskovye-roboty.html">search robots</a> saw the same content as users. <br>Technology <a href="https://bumotors.ru/en/ajax-chto-eto-takoe-vliyanie-na-seo-preimushchestva-i-nedostatki-tehnologii.html">using AJAX</a> is widespread and if you do not engage in explicit cloaking, sanctions for its use are not threatened. The disadvantage of this method is that you still have to block access to scripts, although Yandex and Google do not recommend doing this.</p> <h2>Site pages</h2> <p>For <a href="https://bumotors.ru/en/uspeshnoe-prodvizhenie-v-socialnyh-setyah-prodvizhenie-biznesa-v-socialnyh.html">successful promotion</a> it is important not only to get rid of <a href="https://bumotors.ru/en/prilozhenie-dlya-avtomaticheskoi-ochistki-operativnoi-pamyati-android.html">unnecessary information</a> on the pages, but also clear <a href="https://bumotors.ru/en/chto-znachit-relevantnyi-chto-takoe-relevantnost-poiska-indeks-poiskovoi.html">search index</a> site from useless garbage pages. <br>Firstly, it will speed up the indexing of the main promoted pages of the site. Second, the presence in the index <a href="https://bumotors.ru/en/samoe-bolshoe-chislo-prosmotrov-na-yutube-pyat-samyh-prosmatrivaemyh-video-na.html">a large number</a> junk pages will negatively affect the site's rating and promotion.</p> <p><b>Let's immediately list the pages that are advisable to hide:</b></p> <p>- pages for registration of applications, baskets of users; <br>- site search results; <br>- personal information of users; <br>- product comparison results pages and similar <a href="https://bumotors.ru/en/operacionnaya-sistema-sluzhit-dlya-funkcii-raboty-os-primery.html">auxiliary modules</a>;<br>- pages generated by search filters and sorting; <br>- pages of the administrative part of the site; <br>- print versions.</p> <p><b>Let's consider the ways in which you can close pages from indexing.</b></p> <h3>Close in robots.txt</h3> <p>This is not the best method.</p> <p>First, the robots file is not designed to combat duplicates and clean sites from junk pages. For these purposes, it is better to use other methods.</p> <p>Secondly, a robots file is not a guarantee that a page will not be indexed.</p> <p>Here's what Google writes about it in its help:</p> <h3>Noindex meta tag</h3> <p>To ensure that pages are excluded from the index, it is best to use this meta tag.</p> <p>Below is a variant of the meta tag that both search engines understand:</p><p> <meta name="robots" content="noindex, nofollow"> </p><p><b>An important point!</b></p> <p>For Googlebot to see the noindex meta tag, you need to open access to pages that are closed in the robots.txt file. If this is not done, the robot may simply not go to these pages.</p> <h3>X-Robots-Tag Headers</h3> <p>A significant advantage of this method is that the ban can be placed not only in the page code, but also through the root .htaccess file.</p> <p>This method is not very common in the Russian Internet. We believe that the main reason for this situation is that Yandex uses this method <a href="https://bumotors.ru/en/kak-i-gde-hranit-dannye-v-techenie-dolgogo-vremeni-rezervnoe.html">for a long time</a> did not support. <br>Yandex employees wrote this year that the method is now supported.</p> <p>The support response cannot be called detailed))). Before proceeding to prohibiting indexing using X-Robots-Tag, it is better to make sure that this method works for Yandex. We have not yet set up our experiments on this topic, but, perhaps, we will do it in the near future.</p> <h3>Password protection</h3> <p>If you need to hide the entire site, for example, the test version, we also recommend using this method. Perhaps the only drawback is that it may be difficult to scan a domain hidden under a password if necessary.</p> <h3>Eliminate junk pages with AJAX</h3> <p>The point is not just to prohibit indexing of pages generated by filters, sorting, etc., but not to create <a href="https://bumotors.ru/en/kak-voiti-na-zablokirovannuyu-stranicu-tor-i-emu-podobnye.html">similar pages</a> on the site.</p> <p>For example, if a user selected a set of parameters in the search filter for which you did not create <a href="https://bumotors.ru/en/kak-udalit-otdelnuyu-stranicu-v-vorde-kak-udalit-stranicu.html">separate page</a>, changes in the products displayed on the page occur without changing the URL itself.</p> <p>The difficulty with this method is that it usually cannot be applied to all cases at once. Some of the generated pages are used for promotion.</p> <p>For example, filter pages. For "refrigerator + Samsung + white" we need a page, but for "refrigerator + Samsung + white + two-compartment + no frost" - no longer.</p> <p>Therefore, you need to make a tool that involves the creation of exceptions. This complicates the task of programmers.</p> <h3>Use methods of prohibiting indexing from search algorithms</h3> <p><b>"URL Parameters" in <a href="https://bumotors.ru/en/skachat-prilozhenie-nastroika-sistemy-android-planshet-otklyuchi-google-search-i-drugoi.html">Google search</a> Console</b></p> <p>This tool allows you to specify how to identify the occurrence in <a href="https://bumotors.ru/en/stranichnye-bloki-pravilo-page-kak-izmenit-url-stranic-v-wordpress.html">Page urls</a> new parameters.</p> <p><b>Clean-param directive in robots.txt</b></p> <p>In Yandex, a similar ban for URL parameters can be set using the Clean-param directive. <br>You can read about it.</p> <p>Canonical addresses as prevention of garbage pages on the site <br>This meta tag was created specifically to combat duplicates and junk pages on the site. We recommend prescribing it throughout the site as a prevention of duplicate and garbage pages appearing in the index.</p> <h3>Tools for spot deletion of pages from the Yandex and Google index</h3> <p>If a situation has arisen when you urgently need to delete information from the index, without waiting for your ban to be seen <a href="https://bumotors.ru/en/sovremennye-problemy-nauki-i-obrazovaniya-sushchnostnaya-harakteristika-opytno-eksperimentalnoi-raboty.html">search work</a>, you can use tools from the Yandex.Webmaster panel and Google Search Console.</p> <p>In Yandex, this is "Remove URL":</p> <p>In Google Search Console "Remove URL":</p> <h2>Internal links</h2> <p>Internal links are closed from indexing to redistribute internal weights to the main promoted pages. But the point is: <br>- such a redistribution may have a bad effect on <a href="https://bumotors.ru/en/naznachenie-sistem-svyazi-obshchie-svedeniya-o-sistemah-svyazi.html">general ties</a> between pages; <br>- links from templated pass-through blocks usually have less weight or may not be counted at all.</p> <p>Consider the options that are used to hide links:</p> <h3>Noindex tag</h3> <p>This tag is useless for hiding links. It only applies to text.</p> <h3>Rel = "nofollow" attribute</h3> <p>Currently, the attribute does not allow you to save weight on the page. Using rel = ”nofollow” simply loses weight. By itself, using the tag for internal links does not seem very logical.</p> <h3>Hiding links with scripts</h3> <p>This is actually the only working method by which you can hide links from search engines. You can use Ajax and load link blocks after loading the page, or add links by replacing the tag with the script <span>on the <a>... It is important to take into account that <a href="https://bumotors.ru/en/kakie-pravila-effektivnogo-poiska-informacii-v-internete-itak-algoritm.html">search algorithms</a> are able to recognize scripts.</p> <p>As with content, this is a crutch that can sometimes solve a problem. If you are not sure that you will get a positive effect from the hidden link block, it is better not to use such methods.</p> <h2>Conclusion</h2> <p>Removing bulky end-to-end blocks from a page can really have a positive effect on ranking. It is better to do this by shortening the page and displaying only the content that visitors need. Hiding content from a search engine is a crutch that should be used only in cases where it is impossible to reduce cross-cutting blocks in other ways.</p> <p>When removing some of the content from the page, do not forget that not only text criteria are important for ranking, but also completeness of information and commercial factors.</p> <p>The situation is similar with internal links. Yes, sometimes it can be useful, but artificial redistribution of the link mass on the site is a controversial method. It is much safer and more reliable to simply discard links you are not sure about.</p> <p>With the pages of the site, everything is more unambiguous. It is important to make sure that junk pages of little use do not end up in the index. There are many methods for this that we have collected and described in this article.</p> <p>You can always take our advice on <a href="https://bumotors.ru/en/mozhno-li-posle-formatirovaniya-diska-vosstanovit-dannye-tehnicheskie-aspekty.html">technical aspects</a> optimization, or order a turnkey promotion, which includes.</p> <script>document.write("<img style='display:none;' src='//counter.yadro.ru/hit;artfast_after?t44.1;r"+ escape(document.referrer)+((typeof(screen)=="undefined")?"": ";s"+screen.width+"*"+screen.height+"*"+(screen.colorDepth? screen.colorDepth:screen.pixelDepth))+";u"+escape(document.URL)+";h"+escape(document.title.substring(0,150))+ ";"+Math.random()+ "border='0' width='1' height='1' loading=lazy loading=lazy>");</script> </div> <div class="post-social-counters-block"> <div style="margin-top: 12px"> <noindex></noindex> </div> </div> </div> </div> <a name="comments"></a> <h3 class="best-theme-posts-title">Top related articles</h3> <div class="container-fluid"> <div class="best-theme-posts row"> <div class="theme-post col-sm-4"> <a href="https://bumotors.ru/en/fallout-4-izmenenie-vneshnosti-konsol-otlichaetsya-dvizhenie-myshi-po.html"> <div class="img_container"><img src="/uploads/686669656a5df8de6a6f9d463fe60026.jpg" border="0" alt="Different mouse movement vertically and horizontally" width="320" height="180" / loading=lazy loading=lazy></div> <span class="theme-post-link">Different mouse movement vertically and horizontally</span> </a> </div> <div class="theme-post col-sm-4"> <a href="https://bumotors.ru/en/kak-szhat-tekstury-v-fallout-4-ruchnaya-nastroika-grafiki.html"> <div class="img_container"><img src="/uploads/d7c98012600dd0a7e0e0c14fa8d2e1eb.jpg" border="0" alt="How to compress textures in fallout 4" width="320" height="180" / loading=lazy loading=lazy></div> <span class="theme-post-link">How to compress textures in fallout 4</span> </a> </div> <div class="theme-post col-sm-4"> <a href="https://bumotors.ru/en/nizkie-nastroiki-grafiki-fallout-4-ostalos-tolko-ponyat-nuzhnyi-uroven.html"> <div class="img_container"><img src="/uploads/da3d853bd41405e0322c2a1e985df14b.jpg" border="0" alt="It remains only to understand the required level" width="320" height="180" / loading=lazy loading=lazy></div> <span class="theme-post-link">It remains only to understand the required level</span> </a> </div> </div> </div> </div> <a name="comments"></a> </div> <div class="right-column col-sm-4 col-md-4"> <div class="write"> <span class="tags-title">Categories:</span> <ul style="height: 286px;" id="right-tags" data-tagscount="18" data-currentmaxtag="10" class="tags"> <li class=""><a href="https://bumotors.ru/en/category/programs/">Programs</a></li> <li class=""><a href="https://bumotors.ru/en/category/safety/">Safety</a></li> <li class=""><a href="https://bumotors.ru/en/category/windows-10/">Windows 10</a></li> <li class=""><a href="https://bumotors.ru/en/category/iron/">Iron</a></li> <li class=""><a href="https://bumotors.ru/en/category/windows-8/">Windows 8</a></li> <li class=""><a href="https://bumotors.ru/en/category/vkontakte/">In contact with</a></li> <li class=""><a href="https://bumotors.ru/en/category/errors/">Errors</a></li> </ul> </div> <div class="banner"> </div> </div> </div> </div> <div style="clear:both"></div> </div> <div class="footer"> <div class="subscribe"> <div class="main-wrapper container"> <div class="row"> <div class="col-sm-8"> </div> <div class="col-sm-4"> <div class="social"> <a href="" class="vk social-ico"></a> <a href="https://facebook.com/" class="fb social-ico"></a> <a href="https://twitter.com/" class="tw social-ico"></a> </div> </div> </div> </div> </div> <div class="info"> <div class="main-wrapper container"> <div class="row"> <span class="footer-info col-xs-12">© 2021 bumotors.ru. How to set up smartphones and PCs. Informational portal.</span> </div> </div> </div> </div> </body> </html>