I always wanted to understand this, but it mattered so little that there was always a reason to put it off :)
Have you ever wondered what a URL actually is?
I keep running into these terms, yet I never got around to learning the difference between URI, URL, and URN. Then a post on the subject turned up (unfortunately, it has since sunk into oblivion), and I decided to read it myself and retell it for others. As mentioned above, this will not change anything, but sometimes I like to play with words, so here is the explanatory translation:
Have you ever looked at the address bar in your browser? What is in it: a URI, a URL, or a URN? Many of us do not distinguish between URI, URL, and URN, and some of us have never even heard the terms URI and URN; everyone just says URL. Let's try to figure it out together.
Explanation of abbreviations
URI - Uniform Resource Identifier
URL - Uniform Resource Locator
URN - Uniform Resource Name
Pay attention: the truth here is hidden in the details, but so far nothing is clear. Let's go further.
Definition
URI: Identifies a resource on the network by name, by address, or both. URIs are usually divided into URLs and URNs, so URL and URN are both kinds of URI.
URL: The address of some resource on the web. The URL identifies the location of the resource and how it is accessed.
URN: The name of some resource on the web. The meaning of the URN is that it only identifies the name of a specific item, which can be found in many specific locations.
There is nothing better than a specific example.
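URI = http://site/2009/09/uri-url-urn.html
URL = http://site
URN = /2009/09/uri-url-urn.html
Strictly speaking, this split is a simplification. Under the current URI standard (RFC 3986), the whole string http://site/2009/09/uri-url-urn.html is both a URI and a URL, while a true URN uses the urn: scheme, for example urn:isbn:0451450523.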
Let's summarize
URI is the concept of an abstract identifier, while URL and URN are concrete implementations of address and name.
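As a rough illustration of this summary, the sketch below uses Python's standard urllib.parse module to sort identifiers into locators and names by their scheme. The helper name classify and the sample identifiers (including the ISBN value) are assumptions made for this example, and treating every scheme other than urn as a URL is a deliberate simplification. A bare path has no scheme at all, so it is reported as a relative reference.

```python
from urllib.parse import urlsplit


def classify(identifier):
    """Roughly classify an identifier by its scheme (illustrative only)."""
    scheme = urlsplit(identifier).scheme.lower()
    if scheme == "urn":
        return "URN"                  # names a resource, e.g. urn:isbn:...
    if scheme:
        return "URL"                  # simplification: any other scheme locates
    return "relative reference"       # no scheme, e.g. a bare path


for sample in ("http://site/2009/09/uri-url-urn.html",
               "urn:isbn:0451450523",
               "/2009/09/uri-url-urn.html"):
    print(sample, "->", classify(sample))
```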
I hope everyone understands everything. Be literate!
The perception of each of us is individual, therefore - argue and read the discussions in the comments to the article, there are many interesting things.
Disputes over how to write a URL correctly, with or without a trailing slash, have always existed and always will. The arguments are varied and often contradictory. Two kinds of penalty are said to follow from writing a URL the wrong way. From search engines, these are supposedly sanctions for duplicate pages. From a performance point of view, it is supposedly an extra redirect, generated automatically by the server, to the correctly written page address.
However, an analysis of the Internet standards, in particular the document "RFC 1738 - Uniform Resource Locators (URL)", forces us to admit that both ways of writing a web resource address are formally correct, and that a sanction for using one or the other is nothing more than a search engine quirk or a pseudo-SEO tale.
From the point of view of brevity, the option without a trailing slash seems more correct, regardless of whether your link is a "file" or a "folder" on the server; indirect proof of this is given below. But nowhere does the document state that the other option is incorrect or refers to a completely different resource.
I will not burden you with a multi-page translation of the RFC, since, first, the question concerns only slashes at the end of a URL, and second, this article is addressed to ordinary users of site engines, including those who do not care about every detail and expect short explanations and evidence on the merits. Accordingly, I will quote excerpts from the document as evidence and explain them. Anyone not interested in this can skip straight to the conclusion at the end of the article.
General URL syntax
The first thing to draw your attention to is an excerpt from paragraph 2, General URL Syntax. In each case I will quote a fragment of the original text.
URLs are used to "locate" resources, by providing an abstract identification of the resource location.
That is, the URL itself is a pure abstraction. The fact that it may look like the name of a file or folder does not mean that it physically refers to that particular file, rather than to something else, in the server's file space. The document states this directly further on.
Note: In general, with regard to http links it is, strictly speaking, incorrect to say that, for example:
- http://domain.com/path/subpath/filename.txt - allegedly points to a file
- http://domain.com/path/subpath/ - allegedly points to a folder
- http://domain.com/path - allegedly points to a folder, but is written incorrectly
We are simply used to saying so, because it is convenient to associate links with files on the site. In reality, all these links point to some resources without denoting the type of resource in any way. What is hidden behind each resource, that is, which real file or folder and what type of content such a link will serve, is determined by the server configuration.
It is important to understand that links contain no such thing as "file", "folder", "subfolder", "text", "picture", "html", "script", "stylesheet", and so on. The presence or absence of a trailing slash means absolutely nothing until the link is processed inside the server, which itself decides where the link actually points and what type of content is hidden behind it. That decision depends solely on the internal architecture of the server.
Hierarchical schemes
The following is an excerpt from paragraph 2.3, Hierarchical schemes and relative links.
Some URL schemes (such as the ftp, http, and file schemes) contain names that can be considered hierarchical; the components of the hierarchy are separated by "/".
That is, the document says only that in certain schemes the content of a resource locator may be considered hierarchical. It does not say that this hierarchy is equivalent to any particular kind of hierarchy, such as a file hierarchy.
General network schema syntax
The following is an excerpt from paragraph 3.1, Common Internet Scheme Syntax:
//<user>:<password>@<host>:<port>/<url-path>
Note: This, by the way, answers a question derived from the one we are considering. People often argue about how to write a link to a domain (host) correctly: with or without a trailing slash?
Which is right, http://domain.com/ or http://domain.com?
Both are correct. The first slash after the hostname simply separates the path from the hostname. The same paragraph of the document puts it like this:
url-path
The rest of the locator consists of data specific to the scheme, and is known as the "url-path". It supplies the details of how the specified resource can be accessed. Note that the "/" between the host (or port) and the url-path is NOT part of the url-path.
Not a word obliges us to put this trailing character, or to omit it, when the url-path is an empty string (or, as many of us would say, when the URL refers to the root of the site). Nobody has the right to penalize you "for two copies of the main page", because according to the specification the URL refers to the same resource in both cases.
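A minimal sketch of this point in Python, using the standard urllib.parse module: the parser reports an empty path for http://domain.com and "/" for http://domain.com/, and a client can treat the two as the same resource by normalizing the empty path. The helper name normalize_root is hypothetical and is not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_root(url):
    """Rewrite an empty path as "/" so both spellings of the root compare equal."""
    parts = urlsplit(url)
    path = parts.path or "/"          # empty url-path -> root of the host
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


print(repr(urlsplit("http://domain.com").path))
print(repr(urlsplit("http://domain.com/").path))
print(normalize_root("http://domain.com") == normalize_root("http://domain.com/"))
```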
Let's continue with another excerpt from the same paragraph.
The url-path syntax depends on the scheme being used, as does the manner in which it is interpreted.
This is another confirmation that each locator scheme has its own concept of "hierarchy" and the way of its interpretation.
Hierarchy
For some file systems, the "/" used to denote the hierarchical structure of the URL corresponds to the delimiter used to construct a file name hierarchy, and thus, the filename will look similar to the URL path. This does NOT mean that the URL is a Unix filename.
Although this paragraph refers to the ftp scheme, it applies equally to other schemes (http, gopher, prospero, and so on). Only in the file scheme does the slash logically denote the same thing as in file names, for example file://server_or_device/path/subpath/filename.txt.
HTTP
An HTTP URL takes the form: http://<host>:<port>/<path>?<searchpart>
Note: The document also states that a link may be written without a trailing slash. Here it refers to the situation where the link path is empty, that is, the link points to the root of the host.
Formal notation
Finally, an excerpt from paragraph 5. BNF for specific URL schemes.
Here, optional parts are shown in square brackets. An asterisk in front of a bracket means zero or more repetitions of the bracketed fragment. The vertical bar should be read as OR.
hostport = host [ ":" port ]
...
httpurl = "http://" hostport [ "/" hpath [ "?" search ]]
hpath = hsegment *[ "/" hsegment ]
hsegment = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
search = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
...
lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
alpha = lowalpha | hialpha
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
safe = "$" | "-" | "_" | "." | "+"
extra = "!" | "*" | "'" | "(" | ")" | ","
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
escape = "%" hex hex
unreserved = alpha | digit | safe | extra
uchar = unreserved | escape
Notice how the hpath element, the link path, is formed by these rules. The hsegment elements of the path, its segments, are separated by slashes. This hints at an important idea: the slash divides the path into hierarchical parts and always sits between them. In principle, the last hsegment element can be an empty string (this follows from its definition), and then a closing slash appears at the end of the URL as a side effect.
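The effect of the hpath rule can be sketched in a few lines of Python. This is only an illustration of the grammar, not a validating parser; the helper name path_segments is hypothetical, and its argument is the hpath itself, that is, the part after the "/" that follows the host.

```python
def path_segments(hpath):
    """Split an hpath into its hsegment elements, keeping empty segments."""
    return hpath.split("/")


# Without a trailing slash every segment has a name.
print(path_segments("level1/level2"))
# With a trailing slash the last hsegment is an empty string.
print(path_segments("level1/level2/"))
# Repeated slashes produce unnamed intermediate segments.
print(path_segments("level1////levelX"))
```

Keeping the empty strings is deliberate: they are exactly the unnamed segments that the grammar permits.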
Conclusion
Dividing a path into segments with a slash implies that the segments have non-empty names. Accordingly, a link with a trailing slash seems illogical (although it is not forbidden): it appears to point to some last segment of the path, yet does not name that segment in any way. Just as illogical (but also not forbidden) is the link http://domain.com/level1////levelX, which does not name the intermediate segments of the path, if the path is viewed as a hierarchical structure rather than as a set of parameters.
In colloquial language, the semantic content of the two links can be explained as follows:
- - addresses to the default starting point of the second level of the hierarchy
- - addresses an undefined point within the second level of the hierarchy, that is, as if the server is assigned the task that "we are addressing the second level of the hierarchy, and you yourself determine which point you consider in this level as the default initial".
From all of the above it follows that, just as the links
- http://domain.com
- http://domain.com/
address the visitor to the root of the site, the links
- http://domain.com/level1/level2
- http://domain.com/level1/level2/
address the visitor to the second level of the resource hierarchy. The fact that a particular server may interpret the trailing slash in its own way and internally redirect to the default starting point of that level, say to the index.html file, is a special case of a specific configuration. In the same way, when human-readable URLs are implemented, the redirect rules written for the mod_rewrite server module define their own concept of a hierarchical URL structure (specific to the engine), in which path elements can be equated to request parameters and have nothing to do with the file structure of the site (classic example: in http://domain.com/ru/path, the ru element is the current-language parameter, not a folder on the site).
I would like to emphasize that this is internal knowledge of the server, determined by its configuration and by the engine installed on the site. An external service, such as a search engine, cannot guess whether and how the links with and without a slash differ, unless the server has been deliberately configured to serve different content for them.
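To make the idea of "path elements as parameters" more concrete, here is a small Python sketch of how an engine might interpret a path. Everything in it is an assumption for illustration: the route function, the set of supported languages, and the default language are hypothetical and do not describe any particular engine.

```python
from urllib.parse import urlsplit

SUPPORTED_LANGUAGES = {"ru", "en"}    # hypothetical list of language codes
DEFAULT_LANGUAGE = "en"               # hypothetical default


def route(url):
    """Interpret the first path element as a language parameter, not a folder."""
    # Empty segments are dropped, so a trailing slash changes nothing here.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if segments and segments[0] in SUPPORTED_LANGUAGES:
        return {"lang": segments[0], "page": "/".join(segments[1:])}
    return {"lang": DEFAULT_LANGUAGE, "page": "/".join(segments)}


print(route("http://domain.com/ru/path"))
print(route("http://domain.com/ru/path/"))
```

In this sketch both spellings resolve to the same page, and that outcome is a choice made by the server-side code. Nothing in the URL itself dictates it.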
For your information
At the implementation level, trailing slashes are not of fundamental importance, as many well-known portals confirm. On some of them all links end with a slash, on others none do. The main thing is that the two forms do not serve different content. For Yandex, you also need to set up a 301 redirect from the links you do not use (say, those ending with a slash) to the ones you do use. According to unconfirmed statements from Yandex support, this search engine can allegedly make mistakes and fail to "glue" (merge in its index) the slash and no-slash addresses into one, or glue them only after some delay.
Here is an example of implementing such a redirect using the root .htaccess file:
# if the input URL ends with one or more slashes,
# issue a 301 redirect to the page without the slash
RewriteCond %{REQUEST_URI} ^/.+/$
RewriteRule ^(.*?)/+$ http://%{HTTP_HOST}/$1 [R=301,L]
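The logic of the rule above can also be expressed as a short Python sketch, which may be easier to test than an .htaccess file. It only mirrors the idea (a non-root path ending in slashes is redirected to the same path without them) and does not reproduce every detail of how mod_rewrite processes a request; the function name redirect_target is hypothetical.

```python
import re

# Same condition as in the rule: a non-root path that ends with a slash.
ENDS_WITH_SLASH = re.compile(r"^/.+/$")


def redirect_target(host, request_uri):
    """Return the address for a 301 redirect, or None if no redirect is needed."""
    if not ENDS_WITH_SLASH.match(request_uri):
        return None                       # root or already without a slash
    return "http://" + host + request_uri.rstrip("/")


print(redirect_target("domain.com", "/level1/level2/"))
print(redirect_target("domain.com", "/level1/level2"))
print(redirect_target("domain.com", "/"))
```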
For Google (again, according to information not confirmed by experiment), these redirects are not important, since it appears to be able to glue such addresses correctly without redirects.
Remember: There are quite a few people who consider themselves SEO specialists, but not all of them are. Moreover, people often speculate about SEO without proper knowledge or grounds, counting on your ignorance in this area to make you believe any tall tale. When you are told that some of your pages "dropped out of the index", follow a very good recommendation from Yandex: you can find out about indexing errors, if any, in the Yandex.Webmaster service. There you can always see a list of your pages that appear in search and a list of pages excluded from search for some reason. Google has a similar service. Trust this data, and not the opinion of pseudo-specialists who heard something somewhere and on that basis recommend what they believe to be the only right thing.
Here is a very interesting post, Little Known SEO Facts, published in April 2017. It presents a large study with many screenshots, which set out to test several popular beliefs about search engine promotion and to convey the results to the average site owner through clear examples. Along the way, the study shows the novice reader a number of mundane, even inconspicuous, yet still surprising features of organic search results in Google and Yandex.
You can get lost not only in the forest, but also online, and the cause may be a wrong path or address leading to the resource. Don't know what a URL is? Then, before setting off on a further journey through virtual space, let's look at how web addresses work.
What is URL
URL is a generally accepted standard for writing an address that indicates the location of a resource on the Internet. The abbreviation stands for Uniform Resource Locator. You may also come across an earlier expansion, Universal Resource Locator. The two meanings complement the concept of a URL rather than contradict each other.
The basic format of the URL structure is as follows:
<scheme>://<login>:<password>@<host>:<port>/<URL-path>?<parameters>#<anchor>
scheme - most often the protocol is meant.
login - username used for authorization on the resource.
password - user password for authorization.
host - the domain name of the host.
port - host port used for the connection.
URL-path - the path where the requested resource is located on the server.
parameters and anchor - the values of variables and an identifier of a place within a specific resource.
When a form is submitted, variable values are passed in the query string only if the GET method is used.
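The structure described above can be checked with Python's standard urllib.parse module. The sketch below is only an illustration: the sample address (host.example, port 8080, and the rest) is made up, and the helper name split_url is hypothetical.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Break a URL into the components listed in the format above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "login": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "parameters": parts.query,
        "anchor": parts.fragment,
    }


# A made-up address that contains every component.
sample = "http://login:password@host.example:8080/path/page.html?num=10&type=new#anchor"
for name, value in split_url(sample).items():
    print(name, "=", value)
```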
Let's consider the format of the URL of the page of the requested resource with practical examples. On the client side, the URL is displayed in the address bar of the browser:
The most common options are:
- http://ru.wikipedia.org/wiki/Home_page - http (hypertext transfer protocol);
- https://ru.wikipedia.org/wiki/Home_page - https is used as the transfer method. It is a secure form of the http protocol that uses encryption (SSL or TLS);
- ftp://wikipedia.org/wiki/file.txt - the file transfer protocol ftp;
- http://mail.ru/script.php?num=10&type=new&v=text - passing the values of variables in the query string using the GET method.
Any URL format is primarily a character string. It may include:
- Letters.
- Arabic numerals (0-9).
- Reserved characters ("+", "=", "!" and others).
- Special characters - let's dwell on them in more detail.
Using special characters in URLs
Of course, nothing too exotic is used in a URL, but there are several special characters:
- ? - serves to separate the block with the transmitted parameters in the query line;
- & - separates the passed parameters from each other;
- = - separates the variable in the parameter from its value;
- : - serves to separate the protocol from the rest of the URL;
- # - used in the local part of the address. It lets you refer to a specific part of the requested page;
- @ - separates the user's credentials from the host, and is also used in addresses of the mailto scheme.
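The roles of "?", "&", "=", and "#" listed above can be seen by parsing an address with Python's standard urllib.parse module. This is a small sketch: the helper name query_parameters is hypothetical, and the address with the #comments anchor is made up for the example.

```python
from urllib.parse import urlsplit, parse_qsl


def query_parameters(url):
    """Return the variables passed after "?" as a name -> value dictionary."""
    query = urlsplit(url).query           # everything between "?" and "#"
    return dict(parse_qsl(query))         # split on "&", then on "="


url = "http://mail.ru/script.php?num=10&type=new&v=text"
print(query_parameters(url))

# The part after "#" is the anchor and is not part of the query string.
print(urlsplit("http://site/page.html?num=10#comments").fragment)
```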
But this is all just theory. Therefore, before learning the rest, let's look at a small practical example.
Illustrative example
For clarity, let's take such a simple registration form:
Here is its code: