How to set up smartphones and PCs. Informational portal

A Uniform Resource Identifier (URI), its purpose and parts. WWW server operation scheme

A URI (Uniform Resource Identifier) is a compact string of characters used to identify an abstract or physical resource. A resource is any object that belongs to a certain space. WWW developers recognized the need for URIs from the very beginning, since the system was meant to unite, in a single information environment, tools that identify information resources in different ways. The specification that was developed covered FTP, Gopher, WAIS, Usenet, e-mail, Prospero, Telnet, X.500 and, of course, HTTP (WWW). The result was a universal specification that allows the list of addressable resources to be extended as new schemes emerge.

URIs are used in hypertext links, which are written in tags and ... Embedded graphics are likewise addressed by URI in tags and ... The implementation of URIs for the WWW is called a URL (Uniform Resource Locator). More precisely, a URL is an implementation of a URI scheme mapped to an algorithm for accessing resources over network protocols. There is also the URN (Uniform Resource Name), which maps a URI to a namespace on the network.

URNs arose from the desire to address the MIME parts of a mail message.

Principles of constructing a WWW address

The URI was based on the following principles:

· Extensibility - new addressing schemes should fit easily into the existing URI syntax.

· Completeness - whenever possible, any existing scheme should be describable with a URI.

· Readability - the address should be easy for a user to read, which is typical of WWW technology in general: documents, together with their links, can be written in an ordinary text editor.

Before considering the various address representation schemes, here is an example of a simple URI:

http://polyn.net.kiae.su/polyn/index.html

The address scheme identifier, "http", precedes the colon, which separates it from the rest of the URI, called the path. In this case the path consists of the domain address of the machine on which the HTTP server is installed and the path from the root of the server tree to the file "index.html". Besides the full URI notation shown above, there is a simplified one. It assumes that by the time it is used, many parameters of the resource address are already known (protocol, machine address on the network, some path elements). Under this assumption, the author of hypertext pages can give only a relative address of the resource, that is, an address relative to some base resource.
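To make the split between scheme and path concrete, here is a minimal sketch using Python's standard urllib.parse module (our choice of tool; the article names none). It splits the example URI and then resolves relative addresses against it, the way a browser fills in the parameters already known from the current page. The relative references "manifest.html" and "../img/logo.gif" are hypothetical.

```python
from urllib.parse import urlsplit, urljoin

base = "http://polyn.net.kiae.su/polyn/index.html"

# The scheme identifier is everything before the first colon.
parts = urlsplit(base)
print(parts.scheme)   # http
print(parts.netloc)   # polyn.net.kiae.su  (domain address of the machine)
print(parts.path)     # /polyn/index.html  (path from the server root)

# A relative address inherits the scheme, host and directory of the base.
print(urljoin(base, "manifest.html"))    # http://polyn.net.kiae.su/polyn/manifest.html
print(urljoin(base, "../img/logo.gif"))  # http://polyn.net.kiae.su/img/logo.gif
```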

A URL (Uniform Resource Locator) is a subset of URI schemes that identifies a resource by how it is accessed (for example, its "location on the web") rather than identifying it by name or other attributes of that resource. The URL explicitly describes how to get to the object.

Syntax: <scheme>:<scheme-specific-part>, where:

scheme = "http" | "ftp" | "gopher" | "mailto" | "news" | "telnet" | "file" | "man" | "info" | "whatis" | "ldap" | "wais" | ... - the scheme name

scheme-specific-part - depends on the scheme. In the scheme-specific-part, octets can be written as hexadecimal escapes of the form %5F. The following octets must be encoded: 00-1F and 7F (non-printable) and 80-FF (outside US-ASCII).
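The rule above can be sketched in Python as follows, a minimal illustration assuming UTF-8 for non-ASCII text. The helper names split_uri and escape_octets are hypothetical. split_uri separates the scheme from the scheme-specific part at the first colon (scheme names are case-insensitive, so a capitalized "Ftp" yields "ftp"), and escape_octets writes only the octets in the ranges 00-1F, 7F and 80-FF as %XX escapes, exactly as the rule states; real encoders also escape characters such as "%" and space.

```python
def split_uri(uri: str) -> tuple[str, str]:
    """Split '<scheme>:<scheme-specific-part>' at the first colon."""
    scheme, sep, rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError(f"no scheme in {uri!r}")
    return scheme.lower(), rest        # scheme names are case-insensitive


def escape_octets(text: str) -> str:
    """Percent-encode the octets 00-1F, 7F and 80-FF (UTF-8 assumed)."""
    out = []
    for byte in text.encode("utf-8"):
        if byte <= 0x1F or byte >= 0x7F:
            out.append(f"%{byte:02X}")  # two uppercase hex digits
        else:
            out.append(chr(byte))       # printable ASCII is kept as-is
    return "".join(out)


print(split_uri("Ftp://www.ipm.kstu.ru/"))  # ('ftp', '//www.ipm.kstu.ru/')
print(escape_octets("a\tb"))                # a%09b
print(escape_octets("é"))                   # %C3%A9
```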

Examples of URLs:

http://www.ipm.kstu.ru/index.php

ftp://www.ipm.kstu.ru/

A URN (Uniform Resource Name) is a particular kind of URI, with the "urn:" scheme and a namespace, that must remain unique and persistent even when the resource no longer exists or is unavailable.

It is assumed that, for example, the browser knows where to look for this resource.

Syntax: urn:namespace:data1.data2,more-data, where namespace defines how the data after the second ":" is interpreted.

URN example:

urn:ISBN:0-395-36341-6

ISBN - the namespace (International Standard Book Number),

0-395-36341-6 - the number identifying a specific publication

On receiving the URN, the client program consults the ISBN directory on the Internet and obtains what the number "0-395-36341-6" stands for (for example, "quantum chemistry"). URNs are relatively new: current versions of HTML do not support them and the directory services are not yet developed, so URNs are not as widespread as URLs.
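Before any directory lookup, a client has to split the URN into its namespace and the namespace-specific string. A minimal Python sketch; the function name parse_urn is hypothetical, and the lookup in a directory service itself is out of scope here.

```python
def parse_urn(urn: str) -> tuple[str, str]:
    """Return (namespace, namespace-specific string) for 'urn:<nid>:<nss>'."""
    prefix, nid, nss = (urn.split(":", 2) + ["", ""])[:3]
    if prefix.lower() != "urn" or not nid or not nss:
        raise ValueError(f"not a URN: {urn!r}")
    # The namespace identifier is case-insensitive; the rest is interpreted
    # according to the rules of that namespace.
    return nid.lower(), nss


print(parse_urn("urn:ISBN:0-395-36341-6"))  # ('isbn', '0-395-36341-6')
```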

Internet resource addressing schemes

Three schemes for addressing Internet resources are described below. A scheme specifies its identifier, the machine address, the TCP port, the path in the server directory, variables and their values, and a label.

HTTP scheme. This is the basic scheme of the WWW. It contains the scheme identifier, the machine address, the TCP port, the path in the server directory, a query string and a label.

Syntax: http://[<user>[:<password>]@]<host>[:<port>][/<url-path>[?<query>]]

http - scheme name

user - user name

password - user password

host - host name

port - port number

url-path - the path to the file, including the file name

query (<field-name>=<value>{&<field-name>=<value>}) - query string

By default, port = 80.

Here are some examples of URIs for the HTTP scheme:

http://polyn.net.kiae.su/polyn/manifest.html

This is the most common type of URI used in WWW documents. The scheme name (http) is followed by a path consisting of the domain address of the machine and the full address of the HTML document in the HTTP server tree.

The IP address can also be used as the machine address:

http://144.206.160.40/risk/risk.html

If the HTTP server is running on a TCP port other than 80, this is reflected in the address:

http://144.206.130.137:8080/altai/index.html

A label inside a document is given after the "#" character:

http://polyn.net.kiae.su/altai/volume4.html#first
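The components listed above can be pulled out of an HTTP URI with Python's urllib.parse.urlsplit. In this short sketch, describe_http_url is a hypothetical helper, and the default port 80 is filled in by our own code: urlsplit itself returns None when no port is written.

```python
from urllib.parse import urlsplit


def describe_http_url(url: str) -> dict:
    """Break an http URL into the parts named in the syntax above."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("expected an http URL")
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port or 80,       # default HTTP port
        "url-path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,     # the label after '#'
    }


print(describe_http_url("http://144.206.130.137:8080/altai/index.html")["port"])       # 8080
print(describe_http_url("http://polyn.net.kiae.su/altai/volume4.html#first")["fragment"])  # first
```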

FTP scheme. This scheme lets World Wide Web client programs address FTP file archives; the client program must support the FTP protocol. Besides the scheme name and the address of the FTP archive, the scheme can also specify a user ID and even a password.

Syntax: ftp://[<user>[:<password>]@]<host>[:<port>][/<url-path>]

ftp - scheme name

user - user name

password - user password

host - host name

port - port number

url-path - the path to the file, including the file name

By default, port = 21, user = anonymous, password = email address.

This scheme is most often used to access public FTP archives:

ftp://polyn.net.kiae.su/pub/0index.txt

Here the link points to the archive "polyn.net.kiae.su" with the identifier "anonymous" or "ftp" (anonymous access). If you need to specify a user ID and password, place them in front of the machine address:

ftp://nobody:<password>@<host>/users/local/pub

In this case, these parameters are separated from the machine address by the @ symbol, and from each other by a colon.
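The FTP defaults (port 21, user anonymous, an e-mail address as the password) can be applied with a small helper. A Python sketch, in which the function name ftp_login_params, the placeholder e-mail address and the example host and password are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit, unquote


def ftp_login_params(url: str, email: str = "guest@example.com") -> dict:
    """Return host, port, user, password and path for an ftp URL,
    applying the defaults used for anonymous access."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("expected an ftp URL")
    user = unquote(parts.username) if parts.username else "anonymous"
    password: Optional[str]
    if parts.password is not None:
        password = unquote(parts.password)
    elif user == "anonymous":
        password = email          # conventional anonymous password
    else:
        password = None           # a client would prompt for it
    return {"host": parts.hostname, "port": parts.port or 21,
            "user": user, "password": password, "path": unquote(parts.path)}


print(ftp_login_params("ftp://polyn.net.kiae.su/pub/0index.txt"))
# {'host': 'polyn.net.kiae.su', 'port': 21, 'user': 'anonymous', 'password': 'guest@example.com', 'path': '/pub/0index.txt'}
```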

TELNET scheme. This scheme is used to access a resource in remote-terminal mode. Typically, the client launches an external telnet program. With this scheme a user ID must be specified; a password may also be given.

Syntax: telnet://[<user>[:<password>]@]<host>[:<port>]/

telnet - scheme name

user - user name

password - user password

host - host name

port - port number

By default, port = 23.

Example: telnet://name:<password>@<host>

In practice, this scheme is used to access public resources whose identifier and password are widely known; they can be found, for example, in the Hytelnet databases:

telnet://guest:<password>@<host>

The examples above show that the URI specification is general enough to identify almost any Internet resource, and the set of addressable resources can be extended by creating new schemes.

WWW service

The WWW (World Wide Web) service is designed for exchanging hypertext information and is built on the client-server model. A browser (Internet Explorer, Opera, ...) is a multi-protocol client and an HTML interpreter. Like any interpreter, the client performs different functions depending on the commands (tags) it meets. These functions include not only laying out text on the screen but also exchanging information with the server while the received HTML text is parsed; this is most evident when graphic images embedded in the text are displayed.

The HTTP server (Apache, IIS, ...) handles client requests for files. Initially, the WWW service was based on three standards:

· HTML (HyperText Markup Language) - the hypertext markup language for documents;

· URL (Uniform Resource Locator) - a uniform way of addressing resources on the network;

· HTTP (HyperText Transfer Protocol) - a protocol for exchanging hypertext information.

WWW server operation scheme

A WWW server is a node of the Internet or of an intranet that gives network users access to the hypertext documents stored on it. To interact with a WWW server, a user needs specialized software: a browser (viewer).

Let's take a closer look at the WWW-server operation scheme:

1. The network user launches a browser, the functions of which include:

· Establishing a connection with the server;

· Obtaining the required document;

· Displaying the received document;

· Responding to user actions, such as a request for a new document.

After starting, the browser, either at the user's command or automatically, establishes a connection with the specified WWW server and sends it a request for the specified document.

2. The WWW server searches for the requested document and returns the results to the browser.

3. Having received the document, the browser displays it to the user and waits for the user's response. Possible options:

· Entering the address of a new document;

· Printing, searching and other operations on the current document;

· Activating (clicking) special areas of the received document, called links, which are associated with the address of a new document.

In the first and third cases, a new document is requested.
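The cycle above (connect, request a document, receive it, request the next one) can be tried out locally with Python's standard http.server and http.client modules. This is a self-contained sketch, not a production server: it serves a temporary directory on 127.0.0.1 at a port chosen by the operating system, and the file names page.html and missing.html are hypothetical.

```python
import http.client
import os
import tempfile
import threading
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch(host: str, port: int, path: str) -> tuple[int, bytes]:
    """Browser side: connect, request a document, read the response."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "page.html"), "w", encoding="utf-8") as f:
        f.write("<h1>Hello</h1>")

    # Server side: look up the requested file in `root` and send it back.
    handler = partial(QuietHandler, directory=root)
    server = HTTPServer(("127.0.0.1", 0), handler)  # port 0: let the OS pick
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        status, body = fetch(host, port, "/page.html")
        missing_status, _ = fetch(host, port, "/missing.html")
    finally:
        server.shutdown()
        server.server_close()

print(status, body)     # 200 b'<h1>Hello</h1>'
print(missing_status)   # 404
```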

... and the Google Play referrer

The Android platform is extremely fragmented, because Google leaves it to device manufacturers to port the OS, provide backward compatibility and support many different devices. As a result, developers often resort to long if-else chains to make sure the best method is used in each context.

Deep links in Android are no different. Over time, a myriad of technical requirements have emerged that must be met depending on the circumstances and the user's context. Branch's solution brings all of these implementations together: it is a linking framework that handles every edge case. Branch links let you sidestep the complexity and use a standard solution, so you don't have to worry about compatibility. We strongly recommend using our solution rather than trying to recreate similar functionality from scratch, since we provide it for free.

This series of publications describes all of the deep-linking mechanisms we use and explains how they are implemented.


Android URI scheme and intent filter

Android 1.0 introduced a deep-linking mechanism based on URI schemes. With it, a developer can register an app with a URI scheme in the operating system of a device once the app is installed. Any text string without special characters can serve as a scheme, for example http, pinterest, fb or myapp. After registration, tapping a link made of the scheme followed by "://" (for example, pinterest://) opens the Pinterest app. If the Pinterest app is not installed, a "Page not found" error appears.

Requirements for using URI schemes in Android

  • Register an activity that responds to the URI with an intent filter in the manifest.
  • The app must be installed; if it is not, an error message appears.

Setting up a URI scheme in Android

Configuring an app for a URI scheme is easy. First, choose the activity in your app that should open when the URI scheme is triggered, and register an intent filter for it by adding the filter to the <activity> tag in the manifest that corresponds to that activity.

In the intent filter, replace your_uri_scheme with the URI scheme you want. Ideally the scheme should be unique. If it matches another app's URI scheme, the user will see the Android chooser dialog when tapping the link. You often see this dialog if several web browsers are installed on the device, since they are all registered for HTTP URIs.

Handling deep links in an Android app

You then need to parse the incoming URI string to read the values appended after the scheme.
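In an Android app this parsing is done with the platform's own URI classes; as a language-neutral illustration, here is the same idea in Python. It assumes a hypothetical scheme myapp and a hypothetical link layout myapp://<screen>/<id>?<params>: the host part names the screen to open, the path carries identifiers and the query carries extra values.

```python
from urllib.parse import urlsplit, parse_qs


def parse_deep_link(link: str, scheme: str = "myapp") -> dict:
    """Extract the values appended to a custom-scheme link (hypothetical layout)."""
    parts = urlsplit(link)
    if parts.scheme != scheme:
        raise ValueError(f"not a {scheme}:// link")
    return {
        "screen": parts.netloc,                                  # e.g. 'product'
        "segments": [s for s in parts.path.split("/") if s],     # e.g. ['42']
        "params": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


print(parse_deep_link("myapp://product/42?ref=email"))
# {'screen': 'product', 'segments': ['42'], 'params': {'ref': 'email'}}
```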

Using URI Schemes in Android in Practice

URI schemes have significant limitations as a deep-linking mechanism. We do not recommend using them on their own, because if the app is not installed on the device, the user simply sees an error message. To use URI schemes effectively, you need additional tooling to handle edge cases such as the app not being installed.

Therefore, to give users a reasonable experience when the app is not installed, you need to wrap the URI scheme in client-side JavaScript that runs in the browser. This JavaScript is hosted on your server, and you send users a link to it.

The script tries to open the app by setting the URI scheme as the source of an iframe, and falls back to the Google Play store if the app does not open.

Conclusion

Follow our further publications on deep links in Android.

Deep links in Android are very complex, and edge cases come up at every step. You may think everything works perfectly until a user suddenly complains that links from Facebook do not open on Android 4.4.4. That is why it is worth using tools like Branch: you can forget all these difficulties like a bad dream and get used to links that simply always work.


To access any network resource, you need to know where it is located and how to access it. The World Wide Web uses a standardized addressing and identification scheme, the URL (Uniform Resource Locator), which builds on experience with addressing and identifying e-mail, Gopher, WAIS, telnet, FTP and other resources.

URI (Uniform Resource Identifier, RFC 2396, August 1998) is a compact character string used to identify an abstract or physical resource. A resource is any object that belongs to a certain space. The URI encompasses and updates the previously defined URLs (RFC 1738, RFC 1808) and URNs (RFC 2141, RFC 2611).

The URI is designed to uniquely identify any resource.

Some subsets of URIs:

URN (Uniform Resource Name) - a particular kind of URI, with the "urn:" scheme and a namespace, that must remain unique and persistent even when the resource no longer exists or is unavailable.

It is assumed that, for example, the browser knows where to look for this resource.

Syntax:

urn:namespace:data1.data2,more-data, where namespace defines how the data after the second ":" is interpreted.

URN example:

urn:ISBN:0-395-36341-6

ISBN - the namespace (International Standard Book Number)

0-395-36341-6 - the number identifying a specific publication



On receiving the URN, the client program consults the ISBN directory on the Internet and obtains what the number "0-395-36341-6" stands for (for example, "quantum chemistry").

URNs are widely used in P2P networks such as eDonkey.

An example URN pointing to an Adobe Photoshop v8.0 disk image on the eDonkey network:

urn:ed2k://|file|AdobePhotoshopv8.0.iso|940769280||/

ed2k - identifies the network

AdobePhotoshopv8.0.iso - file name

940769280 - size in bytes

<hash> - file identifier (calculated using a hash function)
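The fields of such a link are separated by "|", so splitting them is straightforward. A minimal Python sketch; parse_ed2k is a hypothetical helper that accepts the optional "urn:" prefix used in the example and returns None for the hash when that field is empty, as it is in the link above. The hash "ABC" in the test is made up.

```python
def parse_ed2k(link: str) -> dict:
    """Split an ed2k file link of the form ed2k://|file|<name>|<size>|<hash>|/."""
    if link.startswith("urn:"):          # tolerate the 'urn:' prefix
        link = link[len("urn:"):]
    head, tail = "ed2k://|", "|/"
    if not (link.startswith(head) and link.endswith(tail)):
        raise ValueError("not an ed2k link")
    fields = link[len(head):-len(tail)].split("|")
    if len(fields) != 4 or fields[0] != "file":
        raise ValueError("expected file|name|size|hash")
    _, name, size, file_hash = fields
    return {"name": name, "size": int(size), "hash": file_hash or None}


info = parse_ed2k("urn:ed2k://|file|AdobePhotoshopv8.0.iso|940769280||/")
print(info)  # {'name': 'AdobePhotoshopv8.0.iso', 'size': 940769280, 'hash': None}
```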

URL (Uniform Resource Locator):

A URL (Uniform Resource Locator, RFC 1738) is a standardized way of writing the address of a resource on the WWW and the Internet. The URL has a flexible, extensible structure designed to indicate the location of network resources as naturally as possible: it identifies a resource by how it is accessed (for example, its "network location") rather than by its name or other attributes.

Examples of URLs:

http://www.ipm.kstu.ru/index.php

ftp://www.ipm.kstu.ru/

A limited set of ASCII characters is used to represent the address.

The general form of an address is:

<scheme>://<login>:<password>@<host>:<port>/<full-path-to-resource>

scheme - resource access scheme: http, ftp, gopher, mailto, news, telnet, file, man, info, whatis, ldap, wais, etc.

login:password - user name and password used to access the resource

host - domain name of the host or its IP address

port - host port to connect to

full-path-to-resource - information specifying the location of the resource (depends on the protocol)

Examples of URLs:

http://example.com # request for the default start page

http://www.example.com/site/map.html # request for a given page in a specified directory

http://example.com:81/script.php # connect to a non-standard port

http://example.org/script.php?key=value # request passing parameters to a script

ftp://user:<password>@<host> # connect to an FTP server with authorization

http://192.168.0.1/example/www # connect by network address

file:///srv/www/htdocs/index.html # open a local file

gopher://example.com/1 # connect to a gopher server
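Going the other way, a URL of the general form <scheme>://<login>:<password>@<host>:<port>/<full-path-to-resource> can be assembled from its parts. A minimal Python sketch using urllib.parse.urlunsplit; build_url is a hypothetical helper, and the login and password are percent-encoded so that characters such as "@" or ":" cannot break the address. The password "p@ss" is made up.

```python
from typing import Optional
from urllib.parse import quote, urlunsplit


def build_url(scheme: str, host: str, path: str = "/", port: Optional[int] = None,
              login: Optional[str] = None, password: Optional[str] = None,
              query: str = "") -> str:
    """Assemble <scheme>://<login>:<password>@<host>:<port>/<path>?<query>."""
    netloc = host
    if port is not None:
        netloc = f"{netloc}:{port}"
    if login is not None:
        userinfo = quote(login, safe="")
        if password is not None:
            userinfo += ":" + quote(password, safe="")
        netloc = f"{userinfo}@{netloc}"
    return urlunsplit((scheme, netloc, path, query, ""))


print(build_url("http", "example.com", "/script.php", port=81))
# http://example.com:81/script.php
print(build_url("http", "example.org", "/script.php", query="key=value"))
# http://example.org/script.php?key=value
print(build_url("ftp", "example.com", login="user", password="p@ss"))
# ftp://user:p%40ss@example.com/
```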

URL - Uniform Resource Locators explicitly describe how to get to an object.

The URL was a major innovation for the Internet. However, from its invention to the present day the URL standard has had a serious drawback: it can use only a limited set of characters, smaller even than ASCII, namely Latin letters, digits and a few punctuation marks.

If we want to use Cyrillic characters, Chinese or Japanese characters or, say, French accented letters in a URL, those characters must be re-encoded in a special way.

On the Russian-language Wikipedia you see examples of URL encoding every day, because Russian is written in Cyrillic. For example, a link like this:

http://ru.wikipedia.org/wiki/Микрокредит

is URL-encoded as:

http://ru.wikipedia.org/wiki/%D0%9C%D0%B8%D0%BA%D1%80%D0%BE%D0%BA%D1%80%D0%B5%D0%B4%D0%B8%D1%82

This conversion takes place in two stages: first, each Cyrillic character is encoded in Unicode (UTF-8) into a sequence of two bytes, and then each byte of this sequence is written in hexadecimal notation:

М → D0 and 9C → %D0%9C

и → D0 and B8 → %D0%B8

к → D0 and BA → %D0%BA

р → D1 and 80 → %D1%80, etc.

Each such hexadecimal byte code is preceded by a percent sign (%) according to the URL specification - hence the English term "percent-encoding", which denotes how characters are encoded in URLs and URIs.
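Python's standard urllib.parse.quote performs exactly this two-stage conversion (UTF-8 bytes by default, then %XX for each byte), so it can be used to check the example with the Cyrillic word Микрокредит; unquote reverses the conversion.

```python
from urllib.parse import quote, unquote

word = "Микрокредит"
encoded = quote(word)          # UTF-8 bytes, each written as %XX
print(encoded)
# %D0%9C%D0%B8%D0%BA%D1%80%D0%BE%D0%BA%D1%80%D0%B5%D0%B4%D0%B8%D1%82

# Each Cyrillic letter becomes two bytes, i.e. two %XX triplets.
print(len(word), len(word.encode("utf-8")))   # 11 22
print(unquote(encoded) == word)               # True
```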

Since the letters of every alphabet except basic Latin undergo this transformation, URLs containing words in the vast majority of languages (other than English, Italian or Latin) can become unreadable to people.

This conflicts with the principle of internationalism proclaimed by all the leading Internet organizations, including the W3C and ISOC. The IRI (Internationalized Resource Identifier) standard is intended to solve this problem: in these international resource identifiers Unicode characters can be used without difficulty, so other languages are not put at a disadvantage.

Other URL schemes

HTTP scheme.

The scheme specifies its identifier, machine address, TCP port, path in the server directory, variables and their values, label.

Syntax:

http://[<user>[:<password>]@]<host>[:<port>][/<url-path>[?<query>]]

http - scheme name

user - user name

host - host name

port - port number

query (<field-name>=<value>{&<field-name>=<value>}) - query string

Defined in RFC 2068. By default, port = 80.

Examples:
http://ipm.kstu.ru/internet/index.php

This is the most common type of URI used in WWW documents. The scheme name (http) is followed by a path consisting of the domain address of the machine and the full address of the HTML document in the HTTP server tree.

The IP address can also be used as the machine address:

http://195.208.44.20/internet/index.php

If the HTTP server is running on a TCP port other than 80, this is reflected in the address:

http://195.208.44.20:8080/internet/index.php

http://195.208.44.20/internet/index.php#metka1
The "#" character separates the document name from the label name.

Variables and their values are passed as follows:
http://ipm.kstu.ru/internet/index.php?var1=value1&var2=value2

Here "var1" and "var2" are variable names, and "value1" and "value2" are their values.
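On the server side, a script has to turn the query string back into variables, and a page generating links has to build such strings. A short Python sketch with the standard urllib.parse functions; the values "value 1" and "a&b" are made up to show escaping (urlencode writes a space as "+" and "&" as %26).

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = "http://ipm.kstu.ru/internet/index.php?var1=value1&var2=value2"
query = urlsplit(url).query           # 'var1=value1&var2=value2'
print(parse_qs(query))                # {'var1': ['value1'], 'var2': ['value2']}

# Building a query string; values with spaces or '&' are escaped.
print(urlencode({"var1": "value 1", "var2": "a&b"}))   # var1=value+1&var2=a%26b
```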

FTP scheme

This scheme allows you to address FTP file archives.

Syntax:

ftp://[<user>[:<password>]@]<host>[:<port>][/<url-path>]

ftp - scheme name

user - username

password - user password

host - hostname

port - port number

url-path - the path to the file and the file itself

Defined in RFC 1738. By default, port = 21, user = anonymous, password = e-mail address; if a user name is given but no password, the password is requested interactively.

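The url-path looks like:

<cwd1>/<cwd2>/.../<cwdN>/<name>[;type=<typecode>], where <typecode> is "a" (ASCII text), "i" (image, i.e. binary) or "d" (directory listing).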
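A sketch of splitting the optional ;type=<typecode> suffix off an FTP url-path and unescaping the individual path segments, following the form given in RFC 1738. The helper name split_ftp_path and the example paths are hypothetical.

```python
from typing import Optional
from urllib.parse import unquote

TYPECODES = {"a": "ASCII text", "i": "image (binary)", "d": "directory listing"}


def split_ftp_path(url_path: str) -> tuple[list[str], Optional[str]]:
    """Return the unescaped path segments and the typecode, if any."""
    typecode = None
    path, sep, param = url_path.rpartition(";type=")
    if sep:
        if param not in TYPECODES:
            raise ValueError(f"unknown typecode {param!r}")
        typecode = param
    else:
        path = url_path               # no ';type=' suffix present
    stripped = path.strip("/")
    segments = [unquote(s) for s in stripped.split("/")] if stripped else []
    return segments, typecode


print(split_ftp_path("students/name/file.txt;type=i"))
# (['students', 'name', 'file.txt'], 'i')
```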

Examples: ftp://ipm.kstu.ru/students/name/

To specify a user name and password, write:
ftp://name:<password>@ipm.kstu.ru/students/name/

In this case, these parameters are separated from the machine address by the "@" symbol, and from each other by a colon.

MAILTO scheme

This scheme is intended for sending mail.

Syntax:

mailto:[<e-mail-1>{,<e-mail-2>,...}][?<query>]

mailto - scheme name

e-mail-1 (<user>@<host>) - the first e-mail address

user - username

host - hostname

e-mail-2 - second email address

query (<header-field-name>=<value>{&<header-field-name>=<value>}) - query string

mailto:<user>@<host>

Header fields and their values can also be passed:

mailto:<user>@<host>?subject=Subject_Email&body=Text_which_will_be_inserted_in_the_mail

The recipient's address can also be written as the value of the to field:

mailto:?to=<user>@<host>&subject=Subject_Email&body=Text_which_will_be_inserted_in_the_mail
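A minimal sketch of assembling such a link in Python; the mailto helper, the address and the texts are hypothetical. Header values are percent-encoded with urllib.parse.quote, so a space becomes %20 rather than the "+" that form encoding would produce, which is the form mail clients expect in a mailto link.

```python
from urllib.parse import quote


def mailto(to: list[str], **fields: str) -> str:
    """Build mailto:<addr1>,<addr2>?<field>=<value>&... with escaped values."""
    addresses = ",".join(quote(a, safe="@") for a in to)
    query = "&".join(f"{name}={quote(value, safe='')}" for name, value in fields.items())
    return f"mailto:{addresses}" + (f"?{query}" if query else "")


print(mailto(["user@example.com"], subject="Hello there", body="Line 1"))
# mailto:user@example.com?subject=Hello%20there&body=Line%201
```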

What is HTTP?

The first document describing HTTP (informational, not a standard) was RFC 1945 (Hypertext Transfer Protocol - HTTP/1.0, T. Berners-Lee, R. Fielding, H. Frystyk, May 1996).

The latest version is RFC 2616 (Hypertext Transfer Protocol - HTTP/1.1, R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, June 1999).

HTTP (Hypertext Transfer Protocol) is a high-level protocol, specifically an application-layer protocol, used by the WWW service to transfer web pages.

HTTP (HyperText Transfer Protocol, RFC 2616, current version HTTP/1.1) is a hypertext transfer protocol. It was originally intended for exchanging hypertext documents; its capabilities have since been significantly extended (in particular, streaming support has been added).

HTTP is a typical client-server protocol: messages are exchanged according to the request-response scheme in the form of ASCII commands. A distinctive feature of HTTP is that a request and a response can specify how the same resource is represented, using parameters such as format, encoding and language. Because the encoding of a message can be specified, the client and server can exchange binary data even though the protocol itself is text-based.

HTTP is an application layer protocol, but it is also used as a "transport" for other application protocols such as SOAP, XML-RPC, WebDAV.

The HTTP protocol defines a request-response way of interaction between a client program and a server program within the World Wide Web.

To load a web page, the client browser sends a request to a special program installed on the server computer, called an HTTP server, and processes the data it receives. The browser's job is to request a specific page from the server, receive it and display it on the user's screen. The server, in turn, accepts the request, looks for the requested document, and returns to the client either the contents of the file it found or an error message if the file was not found or access to it was denied. An important point for understanding this process is that the HTTP server does not parse the content of the document it transmits. Roughly speaking, the HTTP server does not care what is inside the requested file; it only passes it to the browser, and all the work of structuring and displaying the received information is done by the browser.

The requested page is looked up in a specific directory allocated on the server computer for the site; a reference to this directory is contained in the address the user enters. When the request is made not for a specific document but for the site as a whole, the HTTP server automatically substitutes the so-called start page for the name of the file to be transferred. This page is named index.htm or index.html (in some cases default.htm or default.html). It must be located in the root directory designated for hosting your site or, if configured otherwise, in a directory called WWW. All other files can be placed either in the same directory or in subdirectories, which is sometimes convenient, especially when the site contains several thematic sections.
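The lookup just described (resolve the path inside the site's directory, and substitute a start page when only a directory is requested) can be sketched as follows. This is a simplified model, not how Apache or IIS actually implement it; the function name resolve_request and the search order of the start-page names are assumptions based on the names listed above.

```python
import os
import tempfile
from typing import Optional
from urllib.parse import unquote

# Assumed search order for the start page; real servers make this configurable.
INDEX_NAMES = ["index.html", "index.htm", "default.html", "default.htm"]


def resolve_request(site_root: str, url_path: str) -> Optional[str]:
    """Map a URL path to a file inside site_root; None means 'not found'."""
    root = os.path.realpath(site_root)
    target = os.path.realpath(os.path.join(root, unquote(url_path).lstrip("/")))
    if os.path.commonpath([root, target]) != root:
        return None                      # refuse paths that escape the site root
    if os.path.isdir(target):
        for name in INDEX_NAMES:         # directory requested: use its start page
            candidate = os.path.join(target, name)
            if os.path.isfile(candidate):
                return candidate
        return None
    return target if os.path.isfile(target) else None


with tempfile.TemporaryDirectory() as site:
    open(os.path.join(site, "index.html"), "w").close()
    start_name = os.path.basename(resolve_request(site, "/"))
    missing = resolve_request(site, "/nope.html")
    escape = resolve_request(site, "/../etc/passwd")

print(start_name, missing, escape)   # index.html None None
```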

Besides the subfolders you create, in which you can place almost any content you need, the server directory usually contains several other directories that deserve separate mention. First, there is the CGI-BIN folder, which holds CGI scripts and other interactive applications run from your site; there are also several service directories the server needs to work normally. At the initial stage you can simply ignore them. Sometimes the directory that holds index.html also contains several additional files: not_found.html, the document displayed if the HTTP server cannot find the file the user requested; forbidden.html, displayed as an error message if access to the requested document is denied; and, finally, robots.txt, a file that describes the rules search engines should follow when indexing your site.

In most cases, especially when a home page is published on a free hosting service, users are denied access to the service directories and the CGI-BIN folder, and the contents of not_found.html and forbidden.html cannot be changed. Keep this in mind if you plan to include interactive content that requires at least the ability to place files in one of the service folders. In some cases you may not be allowed to create nested directories on the server at all, in which case you will have to make do with the single directory set aside for your site.

From all this it follows that the client's browser can only receive and process information from the server; it can place and change information on the server only if file uploads over HTTP are implemented with special CGI scripts that are part of the server's web interface. In all other cases you have to use an FTP server, to which you transfer the necessary files with special software that uploads them into the directory designated for your site. In both cases you need your login name and password to access the system. Also remember that most server programs (in particular, Apache on UNIX-compatible platforms) distinguish between uppercase and lowercase letters, so to avoid errors all file names and extensions should be written in lowercase and always in Latin letters. The latter is because some servers handle Russian-language encodings differently.

HTTP works as follows: the client program establishes a TCP connection with the server (the standard port number is 80) and sends it an HTTP request. The server processes the request and sends an HTTP response back to the client.

Communication between the client and the Web server is done through the exchange of messages. HTTP messages are divided into client-to-server requests and server-to-client responses.

Request and response messages have a common format. Both types of messages look like this: first there is an initial line (start-line), then, possibly, one or more header fields, also called just headers, then an empty line (that is, a line consisting of the characters CR and LF), indicating the end of the header fields, and then possibly the body of the message:

start line
header field 1
header field 2
...
header field N
(empty line: CR LF)
message body
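This layout can be produced and read back with a few lines of Python. A minimal sketch, assuming a small body and no chunked transfer; the helper names build_message and parse_message are hypothetical. Lines end in CR LF, and the empty line (a bare CR LF) marks the end of the header fields. The start line of a request is then split into method, resource identifier and HTTP version.

```python
CRLF = "\r\n"


def build_message(start_line: str, headers: dict, body: bytes = b"") -> bytes:
    """Start line, header fields, empty line, then the optional body."""
    head = [start_line] + [f"{name}: {value}" for name, value in headers.items()]
    return (CRLF.join(head) + CRLF + CRLF).encode("ascii") + body


def parse_message(raw: bytes) -> tuple[str, dict, bytes]:
    head, _, body = raw.partition(b"\r\n\r\n")     # the empty line ends the headers
    start_line, *fields = head.decode("ascii").split(CRLF)
    headers = {}
    for field in fields:
        name, _, value = field.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return start_line, headers, body


request = build_message("GET /internet/index.php HTTP/1.1",
                        {"Host": "ipm.kstu.ru", "Connection": "close"})
start, headers, body = parse_message(request)
method, target, version = start.split(" ")
print(method, target, version)   # GET /internet/index.php HTTP/1.1
print(headers["host"])           # ipm.kstu.ru
```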

HTTP protocol headers

The start-line formats of the client and the server differ and are discussed below. There are four kinds of headers:

General headers (general-headers), which can be present both in the request and in the response;

Request-headers, which can only be present in a request;

Response headers, which can only be present in a response;

Entity-headers that refer to the body of a message and describe its content.

Each header consists of a name, a colon ":" and a value. The most important headers are shown in Table 1.

Table 1

HTTP protocol headers

Header - Purpose
Entity headers
Allow - lists the methods supported by the server
Content-Encoding - how the message body is encoded, for example to reduce its size
Content-Length - length of the message body in bytes
Content-Type - the MIME type of the content. Depending on Content-Type, the browser interprets the response as an HTML page, a GIF or JPEG image, a file to be saved to disk, or something else, and acts accordingly. Some content types: text/html - HTML text (web page); text/plain - plain text (as in Notepad); image/jpeg - JPEG image; image/gif - GIF image. For text data the character encoding can also be given, for example charset=windows-1251 or charset=koi8-r
ETag - a unique tag of the resource on the server that allows versions of the resource to be compared
Expires - date and time after which the content is considered outdated and must be retrieved again
Last-Modified - date and time the content was last modified
Response headers
Age - estimated time, in seconds, since the response was generated by the origin server (set by caches)
Location - the URI of the resource to be requested to get the content
Retry-After - date and time, or number of seconds, after which the request should be repeated to get a successful response
Server - name of the server software that responded
Request headers
Accept - list of content types supported by the browser, in order of preference, for example: Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*. This is needed when the server can serve the same document in different formats; the value is used mainly by CGI scripts to generate a response adapted to the browser
Accept-Charset - character encodings in which the client can accept text content
Accept-Encoding - content encodings (for example, compression) that the client can accept
Host - host name and port number of the server from which the document is requested
If-Modified-Since, If-Match, If-None-Match, If-Range, If-Unmodified-Since - request headers for conditional access to a resource
Range - requests part of a document
User-Agent - name of the client software, the "code name" of the browser, for example: Mozilla/4.0 (compatible; MSIE 5.0; Windows 95; DigExt)
General headers
Connection - Keep-Alive or close. Keep-Alive means the connection to the server is not closed after this document is delivered, so further requests can be sent over it. Most browsers use Keep-Alive, since it allows an HTML page and its images to be downloaded over a single connection. Once set, Keep-Alive is maintained until the first error or until a later request explicitly says Connection: close. close means the connection is closed after the response to this request
Date - date and time the message was created
Pragma - implementation-specific directives for the transferred content
Transfer-Encoding - how the message is encoded for transmission

In some headers the value is a date and time. Such values must be in the format described in RFC 1123, for example: Sun, 06 Nov 1994 08:49:37 GMT
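Python's standard library can produce and parse dates in this format. The following is a minimal sketch, assuming Python 3.9+; the helper names http_date and parse_http_date are illustrative and not part of any library:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def http_date(dt: datetime) -> str:
    """Format an aware datetime as an RFC 1123 date for HTTP headers."""
    if dt.tzinfo is None:
        raise ValueError("use an aware datetime so the UTC conversion is unambiguous")
    # usegmt=True writes the literal zone "GMT", which HTTP requires;
    # it only accepts datetimes whose tzinfo is timezone.utc.
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def parse_http_date(value: str) -> datetime:
    """Parse an HTTP date header value back into a datetime."""
    return parsedate_to_datetime(value)


stamp = http_date(datetime(1994, 11, 6, 8, 49, 37, tzinfo=timezone.utc))
print(stamp)  # Sun, 06 Nov 1994 08:49:37 GMT
```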

The message body contains the information actually being transmitted, that is, the payload of the message. The body is a sequence of octets (bytes). It can be encoded; the encoding is specified in the Content-Encoding object header.
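As an illustration of a body encoding, the sketch below compresses a body with gzip and sets the matching headers. It is a minimal example using Python's standard gzip module; the HTML body is made up:

```python
import gzip

# A hypothetical, highly repetitive HTML body (compresses well)
body = ("<html><body>" + "Hello, HTTP! " * 50 + "</body></html>").encode("utf-8")

encoded = gzip.compress(body)
headers = {
    "Content-Type": "text/html; charset=utf-8",
    "Content-Encoding": "gzip",
    # Content-Length counts the bytes actually sent, i.e. the encoded body
    "Content-Length": str(len(encoded)),
}

# The recipient undoes the encoding named in Content-Encoding
decoded = gzip.decompress(encoded)
print(len(body), "->", len(encoded), "bytes")
```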

A request message from client to server consists of a request-line, headers (general, request, object), and possibly a message body.

The request line begins with a method, followed by the requested resource identifier, protocol version, and trailing end-of-line characters:

<Method> <Identifier> <HTTP version>

Method specifies the method to apply to the requested resource. For example, the GET method says that the client wants to get the content of the resource. The identifier identifies the requested resource. The HTTP version is indicated by a line like this:

HTTP/<major version>.<minor version>
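Splitting a request line into these three parts is mechanical. A minimal sketch, assuming Python 3.8+ and single spaces between the fields; the names RequestLine and parse_request_line are hypothetical:

```python
from typing import NamedTuple, Tuple


class RequestLine(NamedTuple):
    method: str
    target: str          # the identifier of the requested resource
    version: Tuple[int, int]


def parse_request_line(line: str) -> RequestLine:
    """Split '<Method> <Identifier> HTTP/<major>.<minor>' into its parts."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected three space-separated fields: {line!r}")
    method, target, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError(f"not an HTTP version: {version!r}")
    major, _, minor = version[len("HTTP/"):].partition(".")
    return RequestLine(method, target, (int(major), int(minor)))


print(parse_request_line("GET /index.html HTTP/1.1\r\n"))
```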

HTTP protocol methods

Let's look at the main methods of the HTTP protocol.

The OPTIONS method queries for information about connection options (for example, methods, document types, encodings) that the server supports for the requested resource. This method allows the client to define options and / or requirements associated with the resource, or the capabilities of the server, without taking any action on the resource or initiating a download.

If the server's response is not an error message, then the object headers contain information that can be thought of as connection options. For example, the Allow header lists all the methods supported by the server for a given resource.

If the identifier of the requested resource is an asterisk ("*"), then the OPTIONS request is intended to address the server as a whole.

If the identifier of the requested resource is not an asterisk, then the OPTIONS request applies to the options available when connecting to the specified resource.

The GET method retrieves whatever information is associated with the requested resource. In most cases, if the identifier of the requested resource points to a document (for example, a text document, an image, or a video), the server returns the content of that document (the file content). If the requested resource is an application (program) that generates data, the response body contains the generated data rather than the binary image of the executable file; this is how CGI applications work, for example. If the identifier points to a directory (folder), then, depending on the server settings, the server returns either a listing of the directory (the list of files) or the content of one of its files (usually index.html or Default.htm). In the latter case the folder name can be given with or without a trailing "/". If the trailing "/" is missing, the server responds with a redirect (status code 301 or 302).

A "conditional GET" is a GET request whose message includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range request header. A conditional GET asks for the object to be transferred only if it satisfies the conditions given in those headers. It is designed to reduce unnecessary network load, since the client does not have to download data it has already saved a second time.
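On the server side, the If-Modified-Since check reduces to comparing two dates. The following is a minimal sketch, assuming Python 3.8+ and a resource whose last-modification time is already known; conditional_get_status is a hypothetical name, and a real server would also evaluate the other conditional headers:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 (Not Modified) if the client's copy is current, else 200 (OK)."""
    if if_modified_since is None:
        return 200  # unconditional GET
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparseable date makes the condition void
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # "-0000" zone parses as naive
    # HTTP dates have one-second resolution, so drop microseconds before comparing
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```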

There is also a "partial GET", in which the request message includes a Range request header. A partial GET requests the transfer of only part of the object. It is designed to reduce unnecessary network load by requesting only the part of the object that the client has not yet downloaded. The value of the Range header is the range of bytes to receive. Bytes are numbered starting from 0; the first and last bytes of a range are separated by a "-" character, and several ranges are listed separated by commas, for example: Range: bytes=0-499,1000-1499.
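A server has to turn the Range value into concrete byte offsets. The sketch below is a minimal illustration, assuming Python 3.8+; parse_byte_ranges is a hypothetical helper that handles the "A-B", open-ended "A-" and suffix "-N" forms with inclusive ends:

```python
from typing import List, Tuple


def parse_byte_ranges(header: str, size: int) -> List[Tuple[int, int]]:
    """Parse a Range value such as 'bytes=0-499,1000-' into inclusive (first, last) pairs."""
    unit, _, spec = header.partition("=")
    if unit.strip().lower() != "bytes":
        raise ValueError("only byte ranges are supported")
    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        if first == "":
            # suffix form '-N': the last N bytes of the object
            start, end = max(size - int(last), 0), size - 1
        else:
            # 'A-B' or open-ended 'A-'; clamp B to the object size
            start = int(first)
            end = min(int(last), size - 1) if last else size - 1
        if start >= size or start > end:
            raise ValueError(f"unsatisfiable range: {part.strip()!r}")
        ranges.append((start, end))
    return ranges


data = bytes(range(256)) * 4  # a hypothetical 1024-byte object
for first, last in parse_byte_ranges("bytes=0-99,1000-", len(data)):
    chunk = data[first:last + 1]  # both ends are inclusive
    print(first, last, len(chunk))
```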

The HEAD method is identical to GET, except that the server does not return the message body in the response. The meta information contained in the HTTP headers of the response to a HEAD request is identical to the information provided in the response to a GET request. This method can be used to get information about the request object without directly forwarding the object body. The HEAD method is often used to test hypertext links.

The POST method is used for requests in which the addressed server accepts the data included in the message body (object) of the request and passes it for processing to the application identified as the requested resource. POST is designed to provide a uniform way to implement the following functions:

Annotation of existing resources;

Posting a message to an electronic bulletin board (BBS), newsgroups, mailing lists, or similar group of articles;

Passing a block of data, such as the result of an input in a form, to a processing process;

Execution of queries to databases (DB);

In fact, the function performed by the POST method is determined by the application that the identifier of the requested resource points to. Along with GET, POST is used when building CGI applications. The browser can send POST requests when submitting forms; for this, the FORM element of the HTML document containing the form must have a METHOD attribute with the value POST.

A POST request can perform an action on the server without returning any content. In this case, depending on whether the response includes a message body describing the result, the status code of the response is either 200 (OK) or 204 (No Content).

If the resource was created on the server, the response contains a 201 (Created) status code and includes the Location response header.

The message body sent in a request with the PUT method is stored on the server, and the identifier of the requested resource becomes the identifier of the stored document. If the identifier points to an existing resource, the object in the message body is treated as a modified version of the resource on the server. If a new resource is created, the server informs the user agent with a response carrying the status code 201 (Created).

The fundamental difference between the POST and PUT methods is the different meaning of the ID of the requested resource. The URI in the POST request identifies the resource that is handling the object included in the body of the message. This resource can be an application receiving data. In contrast, the URI in a PUT request identifies the object included in the request as a message body, that is, the user agent assigns the given URI to the included resource.
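The difference in how the two methods use the URI can be shown with a toy in-memory server. This is a minimal sketch, assuming Python 3.8+; the class ResourceStore and the /notes paths are hypothetical, and answering an update with 204 is one of the allowed choices (200 with a body is also possible):

```python
import itertools
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str]]  # (status code, response headers)


class ResourceStore:
    """Hypothetical in-memory server illustrating POST vs PUT semantics."""

    def __init__(self) -> None:
        self.resources: Dict[str, bytes] = {}
        self._next_id = itertools.count(1)

    def post(self, handler_uri: str, body: bytes) -> Response:
        # POST: the URI names the resource that *processes* the body.
        # Here the handler is a collection that stores the body under a
        # server-chosen URI and reports it in the Location header.
        new_uri = f"{handler_uri.rstrip('/')}/{next(self._next_id)}"
        self.resources[new_uri] = body
        return 201, {"Location": new_uri}

    def put(self, uri: str, body: bytes) -> Response:
        # PUT: the URI names the enclosed object itself.
        created = uri not in self.resources
        self.resources[uri] = body
        return (201, {}) if created else (204, {})

    def delete(self, uri: str) -> Response:
        if uri not in self.resources:
            return 404, {}
        del self.resources[uri]
        return 204, {}
```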

The DELETE method asks the server to delete a resource that has the requested identifier. A request with this method can be rejected by the server if the user does not have permission to delete the requested resource.

The TRACE method is used to echo the submitted request back at the HTTP protocol level. The recipient of the request (the web server) sends the received message back to the client as the body of a response with status code 200 (OK). A TRACE request must not contain a message body.

TRACE allows the client to see what the server is receiving at the other end and use that data for testing or diagnostics.

If the request succeeds, the response body contains the entire request message, and the Content-Type object header is "message/http".

Response codes

After receiving and interpreting the request message, the server responds with an HTTP response message.

The first line of the response is the Status-Line. It consists of a version of the protocol, a numeric status code, an explanatory phrase, separated by spaces, and trailing end-of-line characters:

<HTTP version> <Status code> <Reason phrase>

The protocol version has the same meaning as in the request.

The Status-Code element is a three-digit integer code giving the result of the attempt to understand and satisfy the request. The Reason-Phrase is a short textual description of the status code. The status code is intended for software processing, and the reason phrase for users.

The first digit of the status code defines the class of the response; the last two digits play no role in the classification. The first digit can take 5 values:

1xx: Information codes - request received, processing continues.

2xx: Success Codes - The action was successfully received, understood, and processed.

3xx: Redirect Codes - Further action must be taken to complete the request.

4xx: Client Error Codes - The request has a syntax error or could not be completed.

5xx: Server Error Codes - The server is unable to fulfill a valid request.
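Because only the first digit determines the class, a client can classify any status line with integer division. A minimal sketch, assuming Python 3.8+; parse_status_line, status_class, and the class labels are illustrative names:

```python
from typing import Tuple

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split '<HTTP version> <Status code> <Reason phrase>'."""
    # maxsplit=2 keeps spaces inside the reason phrase ("Not Found")
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2 or len(parts[1]) != 3 or not parts[1].isdigit():
        raise ValueError(f"malformed status line: {line!r}")
    reason = parts[2] if len(parts) == 3 else ""
    return parts[0], int(parts[1]), reason


def status_class(code: int) -> str:
    # Only the first digit determines the class
    return STATUS_CLASSES.get(code // 100, "Unknown")


version, code, reason = parse_status_line("HTTP/1.1 404 Not Found\r\n")
print(code, reason, "->", status_class(code))  # 404 Not Found -> Client error
```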

The Reason-Phrases for each status code are listed in RFC 2068; they are recommended, but may be replaced with equivalents without affecting the protocol. For example, localized Russian-language versions of HTTP servers replace them with Russian phrases. Table 2 lists the HTTP server response codes.

Table 2

HTTP server response codes

Code - Reason-Phrase per RFC 2068 - Russian-language equivalent (in translation)
1xx: Information codes
100 - Continue - Continue
2xx: Success codes
200 - OK - OK
201 - Created - Created
204 - No Content - No content
205 - Reset Content - Reset content
206 - Partial Content - Partial content
3xx: Redirect codes
302 - Moved Temporarily - Temporarily moved
304 - Not Modified - Not modified
4xx: Client error codes
400 - Bad Request - Malformed request
401 - Unauthorized - Unauthorized
404 - Not Found - Not found
405 - Method Not Allowed - Method not allowed
408 - Request Timeout - Request timed out
409 - Conflict - Conflict
411 - Length Required - Length required
413 - Request Entity Too Large - Request object too large
5xx: Server error codes
500 - Internal Server Error - Internal server error
501 - Not Implemented - Not implemented
503 - Service Unavailable - Service unavailable
505 - HTTP Version Not Supported - Unsupported HTTP version

The status line is followed by the headers (general, response, and object) and possibly a message body.

One of the essential functions of a web server is to provide access to part of the local file system. To do this, the server settings specify a directory that serves as the root for the server. To publish a document, that is, to make it available to users who have "visited" the server (connected to it over HTTP), you copy the document into the web server's root directory or one of its subdirectories. Requests arriving over HTTP are handled by a server process running with the rights of a special user, which usually does not correspond to a real person and exists only for viewing the server's resources. By configuring this user's rights and permissions, you can control access to web resources.
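The mapping from a request identifier to a file under the root directory can be sketched as follows. This is a minimal illustration, assuming POSIX-style paths and a hypothetical root /var/www; the function name is made up, and a real server must also handle symbolic links, access rights, and the redirect for directory names without a trailing "/":

```python
import posixpath
from urllib.parse import unquote


def map_to_document_root(root: str, target: str) -> str:
    """Map a request target such as '/docs/?x=1' to a file path under root."""
    path = unquote(target.split("?", 1)[0])  # drop the query, decode %XX escapes
    # normpath on an absolute path resolves '.' and '..' and cannot climb above '/'
    normalized = posixpath.normpath("/" + path.lstrip("/"))
    full = posixpath.join(root, normalized.lstrip("/"))
    if path.endswith("/"):
        full = posixpath.join(full, "index.html")  # directory: serve its index file
    return full
```

Decoding the %XX escapes before normalizing matters: otherwise a target such as /%2e%2e/ would slip past the ".." check.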

Let's look at the simplest example of an HTTP request. If we type the address http://yandex.ru in the browser's address bar, the browser determines the IP address of the server yandex.ru and sends it the following HTTP request on port 80:

GET http://yandex.ru/ HTTP/1.0
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
Accept-Language: ru
Cookie: yandexuid=2464977781018373381
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
Host: yandex.ru
Referer: narod.ru
Proxy-Connection: Keep-Alive

The request is sent as unencrypted text. The most important part of the request is its first line: the request type (GET), the URL of the requested document (http://yandex.ru/), and the HTTP protocol version (HTTP/1.0). The request parameters follow, one per line. Each line starts with the parameter name, followed by a colon and the parameter value.

Accept - the data types the browser can accept (as MIME types).

Accept-Language - the preferred language in which the browser wants to receive data.

User-Agent - the type of program that sent the request.

Host - DNS (or IP) name of the host to which the request is addressed.

Cookie - cookies (data that the server stored on the client's local disk during a previous visit to this host).

Referer - the page from which the request is sent. For example, if we are on the page http://narod.ru and click a link to http://yandex.ru there, the request is sent to the host yandex.ru, and the Referer field contains the host name narod.ru.

The set of request parameters is not fixed; besides those listed above, other parameters may appear.

The most interesting parameters are Referer and Cookie. Cookie is mainly used to identify (authenticate) the user to the server, while Referer tells the server which page the visitor came from.
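Because each header line is just a name, a colon, and a value, a server can split the request head mechanically. The sketch below is a minimal illustration, assuming Python 3.8+ and CRLF line endings; parse_request_head is a hypothetical name, and the Referer value is written as a full URL (as browsers send it) to show that only the first colon separates name from value:

```python
from typing import Dict, Tuple


def parse_request_head(raw: str) -> Tuple[str, Dict[str, str]]:
    """Split a raw request into its first line and a dict of header fields."""
    head, _, _body = raw.partition("\r\n\r\n")  # an empty line ends the headers
    request_line, *header_lines = head.split("\r\n")
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        # Header names are case-insensitive; surrounding spaces are not significant.
        # A repeated header simply overwrites the earlier one in this sketch.
        headers[name.strip().lower()] = value.strip()
    return request_line, headers


raw = (
    "GET http://yandex.ru/ HTTP/1.0\r\n"
    "Accept-Language: ru\r\n"
    "Cookie: yandexuid=2464977781018373381\r\n"
    "Host: yandex.ru\r\n"
    "Referer: http://narod.ru/\r\n"
    "\r\n"
)
line, headers = parse_request_head(raw)
print(line)
print(headers["host"], headers["referer"])
```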

A GET request can carry data from the client to the server. The data is passed directly in the URL, in the CGI format: it is separated from the URL by a "?" character, and the parameters are joined with the "&" character:

GET ?<parameter 1>=<value 1>&<parameter 2>=<value 2>&…
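Python's urllib.parse produces and reads this query-string format. A minimal sketch; the host example.com and the parameter names are hypothetical:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Client side: encode the parameters and append them after "?"
params = {"q": "http methods", "page": "2"}
url = "http://example.com/search?" + urlencode(params)
print(url)  # http://example.com/search?q=http+methods&page=2

# Server side: take the part after "?" and split it on "&" and "="
received = parse_qs(urlsplit(url).query)
print(received)  # {'q': ['http methods'], 'page': ['2']}
```

parse_qs returns lists because the same parameter name may legitimately appear several times in one query string.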

This way of passing data to the server is convenient, but it limits the amount of data: too much data cannot be passed in the URL. For such purposes there is another request type, the POST request. A POST request is very similar to a GET request, except that its data is transmitted separately from the request header, in the request body.

The request body must be separated from the header by an empty line. When the server encounters an empty line in a POST request, it treats everything after it as the request body (the transmitted data). Note that the format of the data in a POST request body is arbitrary: although the CGI format is used most often, it is not required. Moreover, a POST request does not have to have a body, and it can also pass data in the URL.
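Building such a request by hand shows where the empty line goes. This is a minimal sketch, assuming Python 3.8+ and CGI-format (application/x-www-form-urlencoded) data; the host example.com, the path /login, and the field names are hypothetical:

```python
from typing import Dict
from urllib.parse import urlencode


def build_post_request(host: str, path: str, fields: Dict[str, str]) -> bytes:
    """Build a raw HTTP/1.1 POST request whose body carries form fields."""
    body = urlencode(fields).encode("ascii")  # CGI-style name=value&name=value
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # the empty line that separates the headers from the body
    )
    return head.encode("ascii") + body


request = build_post_request("example.com", "/login", {"user": "ivan", "lang": "ru"})
print(request.decode("ascii"))
```

Content-Length tells the server exactly how many bytes of body follow the empty line.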

Besides the CGI format, the so-called multipart format is sometimes used (the format of the transmitted data is specified by the Content-Type parameter).
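A multipart/form-data body for plain text fields can be assembled as below. This is a minimal sketch, assuming Python 3.8+; the fixed boundary "XyZ123" and the field name are hypothetical (real clients pick a random boundary that does not occur in the data), and file uploads, which add a filename and a per-part Content-Type, are not shown:

```python
from typing import Dict, Tuple


def build_multipart(fields: Dict[str, str], boundary: str) -> Tuple[str, bytes]:
    """Encode text fields as multipart/form-data; returns (Content-Type, body)."""
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")  # each part starts with the boundary
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")               # empty line between part headers and content
        lines.append(value)
    lines.append(f"--{boundary}--")    # closing boundary ends the body
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", body


content_type, body = build_multipart({"title": "Report"}, "XyZ123")
print(content_type)
print(body.decode("utf-8"))
```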

Modern browsers include tools for web developers that show information about the POST requests being sent. If you need to look at the headers of just a couple of requests, these tools are easier and faster than other methods.

If you use Firefox, you can use its Web Console. It displays the request headers and the contents of the transmitted cookies. To launch it, open the browser menu, click "Web Developer", and select "Web Console". In the panel that appears, activate the "Network" button and enter the method name, post, in the filter field. Depending on your goal, click the form button that submits the request or refresh the page. The console displays the submitted request; click it to see more details.

Google Chrome has powerful debugging tools. To use them, click the wrench icon to open the "Customize and control Google Chrome" menu, select "Tools", and launch "Developer Tools". In the toolbar, select the Network tab and submit your request. Find the request in the list and click it to study the details.

The Opera browser has built-in developer tools, Opera Dragonfly. To launch them, right-click the page and select "Inspect Element" from the context menu. Go to the Network tab of the developer tools and submit your request. Find it in the list and expand it to examine the headers and the server's responses.

Internet Explorer 9 includes a kit called F12 Developer Tools that provides detailed information about completed requests. It is launched with the F12 key or through the item of the same name in the "Tools" menu. To view a request, go to the "Network" tab, find the request in the summary, and double-click it to expand the details.

Chrome and Internet Explorer 9 have built-in tools for examining a submitted POST request in full detail. For complete information, use them or Firefox with the Firebug add-on installed; Firebug is very handy when you examine requests frequently, for example when debugging sites.

If you want to see a request sent by a program other than a browser, use the Fiddler HTTP debugger. It works as a proxy server and intercepts requests from any program, and also provides very detailed information on their headers and content.

URI (Uniform Resource Identifier) is a unified (uniform) resource identifier. URI is a character string that allows you to identify any resource: document, image, file, service, e-mail box, etc. First of all, we are talking, of course, about the resources of the Internet and the World Wide Web. A URI provides a simple and extensible way to identify resources. URI extensibility means that several identification schemes already exist within a URI, and more will be created in the future.

Relationship between URI, URL and URN

Venn diagram showing subsets of the URI scheme: URL and URN.

A URI can be a URL, a URN, or both.

  • A URL is a URI that, in addition to identifying a resource, also provides information about the location of that resource.
  • A URN is a URI that only identifies a resource within a specific namespace (and thus in a specific context) but does not indicate its location. For instance, the URN urn:ISBN:0-395-36341-1 is a URI that points to a resource (a book), 0-395-36341-1, in the ISBN namespace; unlike a URL, however, it does not indicate where the resource is located: it does not say in which store the book can be bought or on which website it can be downloaded.

Because a URI, unlike a URL, does not always indicate how to obtain a resource but only identifies it, URIs can be used in RDF (Resource Description Framework) to describe resources that cannot be retrieved over the Internet (for example, a person, a car, or a city).

History

In 1990, in Geneva, Switzerland, at the European Council for Nuclear Research (CERN), the British scientist Tim Berners-Lee invented the URL (Uniform Resource Locator). Since the URL is the most widely used subset of URIs, 1990 is considered the year the URI was born, although, strictly speaking, the URI concept was documented only in June 1994, in RFC 1630.

A new version of the URI specification was defined in 1998 in RFC 2396; at the same time, the word Universal in the name was changed to Uniform.

Flaws

The URL was a fundamental innovation on the Internet, so the principles of URI were documented to ensure full compatibility with URLs. This is the source of a major drawback that URIs inherit from URLs: only a limited set of Latin letters and punctuation marks (even fewer than ASCII provides) can be used in them. So if we want to use Cyrillic characters, hieroglyphs, or, say, characters specific to French in a URI, we have to encode them, the same way Wikipedia encodes URLs containing Unicode characters. For example, a string like this:

https://ru.wikipedia.org/wiki/Кириллица

is URL-encoded as:

https://ru.wikipedia.org/wiki/%D0%9A%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0
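This is percent-encoding: each non-ASCII character is converted to its UTF-8 bytes, and each byte is written as %XX. A minimal sketch with Python's standard urllib.parse:

```python
from urllib.parse import quote, unquote

title = "Кириллица"
encoded_url = "https://ru.wikipedia.org/wiki/" + quote(title)  # UTF-8 bytes -> %XX
print(encoded_url)
print(unquote(encoded_url))  # back to the readable form
```

Each Cyrillic letter becomes two bytes and therefore six characters, which is why such URLs are so hard for people to read.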

Since the letters of every alphabet except the Latin letters used in English undergo this transformation, URIs containing words in other languages (even European ones) lose their readability for people. This flatly contradicts the principle of internationalization proclaimed by all the leading Internet organizations, including the W3C and ISOC. The IRI standard (Internationalized Resource Identifier) is intended to solve this problem: international resource identifiers in which Unicode characters can be used freely and which do not discriminate against other languages. The creator of the URI, Tim Berners-Lee, has also said that the domain name system underlying URLs was a poor decision, since it forces on resources a hierarchical architecture that does not suit the hypertext web.

URI structure

URI = [scheme ":"] hierarchical-part ["?" query] ["#" fragment]

In this entry:

Scheme

scheme for accessing a resource (often indicates a network protocol), for example, http, ftp, file, ldap, mailto, urn

Hierarchical-part

contains data, usually organized hierarchically, which, together with the data in the non-hierarchical query component, serves to identify the resource within the scope of the URI scheme. Usually the hierarchical part contains the path to the resource (possibly preceded by the address of the server where it is located) or the resource identifier (in the case of a URN).

Query

this optional URI component is described above.

Fragment

(also an optional component)

Allows a secondary resource to be identified indirectly, by referring to a primary resource and specifying additional information. The secondary resource identified this way can be a part or subset of the primary resource, some representation of it, or another resource defined or described by the primary resource.
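Python's urllib.parse.urlsplit splits a URI into these components. A minimal sketch; the query "lang=en" is added to the document's sample URI purely for illustration:

```python
from urllib.parse import urlsplit

uri = urlsplit("http://www.ics.uci.edu/pub/ietf/uri/?lang=en#Related")
print(uri.scheme)    # http
print(uri.netloc)    # www.ics.uci.edu  (server address in the hierarchical part)
print(uri.path)      # /pub/ietf/uri/
print(uri.query)     # lang=en
print(uri.fragment)  # Related

# A URN has a scheme but no server address: everything after "urn:" is the path
urn = urlsplit("urn:isbn:0-395-36341-1")
print(urn.scheme, urn.path)  # urn isbn:0-395-36341-1
```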

Parsing the URI structure. To parse URIs, that is, to split a URI into its components and then identify them, it is most convenient to use regular expressions, which are available in almost all modern programming languages. RFC 3986 recommends the following pattern for parsing URIs:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

This pattern contains 9 groups, numbered by their opening parentheses from left to right (for more on patterns and groups, see Regular Expressions); together they parse a typical URI structure completely and accurately, where:

  • group 2 - scheme,
  • group 4 - authority (source),
  • group 5 - path,
  • group 7 - query,
  • group 9 - fragment.

Thus, if we use this pattern to parse, for example, a typical URI like this:

http://www.ics.uci.edu/pub/ietf/uri/#Related

then the 9 groups of the pattern give the following results, respectively:

  1. http:
  2. http
  3. //www.ics.uci.edu
  4. www.ics.uci.edu
  5. /pub/ietf/uri/
  6. no result
  7. no result
  8. #Related
  9. Related
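The RFC 3986 (Appendix B) pattern can be applied directly with Python's re module to reproduce these groups. A minimal sketch; the components dictionary and its key names are illustrative:

```python
import re

# Appendix B of RFC 3986; groups are numbered by their opening parentheses
URI_PATTERN = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

m = URI_PATTERN.match("http://www.ics.uci.edu/pub/ietf/uri/#Related")
for number, value in enumerate(m.groups(), start=1):
    print(number, value)  # unmatched groups print as None

components = {
    "scheme": m.group(2),
    "authority": m.group(4),
    "path": m.group(5),
    "query": m.group(7),
    "fragment": m.group(9),
}
```

Groups 1, 3, 6 and 8 include the delimiters (":", "//", "?", "#"), which is why the useful components sit in the even-numbered inner groups plus group 5.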

Examples of URIs:

1) Absolute URIs

  • https://ru.wikipedia.org/wiki/URI
  • ftp://ftp.is.co.za/rfc/rfc1808.txt
  • file://C:\UserName.HostName\Projects\Wikipedia_Articles\URI.xml
  • file:///C:/file.wsdl
  • file:///Users/John/Documents/Projects/Web/MyWebsite/about.html
  • ldap:///c=GB?objectClass?one
  • mailto:[email protected]
  • sip:[email protected]
  • news:comp.infosystems.www.servers.unix
  • data:text/plain;charset=iso-8859-7,%be%be%be
  • tel:+1-816-555-1212
  • telnet://192.0.2.16:80/
  • urn:oasis:names:specification:docbook:dtd:xml:4.1.2

2) Relative URIs

  • /relative/URI/with/absolute/path/to/resource.txt
  • //example.org/scheme-relative/URI/with/absolute/path/to/resource.txt
  • relative/path/to/resource.txt
  • ../../../resource.txt
  • resource.txt
  • /resource.txt#frag01
  • #frag01

[empty string] - parsing it yields an empty identifier, that is, the reference leads to the default object in the default scheme; resolved against a base URI, an empty reference denotes the current document itself.
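A relative URI only becomes usable once it is resolved against the URI of the document that contains it. Python's urllib.parse.urljoin performs this resolution; the base address below is hypothetical:

```python
from urllib.parse import urljoin

base = "http://example.org/a/b/c/page.html"  # hypothetical base document

for ref in ["resource.txt", "../../../resource.txt", "/resource.txt#frag01",
            "#frag01", "//example.net/x.txt", ""]:
    print(repr(ref), "->", urljoin(base, ref))
```

A scheme-relative reference such as //example.net/x.txt keeps only the scheme of the base, while the empty reference returns the base document unchanged.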

DNS service

DNS stands for Domain Name System. DNS domain names are synonyms for IP addresses, just as the names in your phone's address book are synonyms for phone numbers. They are symbolic rather than numeric, easier to remember and navigate by, and they carry meaning: www.irnet.ru → DNS tables → 193.232.70.36. Domain names are also unique: no two identical domain names exist in the world. Unlike IP addresses, domain names are optional; they are registered (purchased) separately.

Fig. 2. Hierarchy in the DNS.

The addresses written on envelopes for delivery by regular mail are also unique. No two countries in the world have the same name. City names are sometimes repeated, but combined with larger administrative units such as districts and regions they become unique, and street names should not be repeated within one city. Thus an address built from geographical and administrative names uniquely identifies the destination. Domains have a similar hierarchy. Domain names are separated by periods: lingvo.yandex.ru, krkime.com.

DNS has the following characteristics:

  • Distributed administration. Different people or organizations are responsible for different parts of the hierarchy.
  • Distributed data storage. Each node of the network must store only the data within its area of responsibility and (possibly) the addresses of the root DNS servers.
  • Caching. A node may store some data outside its area of responsibility to reduce network load.
  • Hierarchical structure, in which all nodes form a tree, and each node can either manage the lower-level nodes itself or delegate them to other nodes.
  • Redundancy. Each node (zone) is (usually) stored and maintained by several servers, separated both physically and logically, which keeps the data safe and lets work continue even if one of the nodes fails.

Domain levels. The three most commonly discussed levels of domains are described below.

First-level, or top-level, domains are divided into two groups:

1) Domains tied to a territory, for example .ru, .by, .ua, .de, .us, etc. These domains are assigned to particular countries; from them you can tell, for example, which country a site belongs to.

2) The second group of first-level domains consists of domains with a specific purpose, for example .com for commercial organizations, .info for informational sites, .tv for television companies, etc. These domains indicate the focus of a site, although, truth be told, lately they are increasingly used for anything at all, often regardless of their intended purpose.

First-level domains cannot be used as the address of your site. They serve as the basis for second-level domains: a second-level domain can be registered in any first-level domain. A second-level domain has the form site_name.first_level_domain, for example webmastermix.ru (often used with the www prefix, as in www.webmastermix.ru). Second-level domain names are recommended for site addresses: people read and remember them best, and search engines handle them well. That is why most sites have domain names at this level.

There are also third-level domains, which are created on the basis of second-level domains. A third-level domain looks like this: forum.webmastermix.ru. Once you have registered a second-level domain, you can create as many third-level domains under it as you like. You can register a domain name for your site with special registration services.
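The level of a domain can be read off its labels from right to left. A minimal sketch, assuming Python 3.8+; domain_levels is a hypothetical helper that works on the name as a string only (no DNS lookup) and ignores multi-label registry suffixes such as co.uk:

```python
from typing import List


def domain_levels(name: str) -> List[str]:
    """List a domain name and its parent domains, from the first level down."""
    labels = name.rstrip(".").lower().split(".")
    # Level 1 is the rightmost label; each label to the left adds one level
    return [".".join(labels[-level:]) for level in range(1, len(labels) + 1)]


for level, domain in enumerate(domain_levels("forum.webmastermix.ru"), start=1):
    print(level, domain)
# 1 ru
# 2 webmastermix.ru
# 3 forum.webmastermix.ru
```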

WEB TECHNOLOGIES: HTML, JAVASCRIPT

The first part of the didactic block of the above topic was devoted to Internet technologies. Now we are starting to study the technologies used in the World Wide Web, or web technologies.

First, you need to understand the basic concepts of web technologies: the website and the web page. A web page is the minimal logical unit of the World Wide Web: a document uniquely identified by its URL. A website is a collection of thematically related web pages located on the same server and belonging to the same owner. In a particular case, a website can consist of a single web page. The World Wide Web is the collection of all websites.

The basis of the entire World Wide Web is the hypertext markup language HTML - Hyper Text Markup Language (Fig. 3). It serves for logical (semantic) markup of a document (web page). Sometimes it is improperly used to control the way the content of web pages is displayed on a monitor screen or when outputting to a printer, which fundamentally contradicts the ideology adopted on the World Wide Web.

Fig. 3. Web technologies

Cascading Style Sheets (CSS) are intended to control the display of content on web pages. CSS is similar in many ways to the styles used in the popular word processor Word.

Scripting languages are used to make web pages dynamic (drop-down menus, animation). The standard scripting language of the World Wide Web is JavaScript, whose core is ECMAScript.

HTML, CSS, and JavaScript are the languages with which you can build a website of any complexity. But they provide only the linguistic support; in browsers, documents are represented as collections of objects, many of which belong to the browser object model (BOM). The browser object model differs from browser to browser, which causes problems when building cross-browser applications. Therefore the World Wide Web Consortium proposed the Document Object Model (DOM), a standard way to represent web pages as a collection of objects.

The syntax of modern HTML is described using XML (Extensible Markup Language). XML lets you create your own HTML-like markup languages, defined in the form of DTDs. There are many such languages: for representing mathematical and chemical formulas, knowledge, and so on.

As you can see from the above, all web technologies are closely interconnected. Understanding this fact will make it easier to understand the purpose of a particular mechanism used to create web applications.

EMAIL

Electronic mail (email, e-mail, from the English electronic mail) is a technology, and the services built on it, for sending and receiving electronic messages (called "letters" or "emails") over a distributed computer network. Its main difference from other messaging systems is the possibility of deferred delivery and a well-developed system of interaction between independent mail servers.

E-mail lets you send and receive messages, reply to correspondents' letters automatically using their addresses, send copies of a letter to several recipients at once, forward a received letter to another address, use logical names instead of addresses (numeric or domain), create several mailbox folders for different kinds of correspondence, include text files in letters, use mailing lists ("mail bouncers") to hold discussions with a group of correspondents, and so on. To send a message by e-mail, you must specify the address of the recipient's mailbox. An email subscriber's mailbox is an area on the mail server's hard drive reserved for that user.
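Sending copies to several recipients comes down to listing their addresses in the message headers. The sketch below only builds a message with Python's standard email package; all addresses are hypothetical, and actually sending it would require a reachable SMTP server (for example via smtplib.SMTP(...).send_message(msg)), which is not shown:

```python
from email.message import EmailMessage
from email.utils import getaddresses

msg = EmailMessage()
msg["From"] = "ivan@example.org"
msg["To"] = "anna@example.com"
msg["Cc"] = "boris@example.net, olga@example.net"  # copies to several recipients
msg["Subject"] = "Meeting notes"
msg.set_content("Hello! The notes from today's meeting are below.")

# Every address in To and Cc receives the letter
recipients = [addr for _, addr in getaddresses([str(msg["To"]), str(msg["Cc"])])]
print(recipients)
print(msg.as_string())
```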

The development of Internet technology has produced modern messaging protocols that provide rich capabilities for processing letters, a variety of services, and ease of use. For example, SMTP, which works on the client-server principle, is used to send messages from a computer to the addressee. Historically, access to SMTP servers was often not password-protected, so any known server on the network could be used to send email. Unlike servers for sending letters, servers for storing messages are password-protected, so you must use the server or service on which you have an account. These servers use the POP and IMAP protocols, which differ in how they store messages.

Under POP3, messages arriving at an address are stored on the server until they are downloaded to the computer during the next session. Once the messages are downloaded, you can disconnect from the network and read the mail offline. This makes POP3 fast and convenient when the connection is slow or not permanent.

IMAP is convenient for people who have a permanent network connection. Messages received at the address are also stored on the server. Unlike POP3, however, checking mail first downloads only the message headers. The body of a letter is downloaded from the server only when you select its header. Over a slow dial-up connection, working with mail through this protocol therefore wastes time.

There are several protocols for receiving and transferring mail between multi-user systems.

A brief description of some of them:

1) SMTP (Simple Mail Transfer Protocol) is a network protocol designed for transmitting e-mail in TCP/IP networks. The transfer is always initiated by the sending system itself.

MTA (Mail Transfer Agent) is the main component of the Internet mail transfer system; it represents a networked computer to the e-mail system. Users usually do not work with the MTA directly but with an MUA (Mail User Agent), the e-mail client. The principle of interaction is shown schematically in the figure.

2) POP, POP2, POP3 (Post Office Protocol) are three fairly simple, mutually incompatible protocols. They are designed to deliver mail to a user from a central mail server, delete it from the server, and identify the user by name and password. POP itself does not send mail: outgoing mail from the user is transferred with SMTP. Messages can be received as headers only, without downloading the entire message.

After the connection is established, a POP3 session goes through three consecutive states:

      1. Authorization: the client goes through the authentication procedure.
      2. Transaction: the client obtains information about the state of the mailbox, and retrieves and deletes mail.
      3. Update: the server deletes the messages marked for deletion and closes the connection.
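The three states can be sketched as a small state machine in C#. This is a simplified model, not a POP3 client: it only tracks which state a session is in as the client sends commands.

- The command names (USER, PASS, STAT, LIST, RETR, DELE, RSET, NOOP, QUIT) are real POP3 commands.
- The `Next` helper and the state strings are hypothetical.
- Every login is assumed to succeed.

```csharp
using System;

// Hypothetical model of the three POP3 session states (RFC 1939 calls them
// AUTHORIZATION, TRANSACTION and UPDATE). It only tracks the state; it does not
// talk to a server, and it assumes every USER/PASS login succeeds.
static string Next(string state, string command)
{
    // POP3 keywords are case-insensitive, so normalize the verb.
    string verb = command.Split(' ')[0].ToUpperInvariant();
    return (state, verb) switch
    {
        // Authorization: USER names the mailbox, PASS completes the login.
        ("AUTHORIZATION", "USER") => "AUTHORIZATION",
        ("AUTHORIZATION", "PASS") => "TRANSACTION",
        // QUIT before logging in simply closes the connection.
        ("AUTHORIZATION", "QUIT") => "CLOSED",
        // Transaction: query the mailbox, download messages, mark them for deletion.
        ("TRANSACTION", "STAT" or "LIST" or "RETR" or "DELE" or "RSET" or "NOOP") => "TRANSACTION",
        // QUIT enters Update: the server removes messages marked with DELE and closes.
        ("TRANSACTION", "QUIT") => "UPDATE",
        _ => throw new InvalidOperationException($"{verb} is not valid in state {state}")
    };
}

string current = "AUTHORIZATION";
foreach (string cmd in new[] { "USER alice", "PASS secret", "STAT", "RETR 1", "DELE 1", "QUIT" })
{
    current = Next(current, cmd);
    Console.WriteLine($"{cmd,-12} -> {current}");
}
```

A command that is not allowed in the current state (for example, RETR before logging in) raises an exception in this sketch. A real server would instead answer with an -ERR response.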

3) IMAP2, IMAP2bis, IMAP3, IMAP4, IMAP4rev1 (Internet Message Access Protocol) give users extensive capabilities for working with mailboxes located on a central server.

o IMAP stores mail on the server in folders (file directories) and also lets the client search for strings in messages on the server itself.

o IMAP2 - used in rare cases.

o IMAP3 - incompatible solution, not used.

o IMAP2bis - an extension of IMAP2 that lets the server parse messages into their MIME (Multipurpose Internet Mail Extensions) structure; still in use.

o IMAP4 is a reworked and enhanced IMAP2bis that can be used anywhere.

o IMAP4rev1 - Extends IMAP with a wide range of features, including those used by DMSP (Distributed Mail System for Personal Computers).

4) ACAP (Application Configuration Access Protocol) is a protocol developed to work alongside IMAP4. It adds the ability to search and manage subscriptions to message boards and mailboxes, and it is used to look up address books.

5) DMSP (also known as PCMAIL) is a protocol for receiving and sending mail. Its distinctive feature is that a user can work from more than one workstation. Each workstation keeps status information about the mail and a directory through which the exchange takes place. When the workstation connects to the server, this state is brought up to date with the mail server.

6) MIME is a standard that defines mechanisms for sending all kinds of information by e-mail. This includes text in languages other than English that needs character encodings other than ASCII, as well as 8-bit binary content such as pictures, music, films, and programs.
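The C# sketch below shows roughly what MIME does for non-ASCII text:

1. It parses a Content-Type header value with `System.Net.Mime.ContentType`.
2. It encodes a short German string as UTF-8.
3. It applies Base64, the transfer encoding MIME uses to carry 8-bit data as plain ASCII.

The sample text and header value are illustrative. A real mail library performs these steps for you when it builds a message.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Illustrative body text with non-ASCII letters (German for "Greetings from Cologne").
string body = "Grüße aus Köln";

// MIME declares the media type and character set in the Content-Type header;
// System.Net.Mime.ContentType parses such a header value.
var contentType = new ContentType("text/plain; charset=utf-8");
Console.WriteLine($"Content-Type: {contentType.MediaType}; charset={contentType.CharSet}");

// UTF-8 produces 8-bit bytes, so they are sent with the base64 transfer encoding,
// whose output uses only ASCII letters, digits, '+', '/' and '='.
byte[] utf8 = Encoding.UTF8.GetBytes(body);
string encoded = Convert.ToBase64String(utf8);
Console.WriteLine("Content-Transfer-Encoding: base64");
Console.WriteLine(encoded);

// The receiving mail client reverses both steps to recover the original text.
string decoded = Encoding.UTF8.GetString(Convert.FromBase64String(encoded));
Console.WriteLine(decoded == body); // True
```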

Independent work.

Run the example given in the text (handout) and save it to your own folder on the desktop.

9.2. Working with a teacher:

In case of difficulties or erroneous actions, contact the teacher to correct errors.

By the end of the lesson, show the teacher a report on the work performed and get a credit for this work.

9.3. Control of the initial and final level of knowledge:

Testing on a computer.




Working with URI

Every day we use Uniform Resource Identifiers (URIs) when looking for something on the WWW. URIs identify resources and specify how to request them. With URIs you can access not only Web pages but also FTP servers, Web services, and local files.

The term Uniform Resource Locator (URL) is often used instead of URI. URI is the general term for identifiers of resources. A URL is a URI that identifies a resource by its location, typically with popular schemes such as http, ftp and mailto. Current specifications (RFC 3986) recommend using the general term URI in technical documentation.

Another term you may already know is Uniform Resource Name (URN). A URN is a standardized URI used to identify a resource independently of its location on the network.
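Because URLs and URNs are both URIs, the .NET `Uri` class (described later in this section) accepts either. The identifiers in this minimal sketch are illustrative: two URLs and an ISBN-based URN. The scheme is what tells you which kind of identifier you hold.

```csharp
using System;

// Illustrative identifiers: two URLs (they say how to reach the resource) and a URN
// (an ISBN-based name that says nothing about where the book is stored).
string[] identifiers =
{
    "http://www.globalknowledge.net/training/generic.asp",
    "mailto:someone@example.com",
    "urn:isbn:0451450523"
};

foreach (string text in identifiers)
{
    // All three are valid absolute URIs; only the scheme differs.
    bool ok = Uri.TryCreate(text, UriKind.Absolute, out var uri);
    Console.WriteLine($"{text} -> valid: {ok}, scheme: {uri?.Scheme}");
}
```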

Let's analyze the parts of the URI that links to a page on the Global Knowledge website:

http://www.globalknowledge.net:80/training/generic.asp?pageid=1078&country=DACH

    The first part of the URI is called the scheme. The scheme defines the namespace of the URI and can restrict the syntax of the part that follows it. Many schemes are named after the protocols they use (such as http and ftp), but this is not required. In our example, the scheme identifier is http. The scheme delimiter (:// in this example; strictly, the colon ends the scheme and // introduces the server part) separates the scheme from the rest of the URI.

    The scheme delimiter is followed by the server name, or by an IP address in dotted decimal notation. Here it is www.globalknowledge.net.

    After the server name or IP address, separated by a colon, comes the port number (80 here), which identifies the application on the server to connect to. If no port number is specified, the default port for the protocol is used (for example, port 80 for HTTP).

    The path identifies the requested resource (and its directory). It does not necessarily correspond to a physical file on the server; the content can also be generated dynamically. Here the path is /training/generic.asp.

    The last part of this URI, separated from the path by the ? character, is called the query. In our example, the query is pageid=1078&country=DACH. A query string can have several components, each specifying a variable and its value, joined with the & character. The first component here is pageid=1078, with the variable pageid and the value 1078. The second component is country=DACH.

    Sections within a resource can be identified with fragments, which are used to link to sections within an HTML page. In Web design, fragments are also called bookmarks. The # character separates the fragment identifier from the rest of the URI. In the URL http://www.microsoft.com/net/basics/glossary.asp#NETFramework, the fragment is #NETFramework.

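A URI can contain both a query and a fragment; the fragment always comes last (for example, generic.asp?pageid=1078#top). A # character written inside the query string therefore starts the fragment. To use # as data within a query value, encode it as %23.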
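Splitting a query string into its variable=value components can be sketched by hand in C#. The `ParseQuery` helper below is hypothetical, not a framework API. It does three things:

- It drops the leading ?.
- It splits the rest on & and =.
- It decodes percent-escapes with `Uri.UnescapeDataString` (part of the `Uri` class covered below).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns "?pageid=1078&country=DACH" into a dictionary.
// Note: '+' is not turned into a space here (that is an HTML-form convention).
static Dictionary<string, string> ParseQuery(string query)
{
    var result = new Dictionary<string, string>();
    // Drop the leading '?' that separates the query from the path.
    string q = query.StartsWith("?") ? query.Substring(1) : query;
    foreach (string component in q.Split('&', StringSplitOptions.RemoveEmptyEntries))
    {
        // Each component is variable=value; a component without '=' gets an empty value.
        int eq = component.IndexOf('=');
        string name = eq < 0 ? component : component.Substring(0, eq);
        string value = eq < 0 ? "" : component.Substring(eq + 1);
        result[Uri.UnescapeDataString(name)] = Uri.UnescapeDataString(value);
    }
    return result;
}

var uri = new Uri("http://www.globalknowledge.net:80/training/generic.asp?pageid=1078&country=DACH");
foreach (var pair in ParseQuery(uri.Query))
    Console.WriteLine($"{pair.Key} = {pair.Value}");
```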

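Several characters are reserved in URIs because they act as special delimiters. When one of them must appear as ordinary data in a host name, path or query, it has to be percent-encoded (% followed by its hexadecimal code). The following characters are reserved (this is the set listed in RFC 2396; RFC 3986 additionally reserves # [ ] ! ' ( ) and *):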

; / ? : @ & = + $ ,
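The C# sketch below shows the percent-encoding in practice. It uses `Uri.EscapeDataString` from the `Uri` class introduced next. The value and the search.asp address are made up for illustration.

```csharp
using System;

// Illustrative value containing reserved characters (&, ;, +) and a space.
string value = "R&D; sales+marketing";

// EscapeDataString percent-encodes every character outside the unreserved set,
// so the value can no longer be mistaken for query delimiters.
string escaped = Uri.EscapeDataString(value);
Console.WriteLine(escaped); // R%26D%3B%20sales%2Bmarketing

// Hypothetical address: the escaped value is safe to append as a query value.
var uri = new Uri("http://www.example.com/search.asp?dept=" + escaped);
Console.WriteLine(uri.Query);

// The receiver reverses the encoding to get the original text back.
Console.WriteLine(Uri.UnescapeDataString(escaped)); // R&D; sales+marketing
```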

The Uri class from the System namespace encapsulates a uniform resource identifier. It contains properties and methods for parsing, comparing and combining URIs.

You can create a Uri object by passing a URI string to the constructor:

Uri baseURI = new Uri("http://site");

If you already have a base Uri object, you can create a new URI by combining the base URI with a relative URI:

Uri baseURI = new Uri("http://site");
Uri newURI = new Uri(baseURI, "my/csharp/web/level2/2_2.php");

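The relative URI is resolved against the base according to the standard rules. If the base path does not end with /, its last segment is treated as a file name and replaced by the relative path. A relative URI that starts with / replaces the base path entirely, so only the scheme, server name and port of the base are kept.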
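These rules are easy to check with a few calls to the `Uri(Uri, string)` constructor. The base http://site comes from the text. The other paths are hypothetical values chosen to show each case.

```csharp
using System;

var root = new Uri("http://site");                  // base as in the text
var page = new Uri("http://site/docs/intro.html");  // hypothetical page
var noSlash = new Uri("http://site/docs");          // "docs" is treated as a file name

// No path in the base: the relative path goes directly under the root.
Console.WriteLine(new Uri(root, "my/csharp/web/level2/2_2.php")); // http://site/my/csharp/web/level2/2_2.php

// The last segment of the base path (intro.html) is replaced.
Console.WriteLine(new Uri(page, "setup.html"));    // http://site/docs/setup.html

// Without a trailing '/', "docs" is the last segment, so it is replaced too.
Console.WriteLine(new Uri(noSlash, "setup.html")); // http://site/setup.html

// A leading '/' discards the base path; only scheme, host and port remain.
Console.WriteLine(new Uri(page, "/index.html"));   // http://site/index.html
```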

The Uri class has several read-only static fields that return the names of common schemes:

Uri.UriSchemeFile

The file scheme is used to access files locally or on shared network resources, which can be named according to the Universal Naming Convention (UNC).

Uri.UriSchemeFtp

The ftp scheme is used to retrieve files from an FTP server and, conversely, to upload files to an FTP server.

Uri.UriSchemeGopher

The gopher protocol was an early competitor of HTTP. It offered hierarchical, menu-based browsing of text content, which was an improvement over FTP, but it was soon superseded by HTTP.

Uri.UriSchemeHttp, Uri.UriSchemeHttps

These two schemes are well known: http and https. The https scheme is used for secure exchange.

Uri.UriSchemeMailto

The mailto scheme is used to send mail messages.

Uri.UriSchemeNews, Uri.UriSchemeNntp

The news and nntp schemes are used to access newsgroups over the NNTP protocol.
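Comparing a URI's scheme against these constants avoids hard-coded strings when code has to branch on the kind of URI. In this minimal sketch, the `Describe` helper and the sample addresses are hypothetical. Note that `Uri` normalizes the scheme to lowercase, so "HTTPS" matches `Uri.UriSchemeHttps`.

```csharp
using System;

// Hypothetical helper that branches on the scheme using the Uri.UriScheme* constants.
static string Describe(Uri uri)
{
    string s = uri.Scheme; // already lowercase
    if (s == Uri.UriSchemeHttps) return "secure web page";
    if (s == Uri.UriSchemeHttp) return "web page";
    if (s == Uri.UriSchemeFtp) return "file on an FTP server";
    if (s == Uri.UriSchemeFile) return "local or UNC file";
    if (s == Uri.UriSchemeMailto) return "mail address";
    if (s == Uri.UriSchemeNews || s == Uri.UriSchemeNntp) return "newsgroup resource";
    return "other scheme: " + s;
}

Console.WriteLine(Describe(new Uri("HTTPS://www.example.com/")));   // secure web page
Console.WriteLine(Describe(new Uri("ftp://ftp.example.com/pub/"))); // file on an FTP server
Console.WriteLine(Describe(new Uri("mailto:someone@example.com"))); // mail address
```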

The Uri class has static methods that check whether a scheme name and a host name are valid. Uri.CheckSchemeName() returns true if the scheme name is valid. Uri.CheckHostName() not only validates the host name but also returns a UriHostNameType enumeration value indicating the type of host (for example Dns, IPv4 or IPv6, or Unknown if the name is not valid).
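A quick way to see what these two methods return. The host names and addresses are examples only.

```csharp
using System;

Console.WriteLine(Uri.CheckSchemeName("http"));   // True
Console.WriteLine(Uri.CheckSchemeName("ht tp"));  // False: spaces are not allowed in a scheme

Console.WriteLine(Uri.CheckHostName("www.globalknowledge.net")); // Dns
Console.WriteLine(Uri.CheckHostName("192.168.0.1"));             // IPv4
Console.WriteLine(Uri.CheckHostName("::1"));                     // IPv6
Console.WriteLine(Uri.CheckHostName("bad host!"));               // Unknown
```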

The Uri class has many read-only properties that allow you to access all parts of a URI. In the following table, we use the above URI as an example to demonstrate the use of properties:

AbsoluteUri: This property shows the complete URI. If the port number in the URI equals the default port for the protocol, the Uri constructor removes it. For our example, the value of AbsoluteUri is http://www.globalknowledge.net/training/generic.asp?pageid=1078&country=DACH. If you pass an absolute file path to the Uri constructor, AbsoluteUri automatically prefixes it with the file:// scheme.
Scheme: The scheme is the first part of the URI; in this case the property returns http.
Host: The Host property shows the host name from the URI: www.globalknowledge.net.
Authority: If the port number equals the protocol's default, the Authority property returns the same string as Host. If a different port number is used, Authority also includes the port number.
HostNameType: The type of host name depends on the name used. The value is of the UriHostNameType enumeration discussed above (Dns in this case).
Port: The Port property returns the port number, 80.
AbsolutePath: The absolute path starts after the port number and ends before the query string. In this case it is /training/generic.asp.
LocalPath: The local path here is also /training/generic.asp; for an HTTP URI there is no difference between AbsolutePath and LocalPath. The difference appears when the URI refers to a shared network resource. On Windows, for a UNC path such as \\server\share\directory\file.txt, LocalPath returns the path in UNC form, including the server and share names. AbsolutePath returns /share/directory/file.txt, because the server name becomes the Host.
Query: The Query property returns the part following the path, including the ? character: ?pageid=1078&country=DACH.
PathAndQuery: The PathAndQuery property returns the path combined with the query string: /training/generic.asp?pageid=1078&country=DACH.
Fragment: If the URI contains a fragment, it is returned by the Fragment property, including the leading # character. The fragment always comes last, after the query string if there is one.
Segments: The Segments property returns an array of strings formed from the path. In this case there are three segments: /, training/ and generic.asp.
UserInfo: The user name given in the URI can be read from the UserInfo property. Passing user names is common with FTP. If a non-anonymous user is specified, for example myuser in a URI of the form ftp://myuser@server-name/, the UserInfo property returns myuser.

In addition to those listed, there are several more properties that return Boolean values. They indicate whether the URI refers to a file, is a UNC path, refers to a loopback address, or uses the default port number for its protocol. These are the IsFile, IsUnc, IsLoopback and IsDefaultPort properties, respectively.
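The properties from the table can be printed for the example URI in a few lines of C#. This console sketch also uses two other URIs:

- a hypothetical FTP address (user name and non-default port made up for illustration) to show UserInfo and Authority;
- the glossary URL from the text to show Fragment.

The comments give the values these calls return.

```csharp
using System;

// The example URI from the text, with the explicit default port :80.
var uri = new Uri("http://www.globalknowledge.net:80/training/generic.asp?pageid=1078&country=DACH");

Console.WriteLine(uri.AbsoluteUri);   // http://www.globalknowledge.net/training/generic.asp?pageid=1078&country=DACH
Console.WriteLine(uri.Scheme);        // http
Console.WriteLine(uri.Host);          // www.globalknowledge.net
Console.WriteLine(uri.Authority);     // www.globalknowledge.net (default port omitted)
Console.WriteLine(uri.HostNameType);  // Dns
Console.WriteLine(uri.Port);          // 80
Console.WriteLine(uri.AbsolutePath);  // /training/generic.asp
Console.WriteLine(uri.Query);         // ?pageid=1078&country=DACH
Console.WriteLine(uri.PathAndQuery);  // /training/generic.asp?pageid=1078&country=DACH
Console.WriteLine(string.Join(" | ", uri.Segments)); // / | training/ | generic.asp
Console.WriteLine(uri.IsDefaultPort); // True
Console.WriteLine(uri.IsFile);        // False

// Hypothetical FTP address with a user name and a non-default port.
var ftp = new Uri("ftp://myuser@ftp.example.com:2121/pub/readme.txt");
Console.WriteLine(ftp.UserInfo);      // myuser
Console.WriteLine(ftp.Authority);     // ftp.example.com:2121

// The fragment example from the text.
var glossary = new Uri("http://www.microsoft.com/net/basics/glossary.asp#NETFramework");
Console.WriteLine(glossary.Fragment); // #NETFramework
```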
