Foreword
This detailed guide to the inner workings of WebKit and Gecko is the result of extensive research by Israeli web developer Tali Garsiel. For several years she has been tracking all the published information about how browsers work and has devoted a great deal of time to analyzing their source code. Here is what Tali herself writes:
When IE was installed on 90% of computers, you had no choice but to treat it as a mysterious "black box". But now that more than half of all users choose open source browsers, it is time to find out what is hiding inside them, in those millions of lines of C++ code...

Tali published the results of her research on her website, but we believe they deserve the attention of a wider audience, so we post them here with some abridgements.
A web developer who is familiar with the inner workings of browsers makes better decisions and understands the reasoning behind them. This is a fairly lengthy document, but we encourage you to read it as carefully as possible; we guarantee you won't regret it. Paul Irish, Chrome Developer Relations
Introduction
Web browsers are perhaps the most widely used applications. In this guide, I explain how they work. We will take a detailed look at what happens from the moment you type google.ru in the address bar until the Google page appears on the screen.
What browsers will we consider
Today there are five main browsers: Internet Explorer, Firefox, Safari, Chrome and Opera. The examples use the open source browsers Firefox, Chrome and Safari (whose code is partially open). According to browser usage statistics at StatCounter, as of August 2011 Firefox, Safari and Chrome together accounted for about 60% of usage. Thus, open source browsers hold a very strong position today.
Basic browser features
The main purpose of a browser is to display web resources. To do this, a request is sent to the server, and the result is displayed in the browser window. Resources are mostly HTML documents, but they can also be PDF files, images, or other content. The location of a resource is determined using a URI (Uniform Resource Identifier).
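As a concrete illustration, the parts of a URI can be inspected with the standard URL class, available both in browsers and in Node.js (the address used here is made up for the example):

```javascript
// Break a URI into the parts the browser uses to locate a resource.
// The URL class implements the WHATWG URL parsing rules.
const uri = new URL("https://www.example.com:8080/path/page.html?q=test#top");

console.log(uri.protocol); // "https:"          - scheme used to fetch the resource
console.log(uri.hostname); // "www.example.com" - server to contact
console.log(uri.port);     // "8080"            - explicit port, if any
console.log(uri.pathname); // "/path/page.html" - path on the server
console.log(uri.search);   // "?q=test"         - query string
console.log(uri.hash);     // "#top"            - fragment, handled by the browser itself
```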
The way the browser processes and displays HTML files is defined by the HTML and CSS specifications. They are developed by the W3C (World Wide Web Consortium), the organization that maintains standards for the web.
For many years, browsers implemented only part of the specifications and developed their own extensions, which meant serious compatibility problems for web developers. Today, most browsers conform to the specifications to a greater or lesser extent.
The user interfaces of different browsers have a lot in common. The main elements of the browser interface are listed below.
- Address bar for entering a URI
- Navigation buttons "Back" and "Forward"
- Bookmarks
- Refresh and stop page loading buttons
- Home button to go to the main page
Oddly enough, there is no specification that would define a standard browser user interface. Modern interfaces are the result of many years of evolution, and of developers partially copying one another. The HTML5 specification does not define exactly what the browser interface should contain, but it does list some common elements: the address bar, the status bar, and the toolbar. There are, of course, browser-specific features, such as the download manager in Firefox.
Top-level structure
The main components of the browser are listed below.
- User interface – includes the address bar, the back and forward buttons, the bookmarks menu, and so on. It covers every element of the browser except the window in which the requested page is displayed.
- Browser engine – controls the interaction between the user interface and the display module.
- Display module – responsible for displaying the requested content on the screen. For example, if an HTML document is requested, the display module parses the HTML and CSS code and displays the result on the screen.
- Network components – used to make network calls such as HTTP requests. They expose a platform-independent interface, with a separate implementation for each platform.
- Executive part of the user interface (UI backend) – used to draw basic widgets such as windows and combo boxes. It also exposes a generic, platform-independent interface, while under the hood it uses the user-interface methods of the specific operating system.
- JavaScript interpreter – used to parse and execute JavaScript code.
- Data storage – a persistence layer. The browser saves various kinds of data, such as cookies, to the hard disk. The new HTML specification (HTML5) defines a "web database": a complete (albeit lightweight) database in the browser.
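To show what such browser-side persistence looks like from script, here is a small sketch of the Web Storage (localStorage) key/value API. The in-memory stand-in is our own addition so that the example also runs outside a browser; it is not the real persistent store:

```javascript
// Use the browser's localStorage when it exists; otherwise fall back
// to a minimal in-memory stand-in with the same three methods
// (an illustration only - it does not persist anything to disk).
const storage = (typeof localStorage !== "undefined") ? localStorage : (() => {
  const data = new Map();
  return {
    setItem: (key, value) => data.set(key, String(value)),
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    removeItem: (key) => data.delete(key),
  };
})();

// Typical key/value persistence as defined by the Web Storage spec.
storage.setItem("theme", "dark");
console.log(storage.getItem("theme")); // "dark"
storage.removeItem("theme");
console.log(storage.getItem("theme")); // null (missing keys read as null)
```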
It should be noted that Chrome, unlike most browsers, uses multiple instances of the display module, one per tab, each running in a separate process.
Display module
As the name suggests, the display module is responsible for displaying the requested content on the browser screen.
By default, it is capable of displaying HTML and XML documents as well as images. Special plug-ins (browser extensions) make it possible to display other content, such as PDF files. However, this chapter focuses on the main use case: displaying HTML documents and images formatted with CSS styles.
Display modules
The browsers we are interested in (Firefox, Chrome and Safari) are built on two display modules. Firefox uses Gecko, Mozilla's own engine, while Safari and Chrome use WebKit.
WebKit is an open source display module that began as an engine for the Linux platform and was adapted by Apple for Mac OS and Windows. Details can be found at webkit.org.
Basic scheme of work
The display module receives the content of the requested document from the network layer, usually in chunks of 8 KB.
The subsequent flow of the display module's work is shown below.
Picture. Scheme of the display module.

The display module parses the HTML document and translates the tags into nodes of the content tree. Style information is extracted from both external CSS files and style elements. This information, together with the display instructions in the HTML file, is used to build another tree: the display tree.
It contains rectangles with visual attributes such as color and size. The rectangles are arranged in the order in which they should be displayed on the screen.
Once the display tree is created, the layout of elements begins, during which each node is assigned the exact coordinates of the point on the screen where it should appear. Then painting is executed, in which the nodes of the display tree are traversed and drawn using the executive part of the user interface (UI backend).
It is important to understand that this is a gradual process. For the convenience of the user, the display module tries to show content on the screen as soon as possible, so building the display tree and laying it out can begin before parsing of the HTML code has finished. Some parts of the document are parsed and displayed while others are still being transmitted over the network.
Work examples
Picture. Schematic diagram of the WebKit display module. Picture. Scheme of the Mozilla Gecko display module. As you can see from Figures 3 and 4, WebKit and Gecko use different terminology, but their flow of work is almost identical.
In Gecko, the tree of visually formatted elements is called the frame tree, and each element in it is a frame. WebKit uses the term render tree, consisting of render objects. The placement of elements is called layout in WebKit and reflow in Gecko. Combining DOM nodes with visual attributes to create the display tree is called attachment in WebKit. A small difference in Gecko that has nothing to do with semantics is that there is one more layer between the HTML file and the DOM tree, called the content sink, which serves as a factory for DOM elements. Now let's talk about each stage of the work in more detail.
Parsing: general information
Since parsing is a key stage of the display module's work, let's consider it in more detail, starting with a brief introduction.
Parsing a document means converting it into a structure that code can read and use. The result of parsing is typically a tree of nodes representing the structure of the document, called a parse tree or syntax tree.
For example, parsing the expression 2 + 3 - 1 might result in the following tree:
Picture. Parse tree for a mathematical expression.

Grammar
Parsing is based on certain rules determined by the language or format of the document. Every parseable format has grammatical rules consisting of vocabulary and syntax; together they form a so-called context-free grammar. Natural languages do not follow the rules of context-free grammar, so standard parsing techniques are not suitable for them.
Syntactic and lexical analyzers
Along with syntactic analysis, lexical analysis is used.
Lexical analysis is the division of information into tokens, or lexemes. Tokens form the vocabulary of a particular language and are the building blocks for creating documents. In a natural language, tokens would be all the words that can be found in dictionaries.
The point of parsing is to apply the syntax rules of the language.
Document parsing is usually shared between two components: the lexical analyzer, which breaks the input character sequence into valid tokens, and the parser, which analyzes the structure of the document according to the syntax rules of the given language and builds the syntax tree. The lexical analyzer ignores non-informative characters such as spaces and line breaks.
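A minimal sketch of such a lexical analyzer, written in JavaScript for the arithmetic language used later in this chapter (the token names and shapes are our own illustration):

```javascript
// A toy lexer: splits "2 + 3 - 1" into INTEGER/PLUS/MINUS tokens
// and skips non-informative whitespace.
function tokenize(input) {
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (ch === " " || ch === "\n") { i++; continue; } // skip separators
    if (ch === "+") { tokens.push({ type: "PLUS" }); i++; continue; }
    if (ch === "-") { tokens.push({ type: "MINUS" }); i++; continue; }
    if (ch >= "0" && ch <= "9") {                     // consume a whole integer
      let digits = "";
      while (i < input.length && input[i] >= "0" && input[i] <= "9") {
        digits += input[i++];
      }
      tokens.push({ type: "INTEGER", value: parseInt(digits, 10) });
      continue;
    }
    throw new Error("unexpected character: " + ch);
  }
  return tokens;
}

console.log(tokenize("2 + 3 - 1").map(t => t.type));
// ["INTEGER", "PLUS", "INTEGER", "MINUS", "INTEGER"]
```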
Picture. From source document to syntax tree.

Parsing is an iterative process. The parser typically asks the lexical analyzer for a new token and checks whether it matches one of the syntax rules. If it does, a new node for the token is created in the syntax tree, and the parser asks for the next token.
If the token doesn't match any rule, the parser sets it aside and asks for more tokens. This continues until a rule is found that matches all of the deferred tokens. If no such rule can be found, the parser raises an exception: the document contains syntax errors and cannot be fully processed.
Translation
The syntax tree is not always the final result. Parsing is often part of translating an input document into another format; compilation is an example. A compiler that translates source code into machine code first parses it into a syntax tree, and only then generates machine code from that tree.
Picture. Compilation steps.

Parsing example
Figure 5 shows a syntax tree built on the basis of a mathematical expression. Let's define an elementary mathematical language and consider the parsing process.
Vocabulary: Our language can contain integers, plus and minus signs.
Syntax
1. The structural elements of the language are expressions, operands and operators.
2. A language can contain any number of expressions.
3. An expression is a sequence consisting of an operand, an operator, and another operand.
4. An operator is a plus or minus token.
5. An operand is either an integer token or an expression.
Consider the input character sequence 2 + 3 - 1 .
The first fragment that matches a rule is 2: according to rule #5, it is an operand. The second match is 2 + 3: a sequence consisting of an operand, an operator, and another operand, which rule #3 defines as an expression. The next match is found only at the very end: the sequence 2 + 3 - 1 is an expression, because 2 + 3 is itself an operand, so we again get an operand, an operator, and another operand, which matches the definition of an expression. The input 2 + + does not match any rule and would therefore be considered invalid.
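The walkthrough above can be sketched in code. Here is a toy JavaScript parser for this grammar; the token objects and node shapes are our own illustration, not the internals of any real parser:

```javascript
// Consume tokens one at a time and apply the
// "operand operator operand" rule, rejecting input such as "2 + +".
function parse(tokens) {
  let pos = 0;
  const next = () => tokens[pos++];
  const peek = () => tokens[pos];

  // An operand must start with an integer token (rule #5).
  let node = next();
  if (!node || node.type !== "INTEGER") throw new Error("operand expected");

  // Repeatedly fold "operand operator operand" into an expression (rule #3),
  // so the expression built so far becomes the left operand of the next one.
  while (peek() && (peek().type === "PLUS" || peek().type === "MINUS")) {
    const op = next();
    const right = next();
    if (!right || right.type !== "INTEGER") throw new Error("operand expected");
    node = { type: "expression", operator: op.type, left: node, right };
  }
  if (pos !== tokens.length) throw new Error("unexpected token");
  return node;
}

// Parse "2 + 3 - 1": the (2 + 3) expression ends up as the
// left operand of the outer (... - 1) expression.
const tree = parse([
  { type: "INTEGER", value: 2 }, { type: "PLUS" },
  { type: "INTEGER", value: 3 }, { type: "MINUS" },
  { type: "INTEGER", value: 1 },
]);
console.log(tree.operator);      // "MINUS"
console.log(tree.left.operator); // "PLUS"
```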
Formal definition of vocabulary and syntax
The language from the example above could be defined like this:
INTEGER: 0|[1-9][0-9]*
PLUS: +
MINUS: -
As you can see, integers are defined by a regular expression.
The syntax is usually described in BNF format. The language from the example above can be described as follows:
expression := term operation term
operation := PLUS | MINUS
term := INTEGER | expression
As already mentioned, a language can be processed by standard parsers if its grammar is context-free, that is, if it can be fully expressed in BNF. A formal definition of context-free grammars can be found in the Wikipedia article on the subject.
Types of parsers
There are two types of parsers: top-down and bottom-up. Top-down parsers start from the higher-level structure of the syntax and look for matching rules. Bottom-up parsers start from the input character sequence and gradually uncover the syntactic rules in it, beginning with the low-level rules and working up to the top-level ones.
Now let's see how these two types of parsers would handle our example.
The top-down parser would start from the top-level rule: it would first determine that 2 + 3 is an expression, and then that 2 + 3 - 1 is also an expression (identifying an expression involves matching other rules as well, but the parser always starts from the top-level rule).
The bottom-up parser would scan the input sequence until it found a matching rule, replace the matched fragment with that rule, and continue in this way until the end of the input. Partially matched expressions are kept on the parser's stack.
While such a parser runs, the input sequence is shifted to the right (imagine a cursor placed at the beginning of the sequence and moving right during parsing) and gradually reduced to syntactic rules.

Automatic generation of parsers
There are special tools for creating parsers, called parser generators. It is enough to feed the generator the grammar of a language (its vocabulary and syntax rules), and it will automatically create a parser. Writing a parser by hand requires a deep understanding of how parsing works and is not easy, so generators can be quite useful.
DOM
The resulting syntax tree consists of DOM element and attribute nodes. DOM, the Document Object Model, is the object representation of an HTML document and the interface between HTML elements and the outside world, such as JavaScript code.
At the root of the tree is a Document object.
The DOM model is almost a one-to-one mapping of the markup: each tag in the markup becomes an element node in the tree.
Saying that "the tree contains DOM nodes" means that the tree consists of elements that implement one of the DOM interfaces. Browsers use concrete implementations that have additional attributes for internal use.
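As an illustration, here is roughly what such a tree could look like for the markup `<html><body><p>Hello world</p></body></html>`, using plain JavaScript objects (real browser nodes implement the full DOM interfaces; the node shape here is our own simplification):

```javascript
// Simplified stand-ins for DOM nodes: nodeType 1 is an Element,
// nodeType 3 is a Text node, nodeType 9 is the Document.
function element(tagName, ...children) {
  return { nodeType: 1, tagName, children };
}
function text(data) {
  return { nodeType: 3, data, children: [] };
}

// The Document object sits at the root of the tree; the markup's
// tags become element nodes and the character data a text node.
const doc = {
  nodeType: 9,
  children: [
    element("HTML",
      element("BODY",
        element("P", text("Hello world")))),
  ],
};

console.log(doc.children[0].tagName); // "HTML"
console.log(doc.children[0].children[0].children[0].children[0].data);
// "Hello world"
```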
Parsing algorithm
As discussed in the previous sections, HTML code cannot be parsed using standard top-down or bottom-up parsers.
The reasons for this are listed below.
- The language is "forgiving" by nature.
- Browsers have built-in mechanisms to handle some common errors in HTML code.
- The parsing cycle is re-entrant. The source document usually does not change during parsing; in HTML, however, script tags containing document.write can add new tokens, so parsing can modify its own input.
Since standard parsers are not suitable for HTML, browsers create their own parsers.
The parsing algorithm is detailed in the HTML5 specification. It consists of two stages: lexical analysis and tree construction.
During lexical analysis, the input sequence of characters is divided into tokens. HTML tokens include start and end tags, as well as attribute names and values.
The lexical analyzer finds a token, passes it to the tree constructor, and moves on to the next character, looking for more tokens, until the end of the input sequence.
Picture. Steps in parsing HTML code (source: HTML5 specification).

Lexical analysis algorithm
The result of the algorithm is an HTML token. The algorithm is expressed as an automaton with a finite number of states. In each state, one or more characters of the input sequence are processed, on the basis of which the next state is determined. It depends on the stage of lexical analysis and the stage of tree formation, that is, processing the same character can lead to different results (different states) depending on the current state. The algorithm is complex enough to be described in detail here, so let's look at a simplified example that will help us better understand how it works.
Let's lexically analyze a simple HTML code:
<html><body>Hello world</body></html>
The initial state is "data". When the parser encounters the < character, the state changes to "tag open". If a letter (a-z) is encountered next, a start tag token is created and the state changes to "tag name". This state persists until the > character is encountered; in the meantime, characters are appended one by one to the name of the new token. In our case, an html token is produced.
When the > character is found, the token is considered complete and the analyzer returns to the "data" state. The <body> tag is processed in exactly the same way. At this point the parser has generated the html and body tokens and returned to the "data" state. Encountering the letter H of Hello world leads to the creation of a character token; the same happens with the remaining letters, until the analyzer reaches the < character of the </body> tag. One character token is created for each character of the Hello world phrase. The analyzer then returns to the "tag open" state. Finding the / character leads to the creation of an end tag token and a transition to the "tag name" state, which again persists until > is encountered. At that moment the new tag token is emitted and the analyzer returns to the "data" state. The </html> sequence is processed in the same way.
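A heavily simplified sketch of this state machine in JavaScript, covering only the "data", "tag open" and "tag name" states described above (the real HTML5 tokenizer has dozens of states and also handles attributes, comments, and error recovery):

```javascript
// Tokenize a tiny HTML fragment with a three-state machine.
// Each input character is processed in the current state, which
// determines both the token output and the next state.
function htmlTokenize(input) {
  const tokens = [];
  let state = "data";
  let current = null; // the tag token being assembled
  for (const ch of input) {
    if (state === "data") {
      if (ch === "<") state = "tag open";          // start of a tag
      else tokens.push({ type: "character", data: ch });
    } else if (state === "tag open") {
      if (ch === "/") current = { type: "end tag", name: "" };
      else current = { type: "start tag", name: ch };
      state = "tag name";
    } else if (state === "tag name") {
      if (ch === ">") { tokens.push(current); state = "data"; }
      else current.name += ch;                     // grow the tag name
    }
  }
  return tokens;
}

const tokens = htmlTokenize("<html>Hi</html>");
console.log(tokens.map(t => t.type));
// ["start tag", "character", "character", "end tag"]
```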