
How browsers work: behind the scenes of modern web browsers



Foreword

This detailed guide to the internals of the WebKit and Gecko engines is the result of extensive research by Israeli web developer Tali Garsiel. For several years she tracked all the published information about browser internals and spent a great deal of time analyzing their source code. In her own words:

When IE held 90% of the market, you had to treat the browser as a mysterious "black box", but now that more than half of users choose open source browsers, it is time to find out what is hidden inside those millions of lines of C++ code...

Tali published the results of her research on her website, but we believe it deserves the attention of a wider audience, so we republish it here with some abridgements.

A web developer who is familiar with the inner workings of browsers makes better decisions and understands the reasoning behind development best practices. This is a rather lengthy document, but we encourage you to read it as carefully as possible; we guarantee you won't regret it. Paul Irish, Chrome Developer Relations

Introduction

Web browsers are probably the most widely used applications. In this primer, I explain how they work. We will take a detailed look at what happens from the moment you type google.ru in the address bar until the Google page appears on the screen.

What browsers will we consider

Today there are five major browsers: Internet Explorer, Firefox, Safari, Chrome and Opera. The examples in this guide use the open source browsers Firefox, Chrome and Safari (whose code is partially open). According to StatCounter's browser usage statistics, as of August 2011 Firefox, Safari and Chrome together accounted for about 60% of usage. Thus, open source browsers hold a very strong position today.

Basic browser features

The main function of a browser is to present a web resource: it requests the resource from a server and displays the result in the browser window. The resource is usually an HTML document, but it can also be a PDF file, an image, or other content. The location of a resource is specified by a URI (Uniform Resource Identifier).

The way a browser interprets and displays HTML files is defined by the HTML and CSS specifications, which are maintained by the W3C (World Wide Web Consortium), the standards organization for the web.
For many years browsers conformed to only part of the specifications and developed their own extensions, which caused serious compatibility problems for web developers. Today most browsers conform to the specifications to a greater or lesser extent.

The user interfaces of different browsers have a lot in common. The main interface elements are listed below.

  • Address bar for entering a URI
  • Navigation buttons "Back" and "Forward"
  • Bookmarks
  • Refresh and stop page loading buttons
  • Home button to go to the main page

Strangely enough, there is no formal specification that defines a standard browser user interface. Modern interfaces are the result of years of evolution, and of browsers partly imitating one another. The HTML5 specification does not define exactly which UI elements a browser must have, but it does list some common ones, such as the address bar, status bar and toolbar. There are also, of course, browser-specific features, such as Firefox's download manager.

Top level structure

The main components of the browser are listed below.

  1. User interface: includes the address bar, back and forward buttons, bookmarks menu, etc. It covers every part of the display except the window in which the requested page appears.
  2. Browser engine: marshals the interaction between the user interface and the display module.
  3. Display module: responsible for displaying the requested content on the screen. For example, if an HTML document is requested, it parses the HTML and CSS and displays the result.
  4. Networking: handles network calls such as HTTP requests, behind a platform-independent interface with platform-specific implementations.
  5. UI backend: used for drawing basic widgets such as windows and combo boxes. It exposes a generic, platform-independent interface and underneath uses the user interface methods of the particular operating system.
  6. JavaScript interpreter: used to parse and execute JavaScript code.
  7. Data storage: a persistence layer. The browser saves various kinds of data, such as cookies, to disk. The new HTML specification (HTML5) defines a "web database": a complete (although lightweight) database in the browser.
Figure: The main components of the browser.

It is worth noting that Chrome, unlike most browsers, runs multiple instances of the display module, one per tab, each in a separate process.

Display module

As the name suggests, the display module is responsible for displaying the requested content on the browser screen.

By default, it can display HTML and XML documents as well as images. Special plug-ins (browser extensions) make it possible to display other content, such as PDF files. However, this chapter focuses on the main use case: displaying HTML documents and images formatted with CSS.

Display modules

The browsers we are considering (Firefox, Chrome and Safari) are built on two display modules. Firefox uses Gecko, Mozilla's own engine, while Safari and Chrome use WebKit.

WebKit is an open source display module that began as an engine for the Linux platform and was adapted by Apple to support Mac OS and Windows. See webkit.org for details.

Basic scheme of work

The display module receives the contents of the requested document from the network layer, usually in chunks of 8 KB.

After that, the display module proceeds as follows.

Figure: Basic flow of the display module.

The display module parses the HTML document and converts its tags into DOM nodes in a tree called the content tree. Style information is parsed as well, from both external CSS files and style elements. This styling information, together with visual instructions in the HTML file, is used to build another tree: the render tree.

It contains rectangles with visual attributes such as color and size. The rectangles are arranged in the order in which they should be displayed on the screen.

Once the render tree is built, the layout process begins, in which each node is assigned the exact coordinates where it should appear on the screen. Then comes painting, in which the nodes of the render tree are traversed and drawn using the UI backend.

It is important to understand that this is a gradual process. For a better user experience, the display module tries to show content as soon as possible, so building and laying out the render tree can start before the HTML is fully parsed. Some parts of the document are parsed and displayed while others are still arriving over the network.

Work examples

Figure: Schematic flow of the WebKit rendering module.
Figure: Schematic flow of the Mozilla Gecko display module.

As you can see from Figures 3 and 4, WebKit and Gecko use different terminology, but the way they work is almost identical.

In Gecko, the tree of visually formatted elements is called the frame tree, and each element is a frame. WebKit uses the term render tree, consisting of render objects. In WebKit the placing of elements is called layout, while Gecko calls it reflow. Connecting DOM nodes with visual information to create the render tree is called attachment in WebKit. A minor, non-semantic difference is that Gecko has one extra layer between the HTML file and the DOM tree: the content sink, which serves as a factory for DOM elements. Now let's talk about each stage of the process in more detail.

Parsing: general information

Since parsing is a critical operation of the display module, let's consider it in more detail. We start with a brief introduction.

Parsing a document means translating it into a structure the code can use. The result of parsing is usually a tree of nodes representing the structure of the document, called a parse tree or syntax tree.

For example, parsing the expression 2 + 3 - 1 might result in the following tree:

Figure: Parse tree for the mathematical expression.

Grammar

Parsing is based on rules determined by the language or format of the document. Every parseable format has a grammar consisting of vocabulary and syntax rules; together these form a so-called context-free grammar. Natural languages do not follow the rules of context-free grammar, so standard parsing techniques are not suitable for them.

Syntactic and lexical analyzers

Along with syntactic analysis, lexical analysis is used.

Lexical analysis is the process of breaking the input into tokens, or lexemes. Tokens form the vocabulary of a particular language and are the building blocks for creating documents. In a natural language, the tokens would be all the words found in dictionaries.

Syntax analysis is the application of the language's syntax rules.

Document parsing is usually split between two components: the lexical analyzer (lexer), which breaks the input character sequence into valid tokens, and the parser, which analyzes the structure of the document according to the syntax rules of the given language and builds the syntax tree. The lexer discards irrelevant characters such as spaces and line breaks.

Figure: From source document to syntax tree.

Parsing is an iterative process. The parser usually asks the lexer for a new token and tries to match it against one of the syntax rules. If a rule matches, a new node corresponding to the token is added to the syntax tree, and the parser asks for the next token.

If the token matches no rule, the parser stores it internally and keeps asking for more tokens until it finds a rule that matches all the stored tokens. If no such rule is found, the parser throws an exception: the document contains syntax errors and cannot be fully processed.

Translation

The syntax tree is not always the final product. Parsing is often one step in translating an input document into another format. Compilation is an example: a compiler that translates source code into machine code first parses it into a syntax tree, and only then generates machine code from that tree.

Figure: Compilation steps.

Parsing example

Figure 5 shows a syntax tree built on the basis of a mathematical expression. Let's define an elementary mathematical language and consider the parsing process.

Vocabulary: Our language can contain integers, plus and minus signs.

Syntax

  1. The structural elements of the language are expressions, operands and operators.
  2. A language can contain any number of expressions.
  3. An expression is a sequence consisting of an operand, an operator, and another operand.
  4. An operator is a plus or minus token.
  5. An operand is either an integer token or an expression.

Consider the input sequence 2 + 3 - 1.
The first substring that matches a rule is 2: according to rule #5 it is an operand. The next match is 2 + 3: a sequence consisting of an operand, an operator and another operand, which rule #3 defines as an expression. The next match is found only at the very end of the input: 2 + 3 - 1 is an expression, because 2 + 3 is already known to be an expression and hence an operand (rule #5), giving us again an operand, an operator and another operand. The input 2 + + matches no rule and would therefore be rejected as invalid.
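The walkthrough above can be sketched in code. Below is a minimal Python sketch of the toy language: a lexer driven by the token regular expressions, and a parser that matches the operand-operator-operand rule left-associatively. The names (tokenize, parse) and the tuple encoding of tree nodes are invented for this illustration; no browser works exactly like this.

```python
import re

def tokenize(text):
    # Lexer: split the input into INTEGER, PLUS and MINUS tokens; skip spaces.
    token_spec = [("INTEGER", r"0|[1-9][0-9]*"),
                  ("PLUS", r"\+"),
                  ("MINUS", r"-"),
                  ("SKIP", r"\s+")]
    pattern = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in token_spec))
    pos, tokens = 0, []
    while pos < len(text):
        m = pattern.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def parse(tokens):
    # Parser: expression := operand (operator operand)*, built left-associatively,
    # so "2 + 3 - 1" becomes the nested tree ((2 + 3) - 1).
    def operand(i):
        kind, value = tokens[i]
        if kind != "INTEGER":
            raise SyntaxError("operand expected")
        return int(value), i + 1
    node, i = operand(0)
    while i < len(tokens):
        op = tokens[i][1]
        right, i = operand(i + 1)
        node = (op, node, right)
    return node

print(parse(tokenize("2 + 3 - 1")))  # ('-', ('+', 2, 3), 1)
```

Feeding it the invalid input 2 + + raises a SyntaxError, just as the text describes: no rule matches the second plus sign where an operand is required.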

Formal definition of vocabulary and syntax

The language from the example above could be defined like this:

INTEGER: 0|[1-9][0-9]*
PLUS: +
MINUS: -

As you can see, integers are defined by a regular expression.

The syntax is usually described in BNF format. The language from the example above can be described as follows:

expression := term operation term
operation := PLUS | MINUS
term := INTEGER | expression

As already mentioned, a language can be processed by standard parsers if its grammar is context-free, that is, if it can be entirely expressed in BNF. A formal definition of context-free grammars can be found in the Wikipedia article on the subject.

Types of parsers

There are two basic types of parsers: top-down and bottom-up. Top-down parsers examine the high-level structure of the input first, trying to find matches for the syntax rules starting from the top-level rule. Bottom-up parsers start from the input character sequence and gradually rewrite it into syntax rules, beginning with the lowest-level rules and ending with the top-level one.

Now let's see how these two types of parsers would handle our example.

The top-down parser would start from the top-level rule: it would first identify 2 + 3 as an expression, and then identify 2 + 3 - 1 as an expression (identifying an expression involves matching the other rules as well, but the starting point is always the highest-level rule).

The bottom-up parser would scan the input until it found a fragment matching a rule, replace the fragment with that rule, and continue in the same way until the end of the input. Partially matched fragments are kept on the parser's stack.

This kind of bottom-up parser is called a shift-reduce parser: the input is shifted to the right (imagine a cursor placed at the beginning of the input and moving right during parsing) and gradually reduced to syntax rules.
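The shift-reduce behavior can be illustrated with a toy sketch, assuming the grammar defined above. The function name and the tuple-based stack encoding are invented for this example; a real generated parser (e.g. one produced by Bison) uses lookahead and parse tables rather than this eager loop.

```python
def shift_reduce_parse(tokens):
    """Bottom-up parse of the toy grammar:
    term := INTEGER | expression ; expression := term operation term."""
    stack = []  # holds partially matched fragments
    for token in tokens:
        stack.append(token)  # shift: move the cursor one token to the right
        # reduce: replace [term, operator, term] on top of the stack
        # with a single expression node
        while (len(stack) >= 3
               and stack[-3][0] in ("INTEGER", "EXPR")
               and stack[-2][0] in ("PLUS", "MINUS")
               and stack[-1][0] in ("INTEGER", "EXPR")):
            right = stack.pop()
            op = stack.pop()
            left = stack.pop()
            stack.append(("EXPR", (op[1], left[1], right[1])))
    if len(stack) != 1 or stack[0][0] not in ("EXPR", "INTEGER"):
        raise SyntaxError("input does not reduce to a single expression")
    return stack[0][1]

tokens = [("INTEGER", 2), ("PLUS", "+"), ("INTEGER", 3),
          ("MINUS", "-"), ("INTEGER", 1)]
print(shift_reduce_parse(tokens))  # ('-', ('+', 2, 3), 1)
```

Note how 2 + 3 sits on the stack as a partial match until the third token arrives, at which point it is reduced to an expression, exactly as in the prose description.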

Automatic generation of parsers

There are special tools for creating parsers, called parser generators. You feed the generator the grammar of your language (its vocabulary and syntax rules) and it automatically creates the parser. Writing a parser requires a deep understanding of parsing and is not easy to do by hand, so parser generators can be very useful.

DOM

The resulting syntax tree consists of DOM element and attribute nodes. DOM stands for Document Object Model: the object representation of the HTML document, and the interface through which external code, such as JavaScript, manipulates HTML elements.
The root of the tree is the Document object.

The DOM has an almost one-to-one relation to the markup. Consider this example markup:

<html>
  <body>
    <p>Hello world</p>
    <div><img src="example.png"/></div>
  </body>
</html>

The DOM tree for this markup looks like this:

Figure: The DOM tree for the example markup.

By "the tree contains DOM nodes" we mean that the tree consists of elements that implement one of the DOM interfaces. Browsers use concrete implementations that carry additional attributes used internally.
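To make the markup-to-tree relation concrete, here is a sketch using Python's standard html.parser module to build a simplified DOM-like tree (each node is a (tag, children) tuple; the TreeBuilder name and encoding are invented for this illustration). Note that, unlike a browser, this parser performs no error correction and does not create missing elements such as head.

```python
from html.parser import HTMLParser

class TreeBuilder(HTMLParser):
    """Builds a simplified DOM: each element is (tag, children);
    text is stored as plain strings."""
    def __init__(self):
        super().__init__()
        self.root = ("document", [])
        self.stack = [self.root]  # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)  # attach to the current open element
        self.stack.append(node)         # and make it the new open element

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.stack[-1][1].append(data.strip())

builder = TreeBuilder()
builder.feed("<html><body><p>Hello World</p></body></html>")
print(builder.root)
# ('document', [('html', [('body', [('p', ['Hello World'])])])])
```

The nesting of the tuples mirrors the nesting of the markup, which is exactly the "almost one-to-one relation" described above.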

Parsing algorithm

As we saw in the previous sections, HTML cannot be parsed using the standard top-down or bottom-up parsers.

The reasons for this are listed below.

  1. The forgiving nature of the language.
  2. Browsers have built-in error tolerance to handle well-known mistakes in invalid HTML.
  3. The parsing process is reentrant. With other languages, the source does not change during parsing, but in HTML, script tags containing document.write can add new tokens, so the parsing process actually modifies its own input.

Since standard parsers are not suitable for HTML, browsers create their own parsers.

The parsing algorithm is detailed in the HTML5 specification. It consists of two stages: lexical analysis and tree construction.

During lexical analysis, the input sequence of characters is divided into tokens. HTML tokens include start and end tags, as well as attribute names and values.

The lexical analyzer finds a token, passes it to the tree constructor, and moves on to the next character, looking for more tokens, until the end of the input sequence.

Figure: HTML parsing flow (source: HTML5 specification).

Lexical analysis algorithm

The output of the algorithm is an HTML token. The algorithm is expressed as a finite state machine: in each state, one or more characters of the input are consumed, and the next state is determined accordingly. The decision depends on both the tokenization state and the tree construction state, so consuming the same character can yield different results depending on the current state. The algorithm is too complex to describe in full here, so let's look at a simplified example that will help us understand how it works.

Let's tokenize this simple HTML:

<html><body>Hello world</body></html>

The initial state is the "data" state. When the < character is consumed, the state changes to "tag open". Consuming a letter (a-z) creates a start tag token and switches the state to "tag name", which persists until the > character is consumed. The characters are appended one by one to the name of the new token; in our case, an html token is created.

When > is consumed, the current token is emitted and the state changes back to "data". The <body> tag is processed in exactly the same way. So far the tokenizer has emitted the html and body tokens and returned to the "data" state. Consuming the letter H of Hello world creates and emits a character token, and the same happens for the remaining letters, until the tokenizer reaches the < of </body>. One character token is emitted for each character of the Hello world phrase.

The tokenizer then moves to the "tag open" state again. Consuming the / character creates an end tag token and switches the state to "tag name", which again persists until > is consumed. At that point the new tag token is emitted and the state returns to "data". The </html> sequence is processed in the same way.

Figure: Tokenizing the example input.
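The states described above ("data", "tag open", "tag name") can be sketched as a small state machine in Python. This is a drastic simplification of the HTML5 tokenizer: no attributes, comments, character references or error handling; the function name and token encoding are invented for this sketch.

```python
def tokenize_html(text):
    """Simplified HTML tokenizer with only the three states
    from the walkthrough: 'data', 'tag open' and 'tag name'."""
    state, name, closing, tokens = "data", "", False, []
    for ch in text:
        if state == "data":
            if ch == "<":
                state, name, closing = "tag open", "", False
            else:
                tokens.append(("char", ch))  # emit one token per character
        elif state == "tag open":
            if ch == "/":
                closing = True               # this will be an end tag token
            elif ch.isalpha():
                state, name = "tag name", ch.lower()
        elif state == "tag name":
            if ch == ">":
                # tag name complete: emit the token, go back to 'data'
                tokens.append(("end tag" if closing else "start tag", name))
                state = "data"
            else:
                name += ch.lower()           # append characters to the name
    return tokens

print(tokenize_html("<html><body>Hi</body></html>"))
# [('start tag', 'html'), ('start tag', 'body'), ('char', 'H'),
#  ('char', 'i'), ('end tag', 'body'), ('end tag', 'html')]
```

The token stream produced here is exactly the kind of input that the tree construction stage, described next, consumes.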

Tree construction algorithm

When the parser is created, the Document object is created. During the tree construction stage, the DOM tree with the Document at its root is modified and elements are added to it. Each token emitted by the tokenizer is processed by the tree constructor; for each token, the specification defines which DOM element should be created. Elements are added not only to the DOM tree but also to a stack of open elements, which is used to correct incorrect nesting and unclosed tags. This algorithm is also described as a finite state machine, whose states are called "insertion modes".

Let's follow the tree construction steps for this piece of code:

<html><body>Hello world</body></html>

The input to the tree construction stage is the sequence of tokens produced by tokenization. The first mode is the "initial" mode. Receiving the html token moves the state to "before html", and the token is reprocessed in that mode. This creates an HTMLHtmlElement, which is appended to the root Document object.

The state changes to "before head". The parser then encounters the body token. Although there is no head tag in our code, an HTMLHeadElement is created implicitly and added to the tree.

The state moves to "in head" and then to "after head". The body token is reprocessed, an HTMLBodyElement is created and added to the tree, and the mode changes to "in body".

Next come the character tokens of the Hello world string. The first one causes a Text node to be created and inserted, and the remaining characters are appended to that node.

Receiving the body end token moves the mode to "after body". Receiving the html end tag moves it to "after after body". Receiving the end-of-file token ends the parsing.

Figure: Tree construction for the example HTML.
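The insertion-mode behavior described above, including the implicit creation of the head element, can be sketched as follows. This is a toy model, not the specification algorithm: the node encoding, names and token format are invented for the illustration, and only the handful of cases from the walkthrough are handled.

```python
def build_tree(tokens):
    """Sketch of tree construction: Document is the root, an html element is
    created for the html token, and a head element is inserted implicitly
    when the body start tag arrives before any head tag."""
    document = ("#document", [])
    html = ("html", [])
    document[1].append(html)
    body, text, head_created = None, [], False
    for kind, value in tokens:
        if kind == "start tag" and value == "html":
            continue                      # "before html": html already created
        elif kind == "start tag" and value == "body":
            if not head_created:          # "before head": create head implicitly
                html[1].append(("head", []))
                head_created = True
            body = ("body", [])
            html[1].append(body)          # "in body" from here on
        elif kind == "char":
            text.append(value)            # accumulate character tokens
        elif kind == "end tag" and value == "body" and text:
            body[1].append(("#text", "".join(text)))  # flush one Text node
            text = []
    return document

tokens = [("start tag", "html"), ("start tag", "body"),
          ("char", "H"), ("char", "i"),
          ("end tag", "body"), ("end tag", "html")]
print(build_tree(tokens))
# ('#document', [('html', [('head', []), ('body', [('#text', 'Hi')])])])
```

Even though the input markup contains no head tag, the output tree has one, mirroring the automatic HTMLHeadElement creation described in the walkthrough.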

Actions after parsing

At this stage the browser marks the document as interactive and starts parsing deferred scripts, those that should run only after the document has been parsed. The document state is then set to "complete" and the load event fires.

Now let's look at how CSS is parsed. The CSS lexical grammar (vocabulary) is defined by regular expressions for each token:

comment: \/\*[^*]*\*+([^/*][^*]*\*+)*\/
num: [0-9]+|[0-9]*"."[0-9]+
nonascii: [\200-\377]
nmstart: [_a-z]|{nonascii}|{escape}
nmchar: [_a-z0-9-]|{nonascii}|{escape}
name: {nmchar}+
ident: {nmstart}{nmchar}*

ident is short for identifier and is used, for example, as a class name; name is an element id (the one referred to with "#").

The syntax rules are described in BNF format.

ruleset
  : selector [ ',' S* selector ]*
    '{' S* declaration [ ';' S* declaration ]* '}' S*
  ;
selector
  : simple_selector [ combinator selector | S+ [ combinator? selector ]? ]?
  ;
simple_selector
  : element_name [ HASH | class | attrib | pseudo ]*
  | [ HASH | class | attrib | pseudo ]+
  ;
class
  : '.' IDENT
  ;
element_name
  : IDENT | '*'
  ;
attrib
  : '[' S* IDENT S* [ [ '=' | INCLUDES | DASHMATCH ] S* [ IDENT | STRING ] S* ] ']'
  ;
pseudo
  : ':' [ IDENT | FUNCTION S* ')' ]
  ;

A ruleset is a structure like this:

div.error, a.error {
  color: red;
  font-weight: bold;
}

div.error and a.error are selectors; the rules of this set are enclosed in curly braces. Formally, this structure is defined as:

ruleset
  : selector [ ',' S* selector ]*
    '{' S* declaration [ ';' S* declaration ]* '}' S*
  ;

This means a ruleset consists of one selector, or optionally several selectors separated by commas and spaces (S stands for white space), followed by curly braces containing one or more declarations separated by semicolons. "Declaration" and "selector" are defined by the BNF rules above.
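As a rough illustration of what a CSS parser produces, here is a sketch that splits a single ruleset into its selectors and a declaration map using one regular expression. This is far simpler than the real grammar (no nested structures, strings, escapes or comments), and parse_ruleset is a hypothetical name, not an API of any engine.

```python
import re

def parse_ruleset(css):
    """Very simplified: splits one ruleset of the form
    'sel1, sel2 { prop: value; ... }' into selectors and declarations."""
    m = re.fullmatch(r"\s*([^{]+)\{([^}]*)\}\s*", css)
    if not m:
        raise ValueError("not a ruleset")
    # selectors are separated by commas
    selectors = [s.strip() for s in m.group(1).split(",")]
    # declarations are 'property: value' pairs separated by semicolons
    declarations = {}
    for decl in m.group(2).split(";"):
        if ":" in decl:
            prop, value = decl.split(":", 1)
            declarations[prop.strip()] = value.strip()
    return selectors, declarations

print(parse_ruleset("div.error, a.error { color: red; font-weight: bold; }"))
# (['div.error', 'a.error'], {'color': 'red', 'font-weight': 'bold'})
```

A real engine stores the equivalent of this output in rule objects inside a StyleSheet object, as described in the next section.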

CSS Parser in WebKit

WebKit uses the Flex and Bison parser generators to create its CSS parser automatically from the grammar files. As you may recall from the parser introduction, Bison creates a bottom-up, shift-reduce parser. Firefox uses a top-down parser written manually at Mozilla. In both cases, each CSS file is parsed into a StyleSheet object containing the CSS rules. A CSS rule object contains selector and declaration objects, as well as other objects corresponding to the CSS grammar.

Figure: Parsing CSS.

Processing order for scripts and style sheets

Scripts

The model of the web is synchronous. Scripts are expected to be parsed and executed immediately when the parser reaches a <script> tag, and parsing of the document halts until the script has been executed.
