Main components of assembly language and the structure of commands. General characteristics of the assembly language command set for the IBM PC (basic commands, basic operand addressing modes)

02.07.2020 OS

Commands can be grouped by purpose (mnemonic operation codes of IBM PC assembler commands are given in parentheses as examples):

● arithmetic operations (ADD and ADC - addition and addition with carry, SUB and SBB - subtraction and subtraction with borrow, MUL and IMUL - unsigned and signed multiplication, DIV and IDIV - unsigned and signed division, CMP - comparison, etc.);

● logical operations (OR, AND, NOT, XOR, TEST, etc.);

● data transfer (MOV - move, XCHG - exchange, IN - input to the microprocessor from a port, OUT - output from the microprocessor to a port, etc.);

● transfer of control, that is, program branches (JMP - unconditional jump, CALL - procedure call, RET - return from a procedure, J* - conditional jumps, LOOP - loop control, etc.);

● character string processing (MOVS - move, CMPS - compare, LODS - load, SCAS - scan). These commands are usually used with the repetition prefix REP;

● program interrupts (INT - software interrupt, INTO - conditional interrupt on overflow, IRET - return from interrupt);

● microprocessor control (ST* and CL* - setting and clearing flags, HLT - halt, WAIT - wait, NOP - no operation, etc.).

A complete list of assembler commands can be found in the reference literature.

Data transfer commands

● MOV dst, src - data transfer (move: copy from src to dst).

Transfers one byte (if src and dst are in byte format) or one word (if src and dst are in word format) between registers or between a register and memory, and also writes an immediate value to a register or memory.

The dst and src operands must have the same format: both byte or both word.

Src can be of type r (register), m (memory), or i (immediate value). Dst can be of type r or m. The following operand combinations are not allowed in one command: a segment register (rsegm) together with i; two operands of type m; two operands of type rsegm. The i operand can also be a simple expression:

mov AX, (152 + 101B) / 15

The expression is evaluated only at translation (assembly) time. MOV does not change the flags.
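To make the translation-time evaluation concrete, here is a minimal Python sketch of how an assembler might fold the expression above into a single immediate value. It assumes that the suffix B marks a binary constant, that H marks a hexadecimal one, and that "/" is integer division; the function names are hypothetical.

```python
import re

def literal_value(token):
    """Convert a numeric literal with an optional radix suffix to an int."""
    token = token.lower()
    if token.endswith("b"):
        return int(token[:-1], 2)    # binary constant, e.g. 101B
    if token.endswith("h"):
        return int(token[:-1], 16)   # hexadecimal constant, e.g. 0FFh
    return int(token, 10)

def fold_constant(expr):
    """Evaluate an immediate expression once, before the program ever runs.

    Assumption: '/' means integer division.
    """
    # Replace every literal (a token starting with a digit) by its decimal value.
    decimal = re.sub(r"\b\d[0-9a-fA-F]*[bBhH]?\b",
                     lambda m: str(literal_value(m.group())), expr)
    decimal = decimal.replace("/", "//")
    # Only digits, arithmetic operators, parentheses and spaces may remain.
    if not re.fullmatch(r"[0-9+\-*/() ]+", decimal):
        raise ValueError("unsupported expression")
    return eval(decimal)

print(fold_constant("(152 + 101B) / 15"))
```

Under these assumptions the command is equivalent to loading AX with one precomputed constant: the processor never sees the addition or the division.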

● PUSH src - push a word onto the stack. Pushes the contents of src onto the top of the stack; src is any 16-bit register (including segment registers) or two memory locations containing a 16-bit word. The flags do not change;

● POP dst - pop a word from the stack. Removes the word from the top of the stack and places it in dst, which is any 16-bit register (including segment registers other than CS) or two memory locations. The flags do not change.
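The effect of PUSH and POP on memory can be sketched with a toy model in Python. It assumes 8086 behaviour: the stack grows toward lower addresses, SP is decreased by 2 before the word is stored, and the low byte of the word is placed at the lower address. The class name and the memory size are hypothetical.

```python
class Stack8086:
    """Toy model of a 16-bit stack that grows downward in memory."""

    def __init__(self, size=16):
        self.mem = bytearray(size)
        self.sp = size               # empty stack: SP points just past the area

    def push(self, word):
        self.sp -= 2                 # reserve two memory locations
        self.mem[self.sp] = word & 0xFF             # low byte
        self.mem[self.sp + 1] = (word >> 8) & 0xFF  # high byte

    def pop(self):
        word = self.mem[self.sp] | (self.mem[self.sp + 1] << 8)
        self.sp += 2                 # release the two memory locations
        return word

s = Stack8086()
s.push(0x1234)
s.push(0xABCD)
print(hex(s.pop()), hex(s.pop()))    # words come back in reverse order
```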

Introduction.

The language in which the original program is written is called the source (input) language, and the language into which it is translated for execution by the processor is called the target (output) language. The process of converting the input language into the output language is called translation. Since processors can only execute programs in a machine language of binary codes, which is not used for programming, every source program must be translated. Two methods of translation are known: compilation and interpretation.

In compilation, the source program is first translated completely into an equivalent program in the output language, called the object program, and then executed. This process is carried out by a special program called a compiler. A compiler whose input language is a symbolic representation of the machine (output) language of binary codes is called an assembler.

In interpretation, each line of the source program text is analyzed (interpreted) and the command specified in it is executed immediately. This method is implemented by an interpreter program. Interpretation takes a long time. To increase its efficiency, instead of processing each line, the interpreter first converts all command strings into symbols. The generated sequence of symbols is then used to perform the functions assigned to the original program.

The assembly language discussed below is implemented using compilation.

Features of the language.

The main features of the assembler are:

● instead of binary codes, the language uses symbolic names, or mnemonics. For example, the addition command is written with a mnemonic such as ADD, subtraction with SUB, multiplication with MUL, division with DIV, and so on. Symbolic names are also used for addressing memory cells. To program in assembly language you need to know only the symbolic names, which the assembler translates into binary codes and addresses;

● every statement corresponds to one machine command (code), that is, there is a one-to-one correspondence between machine instructions and statements in an assembly language program;

● the language provides access to all objects and commands of the machine. High-level languages do not have this ability. For example, assembly language allows you to test a bit of the flags register, whereas a typical high-level language does not. Note that languages for system programming (for example, C) often occupy an intermediate position: in terms of access they are closer to assembly language, but they have the syntax of a high-level language;

● assembly language is not a universal language. Each specific group of microprocessors has its own assembler. High-level languages do not have this disadvantage.

Unlike programs in high-level languages, an assembly language program takes a long time to write and debug. Despite this, assembly language is widely used, for the following reasons:

● a program written in assembly language can be much smaller and much faster than one written in a high-level language. For some applications these indicators are of primary importance: for example, many system programs (including compilers), programs in credit cards and cell phones, device drivers, and so on;

● some procedures require full access to the hardware, which is usually not possible in a high-level language. This case includes interrupts and interrupt handlers in operating systems, as well as device controllers in embedded systems that operate in real time.

In most programs, a small fraction of the code accounts for a large fraction of the execution time. Typically, 1% of the program is responsible for 50% of the execution time, and 10% of the program is responsible for 90% of the execution time. Therefore, in practice a program is usually written using both assembly language and a high-level language: assembly language for the small time-critical part, and the high-level language for the rest.
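The arithmetic behind this argument can be sketched in a few lines of Python. The share of execution time (90%) comes from the text; the speedup factor of 3 for the rewritten part is purely an illustrative assumption, and the function name is hypothetical.

```python
def overall_speedup(hot_time_share, hot_speedup):
    """Speedup of the whole program when only its time-critical part gets faster."""
    remaining = (1.0 - hot_time_share) + hot_time_share / hot_speedup
    return 1.0 / remaining

# 10% of the code takes 90% of the time; assume that part becomes 3 times faster.
print(round(overall_speedup(0.90, 3.0), 2))
```

Rewriting a tenth of the code captures most of the possible gain, which is why mixing the two kinds of language is usually preferred to writing everything in assembly language.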

The format of the operator in assembly language.

An assembly language program is a list of commands (statements, sentences), each of which occupies a separate line and contains four fields: a label field, an operation field, an operand field and a comment field. There is a separate column for each field.

Label field.

Column 1 is allocated for the label field. A label is a symbolic name (identifier) for a memory address. It is needed in order to:

● make a conditional or unconditional jump to the command;

● access the place where data is stored.

Such statements are given labels. A name is written with (uppercase) letters of the English alphabet and digits. The name must begin with a letter and end with a colon separator. A label with a colon can be written on a separate line, with the opcode on the next line in column 2, which simplifies the compiler's work. Without the colon, a label on a line of its own could not be distinguished from an opcode.

In some versions of assembly language, colons are placed only after command labels, but not after data labels, and the length of the label can be limited to 6 or 8 characters.

Names in the label field must be unique, since each label is associated with one address. If nothing in the program needs to refer to a command or data item, its label field is left empty.

Operation code field.

This field contains the mnemonic code of a command or pseudo-command (see below). The mnemonics are chosen by the language designers. In some assembly languages, one mnemonic is chosen for loading a register from memory and another for storing the contents of a register in memory; in others, a single name is used for both operations. While the choice of mnemonic names is arbitrary, the need for two separate machine instructions is dictated by the processor architecture.

Register mnemonics also depend on the assembler version (Table 5.2.1).

Operand field.

Additional information needed to carry out the operation is placed here. For jump instructions, the operand field gives the address to jump to; for other instructions, it gives the addresses and registers that serve as operands of the machine instruction. As an example, here are the operands that can be used for 8-bit processors:

● numerical data, written in various number systems. To indicate the number system, the constant is followed by a Latin letter suffix (B for binary, with further letters for octal, hexadecimal, and decimal; the decimal suffix may be omitted). If the first digit of a hexadecimal number is a letter (A, B, C, and so on), a non-significant 0 (zero) is added in front;

● codes of the internal registers of the microprocessor and of the memory cell M (sources or destinations of information), written either as letters (A, B, C, ..., M) or as their addresses in any number system (for example, 10B is a register address in binary);

● identifiers: the first letters (B, ..., H) for register pairs such as BC, and dedicated names for the pair formed by the accumulator and the flags register, for the program counter, and for the stack pointer;

● labels giving the addresses of operands or of the next instructions in conditional jumps (taken if the condition is met) and unconditional jumps. For example, the operand M1 in an unconditional jump command means that control must pass to the command whose label field contains the identifier M1;

● expressions, which are built by combining the data described above with arithmetic and logical operators. Note that the way data space is reserved depends on the language version. The developers of one assembly language chose a directive meaning "define a word" and later introduced an alternative that had been in the language of another processor family from the beginning; a third language version uses a directive meaning "define a constant".

Processors handle operands of different lengths. Assembler developers have indicated operand length in different ways, for example:

● registers of different lengths have different names: EAX for 32-bit operands, AX for 16-bit operands, and AH for 8-bit operands;

● for some processors, a suffix is added to each opcode to indicate the operand type, for example, the suffix ".B";

● for operands of different lengths, different opcodes are used: for example, separate opcodes load a byte, a half-word, and a word into a 64-bit register.

Comments field.

This field holds explanations of what the program does. Comments do not affect the operation of the program and are intended for people. They may be needed when the program is modified: without comments, a program can be completely incomprehensible even to experienced programmers. A comment begins with a special start character, which differs between assembly languages. It can be:

● a semicolon (;) in the languages for some processor families;

● an exclamation mark (!) in the languages for others.

Each separate comment line must begin with the start character.
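The four-field format described above can be illustrated with a small Python sketch that splits one source line into its label, operation, operand, and comment fields. It assumes that a label ends with a colon, that operands are separated by commas, and that a comment starts with a semicolon; the function name is hypothetical.

```python
def split_statement(line):
    """Split one source line into (label, opcode, operands, comment)."""
    code, _, comment = line.partition(";")      # assumed comment start character
    code = code.strip()
    label = ""
    head, colon, rest = code.partition(":")
    if colon and head.strip().isidentifier():   # a name followed by a colon
        label, code = head.strip(), rest.strip()
    parts = code.split(None, 1)
    opcode = parts[0].upper() if parts else ""
    operand_text = parts[1] if len(parts) > 1 else ""
    operands = [op.strip() for op in operand_text.split(",") if op.strip()]
    return label, opcode, operands, comment.strip()

print(split_statement("M1: MOV AX, BX ; copy BX into AX"))
```

A real assembler would also have to handle string constants that contain semicolons or colons; the sketch ignores that case.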

Pseudo-commands (directives).

In assembly language, there are two main types of commands:

● basic instructions that are the equivalent of the machine code of the processor. These commands perform all the processing provided by the program;

● pseudo-commands, or directives, which serve the process of translating the program into machine code. As an example, Table 5.2.2 lists some pseudo-commands of one assembler.

When programming, there are situations in which, according to the algorithm, the same chain of commands must be repeated many times. There are several ways to handle this:

● write the required sequence of commands each time it is needed. This approach increases the size of the program;

● turn the sequence into a procedure (subroutine) and call it when necessary. This solution has a drawback: each use requires executing a procedure call command and a return command, which, for a short and frequently used sequence, can noticeably slow the program down.

The simplest and most efficient way to repeat a chain of commands many times is to use a macro, which can be thought of as a pseudo-command that causes a group of commands frequently encountered in a program to be translated again.

A macro, or macro instruction, is characterized by three aspects: macro definition, macro call, and macro expansion.

Macro definition

A macro definition gives a name to a sequence of program commands that is repeated many times, so that the sequence can be referred to by that name in the program text.

Three parts can be distinguished in the structure of a macro definition:

● the header of the macro, which includes a name, the pseudo-command that opens the definition, and a set of parameters;

● the body of the macro;

● the command that ends the macro.

The set of parameters of a macro definition lists all the parameters that appear in the operand field of the selected group of commands. If these parameters are defined earlier in the program, they can be omitted from the header of the macro definition.

To assemble the selected group of commands again, a call is used that consists of the macro name and a list of parameters with other values.

When the assembler encounters a macro definition during compilation, it stores it in the macro definition table. On each subsequent appearance of the macro name in the program, the assembler replaces the name with the body of the macro.

Using a macro name as an opcode is called a macro call, and its replacement by the body of the macro is called macro expansion.

If the program is viewed as a sequence of characters (letters, digits, spaces, punctuation marks, and carriage returns that start a new line), then macro expansion consists of replacing some strings in this sequence with other strings.

Macro expansion occurs during assembly, not during program execution. Manipulating character strings in this way is the job of the macro facility.

The assembly process is carried out in two passes:

● on the first pass, all macro definitions are saved, and macro calls are expanded. In this case, the original program is read and converted into a program in which all macro definitions are removed, and each macro call is replaced by the body of the macro;

● in the second pass, the resulting program is processed without macros.
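The first of these two passes can be sketched in Python as a pure text transformation. The sketch handles only macros without parameters. It assumes that a definition opens with a line of the form "NAME MACRO" and closes with a line "ENDM"; the closing keyword and the macro name SAVE are assumptions made for the illustration.

```python
def expand_macros(source_lines):
    """Save macro definitions, remove them from the text, and expand macro calls."""
    macros = {}          # macro definition table: name -> list of body lines
    output = []          # the program without macros, ready for the second pass
    current = None       # name of the macro being defined, if any
    for line in source_lines:
        words = line.split()
        if current is not None:
            if words == ["ENDM"]:
                current = None                  # end of the definition
            else:
                macros[current].append(line)    # one more line of the body
        elif len(words) == 2 and words[1] == "MACRO":
            current = words[0]
            macros[current] = []
        elif words and words[0] in macros:
            output.extend(macros[words[0]])     # macro call -> macro body
        else:
            output.append(line)
    return output

program = [
    "SAVE MACRO",
    "PUSH AX",
    "PUSH BX",
    "ENDM",
    "SAVE",
    "ADD AX, BX",
    "SAVE",
]
print(expand_macros(program))
```

The output contains no definitions and no calls, only ordinary commands, so the rest of the assembler does not need to know that macros exist.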

Parameterized macros.

To work with repeated sequences of commands whose parameters can take different values, macro definitions with parameters are provided. Two kinds of parameters are distinguished:

● actual parameters, which are placed in the operand field of the macro call;

● formal parameters, which appear in the macro definition. During macro expansion, each formal parameter that appears in the body of the macro is replaced by the corresponding actual parameter.

An example of using macros with parameters.

Program 1 shows two similar sequences of commands: the first swaps P and Q, and the second swaps another pair of variables. Program 2 includes a macro with two formal parameters, P1 and P2. During macro expansion, each occurrence of P1 inside the macro body is replaced by the first actual parameter, and each occurrence of P2 by the second actual parameter, taken from Program 1. In the macro call of Program 2 that replaces the first sequence, P is the first actual parameter and Q is the second.

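Program 1 (fragment: the sequence that swaps P and Q)

MOV EAX, P
MOV EBX, Q
MOV Q, EAX
MOV P, EBX

Program 2 (fragment: the body of the macro with formal parameters P1 and P2)

MOV EAX, P1
MOV EBX, P2
MOV P2, EAX
MOV P1, EBX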
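The replacement of formal parameters by actual ones can be sketched in Python. The macro body below is a four-command swap written with the formal parameters P1 and P2, as in the example; the function name is hypothetical. All substitutions are made in a single pass over each line, so an actual parameter that happens to be spelled like a formal one is not substituted a second time.

```python
import re

def expand_call(body, formals, actuals):
    """Replace each formal parameter in the macro body by its actual parameter."""
    if len(formals) != len(actuals):
        raise ValueError("wrong number of actual parameters")
    mapping = dict(zip(formals, actuals))
    # Match whole names only, so that P1 is not found inside a name such as P12.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, formals)) + r")\b")
    return [pattern.sub(lambda m: mapping[m.group(1)], line) for line in body]

swap_body = ["MOV EAX, P1", "MOV EBX, P2", "MOV P2, EAX", "MOV P1, EBX"]
print(expand_call(swap_body, ["P1", "P2"], ["P", "Q"]))
```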

Extended capabilities.

Let's consider some advanced features of the macro language.

If a macro containing a conditional jump command and the label that the jump targets is called two or more times, the label will be duplicated (the duplicate label problem), which causes an error. One solution is for the programmer to supply a different label as a parameter on each call. Alternatively, the label can be declared local, in which case the assembler automatically generates a different label each time the macro is expanded.
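Automatic generation of distinct labels can be sketched in Python as follows. The naming scheme NAME_n for generated labels, the example macro body, and the function names are assumptions made for this sketch; real assemblers use their own internal naming.

```python
import itertools
import re

def make_expander(body, local_labels):
    """Return a function that expands the macro with fresh local labels each time."""
    counter = itertools.count(1)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, local_labels)) + r")\b")

    def expand():
        n = next(counter)
        # Every local label gets the number of the current expansion appended.
        return [pattern.sub(lambda m: f"{m.group(1)}_{n}", line) for line in body]

    return expand

# Hypothetical macro body: replace AX by its absolute value.
expand_abs = make_expander(["CMP AX, 0", "JGE DONE", "NEG AX", "DONE:"], ["DONE"])
first, second = expand_abs(), expand_abs()
print(first)
print(second)
```

Because each expansion carries its own label, the macro can be called any number of times without producing a duplicate definition.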

Some assemblers allow macros to be defined within other macros. This advanced feature is very useful in combination with conditional assembly. Consider a fragment in which the definition of the macro M2 is placed inside a macro M1 and under a condition:

IF WORDSIZE GT 16
M2 MACRO

Macro M2 can be defined in both parts of the conditional statement. However, which definition is used depends on whether the program is assembled for a 16-bit or a 32-bit processor. If M1 is not called, the macro M2 is not defined at all.

Another advanced feature is that macros can call other macros, including themselves - recursive call. In the latter case, in order not to get an infinite loop, the macro must pass a parameter to itself, which changes with each expansion, and also check this parameter and end the recursion when the parameter reaches a certain value.

On the use of macros in assembler.

When macros are used, the assembler must be able to perform two functions: save macro definitions and expand macro calls.

Saving macro definitions.

All macro names are stored in a table. Each name is accompanied by a pointer to the corresponding macro so that it can be retrieved when needed. Some assemblers have a separate table for macro names, while others have a general table that holds, along with the macro names, all machine instructions and directives.

When a macro definition is encountered during assembly, the following are created:

● a new table entry with the name of the macro, the number of parameters, and a pointer to another table, the macro definition table, where the body of the macro will be stored;

● a list of formal parameters.

Then the body of the macro is read and saved in the macro definition table; the body is simply a string of characters. Formal parameters found in the body of the macro are marked with a special character.

The internal representation of the macro from the example above (Program 2) looks like this:

MOV EAX, &P1; MOV EBX, &P2; MOV &P2, EAX; MOV &P1, EBX;

where the semicolon is used as the carriage return character and the ampersand (&) marks a formal parameter.

Expansion of macro calls.

Whenever a macro definition is encountered during assembly, it is stored in the macro table. When the macro is called, the assembler temporarily stops reading input from the input device and starts reading the saved macro body instead. Formal parameters extracted from the body of the macro are replaced with the actual parameters supplied by the call. The ampersand (&) in front of the parameters allows the assembler to recognize them.

Despite the fact that there are many versions of assembler, the assembly processes have common features and are similar in many ways. The work of the two-pass assembler is discussed below.

Two-pass assembler.

A program consists of a number of statements. It would therefore seem that assembly could proceed by the following sequence of actions:

● translate the next statement into machine language;

● write the resulting machine code to one file, and the corresponding part of the listing to another file;

● repeat these steps until the entire program has been translated.

However, this approach does not work. An example is the so-called forward reference problem. If the first statement is a jump to statement P located at the very end of the program, the assembler cannot translate it: it must first determine the address of statement P, and for that it has to read the entire program. Each complete reading of the source program is called a pass. The forward reference problem can be solved with two passes in either of two ways:

● on the first pass, collect and save all symbol definitions (including labels) in a table, and on the second pass, read and assemble each statement. This method is relatively simple, but the second pass over the source program requires additional time for I/O operations;

● on the first pass, convert the program into an intermediate form and save it in a table, and carry out the second pass over the table rather than over the source program. This method saves time, since no I/O is performed on the second pass.
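The second variant can be sketched in Python. To keep the example short, it assumes that every statement occupies exactly one address unit, that a label ends with a colon, and that every opcode beginning with J is a jump whose operand is a label; these simplifications and the function name are assumptions of the sketch.

```python
def assemble(source_lines):
    """Resolve jump targets in two passes, the second one over a table."""
    symbols = {}
    intermediate = []                 # (address, opcode, operand) built by pass 1
    for line in source_lines:         # pass 1: reads the source text once
        label, _, rest = line.rpartition(":")
        address = len(intermediate)   # address of the next statement
        if label:
            symbols[label.strip()] = address
        parts = rest.split()
        if parts:
            operand = parts[1] if len(parts) > 1 else None
            intermediate.append((address, parts[0], operand))
    resolved = []
    for address, opcode, operand in intermediate:   # pass 2: reads only the table
        if opcode.startswith("J"):
            operand = symbols[operand]    # forward references are known by now
        resolved.append((address, opcode, operand))
    return resolved

print(assemble(["JMP P", "NOP", "P: HLT"]))
```

The jump in the first statement refers to a label defined two statements later, yet it is resolved without reading the source text a second time.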

First pass.

The purpose of the first pass is to build the symbol table. As noted above, another purpose of the first pass is to save all macro definitions and expand macro calls as they appear. Consequently, both symbol definition and macro expansion take place in one pass. A symbol can be either a label or a value that is assigned a specific name by a directive (for example, a named value giving the size of a buffer).

To assign values to the symbolic names in the label field of instructions, the assembler must know the address that each instruction will have during program execution. For this purpose, the assembler maintains an instruction location counter as a special variable during assembly. At the beginning of the first pass, this variable is set to 0, and after each processed command it is increased by the length of that command. As an example, Table 5.2.3 shows a program fragment with the lengths of the commands and the values of the counter. On the first pass, tables of symbolic names, directives, and opcodes are formed and, if necessary, a literal table. A literal is a constant for which the assembler automatically reserves memory. Note that modern processors have instructions with immediate addressing, so their assemblers do not support literals.
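The way the counter turns labels into addresses can be sketched in Python. The command lengths in bytes and the label names below are illustrative assumptions and are not taken from Table 5.2.3; the function name is hypothetical.

```python
def first_pass(statements, lengths):
    """Assign an address to every label using the location counter."""
    counter = 0                       # set to 0 at the beginning of the pass
    symbols = {}
    for label, opcode in statements:
        if label:
            symbols[label] = counter  # the label names the current address
        counter += lengths[opcode]    # advance by the length of this command
    return symbols, counter

# Assumed command lengths in bytes, for illustration only.
lengths = {"MOV": 5, "ADD": 2, "JMP": 5, "HLT": 1}
statements = [("START", "MOV"), ("", "ADD"), ("AGAIN", "ADD"), ("", "JMP"), ("DONE", "HLT")]
print(first_pass(statements, lengths))
```

The final value of the counter is the total length of the assembled fragment.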

Symbolic name table

contains one element for each name (table 5.2.4). Each element of the symbolic name table contains the name itself (or a pointer to it), its numerical value and sometimes some additional information, which may include:

● the length of the data field associated with the symbol;

● relocation bits (which show whether the value of the symbol changes if the program is loaded at an address other than the one the assembler assumed);

● information about whether the symbol can be accessed from outside the procedure.

Symbolic names are labels. They can also be specified using operators.

Directive table.

This table lists all the directives, or pseudo-commands, that are encountered when assembling a program.

Operation code table.

For each opcode, the table provides separate columns: designation of the opcode, operand 1, operand 2, hexadecimal value of the opcode, command length and command type (Table 5.2.5). Operation codes are divided into groups depending on the number and type of operands. The command type determines the group number and specifies the procedure that is called to process all commands in this group.

Second pass.

The purpose of the second pass is to create the object program and, if necessary, print the assembly listing, and to output the information that the linker needs in order to link procedures assembled at different times into one executable file.

In the second pass (as in the first), the lines containing statements are read and processed one after another. The original statement and the object code derived from it, written in hexadecimal, can be printed or buffered for later printing. After the instruction location counter has been updated, the next statement is read.

The original program may contain errors, for example:

● the given symbol is undefined or defined more than once;

● the opcode is represented by an invalid name (due to a typo), is not supplied with a sufficient number of operands, or has too many operands;

● a required operator is missing.

Some assemblers can detect an undefined symbol and substitute a replacement for it. In most cases, however, when the assembler finds a statement containing an error, it displays an error message and tries to continue the assembly process.
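The first kind of error in the list above, a symbol that is undefined or defined more than once, can be detected with a check like the following Python sketch. The function name and the message texts are hypothetical. Errors are collected rather than raised, which mirrors an assembler that reports a problem and continues.

```python
def check_symbols(definitions, references):
    """Report symbols defined more than once and symbols used but never defined."""
    errors = []
    seen = set()
    for name in definitions:          # labels in the order they are defined
        if name in seen:
            errors.append(f"symbol {name} defined more than once")
        seen.add(name)
    for name in references:           # labels used as operands
        if name not in seen:
            errors.append(f"symbol {name} is undefined")
    return errors

print(check_symbols(["A", "B", "A"], ["B", "C"]))
```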

Articles dedicated to the assembly language.

Topic 1.4 Assembler Mnemonics. Structure and formats of commands. Types of addressing. Microprocessor command set

Plan:

1 Assembly language. Basic concepts

2 Assembly language symbols

3 Types of assembler statements

4 Assembly directives

5 Processor instruction set

1 Assembly language. Basic concepts

Assembly language is a symbolic representation of machine language. All processes in a machine at the lowest, hardware level are driven only by machine language commands (instructions). Hence it is clear that, despite the common name, the assembly language is different for each type of computer.

An assembly language program is a collection of blocks of memory called memory segments. A program can consist of one or several such block-segments. Each segment contains a set of language sentences, each of which occupies a separate line of program code.

Assembly sentences are of four types:

1) commands or instructions, which are symbolic analogs of machine instructions. During translation, assembler instructions are converted into the corresponding commands of the microprocessor instruction set;

2) macros, which are sentences of the program text that are formed in a certain way and are replaced by other sentences during translation;

3) directives, which tell the assembler translator to perform certain actions. Directives have no counterpart in the machine representation;

4) comment lines, which may contain any characters, including letters of the Russian alphabet. Comments are ignored by the translator.

The structure of an assembly program. Assembler syntax.

Each sentence of a program is a syntactic construct corresponding to a command, macro, directive, or comment. For the assembler translator to recognize them, they must be formed according to certain syntactic rules. The best way to state these rules is a formal description of the language syntax, similar to grammar rules. The most common ways to describe a programming language in this manner are syntax diagrams and extended Backus-Naur forms. Syntax diagrams are more convenient for practical use. For example, the syntax of assembler sentences can be described by the syntax diagrams shown in Figures 10, 11, and 12.

Figure 10 - Assembly sentence format

Figure 11 - Format of directives

Figure 12 - Format of commands and macros

In these figures:

label name - an identifier whose value is the address of the first byte of the source program sentence that it marks;

name - an identifier that distinguishes this directive from other directives of the same kind. As a result of the assembler processing a specific directive, certain characteristics can be assigned to this name;

opcode (COP) and directive - mnemonic designations of the corresponding machine instruction, macro, or translator directive;

operands - parts of a command, macro, or assembler directive that designate the objects to be manipulated. Assembler operands are described by expressions containing numeric and text constants, labels, and variable identifiers, combined with operation signs and some reserved words.

To use a syntax diagram, find a path from the input of the diagram (left) to its output (right). If such a path exists, the sentence or construct is syntactically correct. If there is no such path, the compiler will not accept the construct.

2 Assembly language symbols

Allowed characters when writing program text are:

1) all Latin letters: A-Z, a-z. Uppercase and lowercase letters are considered equivalent;

2) digits from 0 to 9;

3) signs ? , @ , $ , _ , & ;

4) separators , . () < > { } + / * % ! " " ? = # ^ .

Assembler sentences are formed from tokens, which are syntactically inseparable sequences of valid characters of the language that are meaningful for the translator.

Lexemes are:

1) identifiers - sequences of valid characters used to denote program objects such as opcodes, variable names, and label names. An identifier can consist of one or more characters;

2) character strings - sequences of characters enclosed in single or double quotes;

3) integers in one of the following number systems: binary, decimal, hexadecimal. Numbers written in assembly language programs are identified according to certain rules:

Decimal numbers do not require any additional symbols for their identification, for example 25 or 139. To identify a binary number in the source code, the Latin letter "b" is written after the zeros and ones that make it up, for example 10010101b.

Hexadecimal numbers have more conventions:

First, they consist of the digits 0...9 and the lowercase or uppercase Latin letters a, b, c, d, e, f or A, B, C, D, E, F.

Secondly, the translator may have difficulty recognizing hexadecimal numbers, because they can consist of digits 0...9 only (for example, 190845) or start with a letter of the Latin alphabet (for example, ef15). To "explain" to the translator that a given token is not a decimal number or an identifier, the programmer must mark a hexadecimal number in a special way: the Latin letter "h" is written at the end of the sequence of hexadecimal digits. This is mandatory. If a hexadecimal number starts with a letter, a leading zero is written in front of it: 0ef15h.
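These rules can be expressed as a small Python sketch that decides whether a token is a number or an identifier. The function name is hypothetical, and the sketch covers only the three number systems named above. It shows why the leading zero matters: without it, a hexadecimal number that starts with a letter looks exactly like a name.

```python
import re

def classify(token):
    """Classify a token as a number (with its value) or as an identifier."""
    t = token.lower()                      # upper and lower case are equivalent
    if re.fullmatch(r"[01]+b", t):
        return ("number", int(t[:-1], 2))  # binary: zeros and ones, then b
    if re.fullmatch(r"\d[0-9a-f]*h", t):
        return ("number", int(t[:-1], 16)) # hexadecimal: starts with a digit, ends with h
    if re.fullmatch(r"\d+", t):
        return ("number", int(t))          # decimal: no suffix needed
    return ("identifier", token)

for token in ["139", "10010101b", "190845h", "0ef15h", "ef15h"]:
    print(token, classify(token))
```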

Almost every sentence contains a description of the object on which, or with the help of which, some action is performed. These objects are called operands. They can be defined as follows: operands are objects (values, registers, or memory cells) that are acted upon by instructions or directives, or objects that define or refine the action of instructions or directives.

It is possible to carry out the following classification of the operands:

constant or immediate operands;

address operands;

relocatable operands;

address counter;

base and index operands;

structural operands;

records.

Operands are elementary components from which a part of a machine instruction is formed, denoting the objects on which an operation is performed. In a more general case, operands can be included as constituent parts in more complex formations, called expressions.

Expressions are combinations of operands and operators, treated as a whole. The result of evaluating an expression can be an address of some memory cell or some constant (absolute) value.

3 Types of assembler statements

Let's list the possible types of assembler operators used in forming assembler expressions:

arithmetic operators;

shift operators;

comparison operators;

logical operators;

index operator;

type override operator;

segment redefinition operator;

structure type naming operator;

operator for obtaining the segment component of the expression address;

operator to get the offset of an expression.

4 Assembler directives

Assembler directives are:

1) Segmentation directives. In the course of the previous discussion, we found out all the basic rules for writing commands and operands in an assembler program. The question of how to correctly formulate the sequence of commands so that the translator can process them and the microprocessor can execute them remains open.

When considering the architecture of the microprocessor, we learned that it has six segment registers, through which it can work simultaneously:

with one code segment;

with one stack segment;

with one data segment;

with three additional data segments.

Physically, a segment is a memory area occupied by instructions and (or) data, the addresses of which are calculated relative to the value in the corresponding segment register. The syntactic description of a segment in assembly is the construction shown in Figure 13:

Figure 13 - Syntactic description of a segment in assembler

It is important to note that the functionality of a segment is somewhat broader than simply splitting a program into blocks of code, data, and stack. Segmentation is part of a more general mechanism related to the concept of modular programming. It assumes the unification of the design of object modules created by the compiler, including those from different programming languages. This allows you to combine programs written in different languages. The operands in the SEGMENT directive are intended to implement various variants of such an association.
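To make "addresses calculated relative to the value in the segment register" concrete, here is a minimal sketch written in Python as a runnable model rather than in assembler. It assumes the real-address mode of the 8086 family, where the 16-bit segment value is shifted left by four bits and added to the 16-bit offset; in protected mode the segment base is taken from a descriptor instead. The function name is hypothetical.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address: segment value * 16 plus the offset (assumed 8086 rule)."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return (segment << 4) + offset


if __name__ == "__main__":
    # A data item at offset 0010h in a segment whose register holds 1234h.
    print(hex(physical_address(0x1234, 0x0010)))
```

Different segment:offset pairs can name the same physical cell, which is one reason the translator needs the segment description to know which register an address is relative to.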

2) Listing management directives. Listing control directives are divided into the following groups:

general listing control directives;

directives for outputting to the listing of included files;

directives for outputting conditional assembly blocks;

directives for listing macros;

directives for outputting information about cross-references to the listing;

directives for changing the listing format.

2 Processor instruction set

The processor instruction set is shown in Figure 14.

Let's consider the main groups of commands.

Figure 14 - Classification of assembler instructions

The command groups are:

1 Data transfer commands. These commands occupy a very important place in the instruction set of any processor. They perform the following essential functions:

storing the contents of the internal registers of the processor in memory;

copying content from one area of memory to another;

writing to I/O devices and reading from I/O devices.

On some processors, all these functions are performed by a single command, MOV (MOVB for byte transfers), used with different methods of addressing the operands.

On other processors, there are several more commands besides MOV that perform the listed functions. Data transfer commands also include information exchange commands (their mnemonics are based on the word Exchange). Information can be exchanged between internal registers, between the two halves of one register (SWAP), or between a register and a memory location.

2 Arithmetic commands. Arithmetic instructions treat operand codes as numeric binary or binary-decimal codes. These commands can be divided into five main groups:

fixed point operations commands (addition, subtraction, multiplication, division);

floating point commands (addition, subtraction, multiplication, division);

clear commands;

increment and decrement commands;

comparison command.

3 Fixed-point commands work with codes in the processor registers or in memory as ordinary binary codes. Floating-point commands use a representation of numbers with an exponent and a mantissa (usually these numbers occupy two consecutive memory locations). In modern powerful processors, the set of floating-point commands is not limited to the four arithmetic operations, but contains many more complex commands, for example the calculation of trigonometric and logarithmic functions, as well as complex functions needed for processing sound and images.

4 Clear commands write a zero code to a register or memory cell. They can be replaced by transfer commands that write a zero, but dedicated clear commands are usually faster than transfer commands.

5 Increment (increase by one) and decrement (decrease by one) commands are also very convenient. In principle they can be replaced by commands that add one or subtract one, but increment and decrement are usually faster than addition and subtraction. These commands require one input operand, which is also the output operand.

6 The comparison command compares two input operands. In effect it calculates the difference between the two operands but does not produce an output operand; it only changes bits in the processor status register according to the result of the subtraction. The command that follows the comparison (usually a jump command) analyzes the bits in the processor status register and acts depending on their values. Some processors provide commands for chain comparison of two sequences of operands in memory.
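The behaviour described above (subtract, discard the difference, keep only the flags) can be sketched as follows. This is a Python model, not assembler code, and it assumes an 8-bit operand width and the x86-style flag names ZF, SF, CF and OF; other processors name and define their status bits differently.

```python
def compare(a: int, b: int, bits: int = 8) -> dict:
    """Model of a compare command: computes a - b and returns only the flags."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    diff = (a - b) & mask  # the difference itself is not stored anywhere
    return {
        "ZF": int(diff == 0),               # operands are equal
        "SF": int(bool(diff & sign)),       # sign bit of the difference
        "CF": int(a < b),                   # unsigned borrow
        # signed overflow: operands differ in sign and the result's sign differs from a
        "OF": int(bool((a ^ b) & (a ^ diff) & sign)),
    }


if __name__ == "__main__":
    print(compare(5, 5))
    print(compare(3, 5))
    print(compare(0x80, 1))
```

The operands themselves stay unchanged; a following jump command only has to look at the returned flags.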

7 Logical commands. Logical instructions perform logical (bitwise) operations on operands, that is, they consider the operand codes not as a single number, but as a set of separate bits. This is how they differ from arithmetic commands. Logic commands perform the following basic operations:

logical AND, logical OR, addition modulo 2 (Exclusive OR);

logical, arithmetic and cyclic shifts;

checking bits and operands;

setting and clearing bits (flags) of the processor status register (PSW).

Logic instructions allow you to compute basic logic functions from two input operands, bit by bit. In addition, the AND operation is used to forcibly clear the specified bits (a mask code is used as one of the operands, in which the bits to be cleared are set to zero). The OR operation is used to forcibly set the specified bits (as one of the operands, the mask code is used, in which the bits that require setting to one are equal to one). The "Exclusive OR" operation is used to invert the specified bits (a mask code is used as one of the operands, in which the bits to be inverted are set to one). The instructions require two input operands and form one output operand.
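The three mask techniques from the paragraph above can be tried out directly. The sketch below uses Python's bitwise operators as a stand-in for the AND, OR and XOR commands; the helper names and the sample values are illustrative assumptions.

```python
def clear_bits(value: int, mask: int) -> int:
    """AND: bits that are 0 in the mask are forced to 0."""
    return value & mask


def set_bits(value: int, mask: int) -> int:
    """OR: bits that are 1 in the mask are forced to 1."""
    return value | mask


def invert_bits(value: int, mask: int) -> int:
    """Exclusive OR: bits that are 1 in the mask are inverted."""
    return value ^ mask


if __name__ == "__main__":
    value = 0b1011_0110
    print(format(clear_bits(value, 0b1111_0000), "08b"))   # clear the low four bits
    print(format(set_bits(value, 0b0000_0011), "08b"))     # set the two lowest bits
    print(format(invert_bits(value, 0b1000_0001), "08b"))  # flip the highest and lowest bits
```

Applying the same Exclusive OR mask twice restores the original value, which is why this operation is the one used for toggling.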

8 Shift commands shift the operand code bit by bit to the right (towards the low-order bits) or to the left (towards the high-order bits). The type of shift (logical, arithmetic, or cyclic) determines the new value of the most significant bit (when shifting to the right) or of the least significant bit (when shifting to the left), and also determines whether the previous value of the most significant bit (when shifting to the left) or of the least significant bit (when shifting to the right) is saved anywhere. Cyclic shifts move the bits of an operand around a ring (clockwise when shifting to the right, counterclockwise when shifting to the left). The carry flag may or may not be included in the shift ring. If it is used, the carry flag receives the value of the most significant bit in a cyclic shift to the left and of the least significant bit in a cyclic shift to the right. Correspondingly, the previous value of the carry flag is written into the least significant bit in a cyclic shift to the left and into the most significant bit in a cyclic shift to the right.
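The difference between the shift types is easiest to see on one value. The following Python model assumes an 8-bit operand and borrows the x86 mnemonics SHL, SHR, SAR, ROL and RCL as function names; each function returns the new value together with the carry flag. It is a sketch of the rules described above, not a description of any particular processor.

```python
BITS = 8
MASK = (1 << BITS) - 1
MSB = 1 << (BITS - 1)


def shl(value):
    """Logical shift left: 0 enters at the right, the old top bit goes to carry."""
    value &= MASK
    return (value << 1) & MASK, int(bool(value & MSB))


def shr(value):
    """Logical shift right: 0 enters at the left, the old low bit goes to carry."""
    value &= MASK
    return value >> 1, value & 1


def sar(value):
    """Arithmetic shift right: the sign bit is kept, the old low bit goes to carry."""
    value &= MASK
    return (value >> 1) | (value & MSB), value & 1


def rol(value):
    """Cyclic shift left without carry in the ring: the top bit re-enters at the right."""
    value &= MASK
    msb = value >> (BITS - 1)
    return ((value << 1) & MASK) | msb, msb


def rcl(value, carry):
    """Cyclic shift left through carry: the old carry enters at the right."""
    value &= MASK
    msb = value >> (BITS - 1)
    return ((value << 1) & MASK) | (carry & 1), msb


if __name__ == "__main__":
    for name, (result, carry) in [("shl", shl(0x91)), ("shr", shr(0x91)),
                                  ("sar", sar(0x91)), ("rol", rol(0x91)),
                                  ("rcl", rcl(0x91, 0))]:
        print(name, format(result, "08b"), "carry =", carry)
```

Chaining rcl calls and feeding each returned carry into the next call is how a shift is extended over a number wider than one register.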

9 Jump commands. Jump commands are designed to organize all kinds of loops, branches, subroutine calls and so on; that is, they break the sequential flow of the program. These commands write a new value to the command counter register and thus make the processor go not to the next command in order but to any other command in program memory. Some jump commands provide for a later return to the point from which the jump was made; others do not. If a return is provided, the current processor parameters are saved on the stack. If no return is provided, the current processor parameters are not saved.
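The distinction between jumps with and without return can be modelled in a few lines. The Python class below is a deliberately reduced sketch: it assumes that the only "processor parameter" saved on the stack is the return address and that every command is one address unit long, both of which are simplifications. The class and method names are hypothetical.

```python
class TinyCpu:
    """Reduced model: a command counter and a stack of return addresses."""

    def __init__(self, start: int = 0):
        self.pc = start   # command counter
        self.stack = []   # saved return addresses

    def jump(self, target: int) -> None:
        """Jump without return: nothing is saved."""
        self.pc = target

    def call(self, target: int, command_length: int = 1) -> None:
        """Jump with return: save the address of the next command, then jump."""
        self.stack.append(self.pc + command_length)
        self.pc = target

    def ret(self) -> None:
        """Return: restore the command counter from the stack."""
        self.pc = self.stack.pop()


if __name__ == "__main__":
    cpu = TinyCpu(start=10)
    cpu.call(100)
    print(cpu.pc, cpu.stack)
    cpu.ret()
    print(cpu.pc, cpu.stack)
```

Nested calls work because each call pushes its own return address and each return pops the most recent one.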

Jump commands without return are divided into two groups:

unconditional jump commands;

conditional jump commands.

The mnemonics of these commands are based on the words Branch and Jump.

Unconditional jump instructions cause a jump to a new address regardless of anything. They can cause a jump by a specified amount of offset (forward or backward) or to a specified memory address. The offset value or the new address value is specified as an input operand.

Conditional jump instructions do not always cause a jump, but only when the specified conditions are met. These conditions are usually the values of flags in the processor status register (PSW). That is, the jump condition is the result of the previous operation that changed the values of the flags. There can be from 4 to 16 such jump conditions in total. Several examples of conditional jump commands:

jump if equal to zero;

jump if not equal to zero;

jump if there is overflow;

jump if there is no overflow;

jump if greater than zero;

jump if less than or equal to zero.

If the jump condition is met, the new value is loaded into the command counter. If the jump condition is not met, the command counter is simply incremented, and the processor fetches and executes the next command in order.

The comparison command (CMP) preceding the conditional branch command (or even several conditional branch commands) is used specifically to check the jump conditions. But flags can be set by any other command, for example, a data transfer command, any arithmetic or logical command. Note that the jump commands themselves do not change the flags, which just allows you to put several jump commands one after the other.
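How a conditional jump reads the flags without changing them can be sketched as a lookup from condition to flag test. The Python model below is an illustration only: the condition names are made up for this example, and the signed conditions follow the x86 rules (greater: ZF = 0 and SF = OF; less or equal: ZF = 1 or SF differs from OF), which is an assumption about the target processor.

```python
CONDITIONS = {
    "zero":          lambda f: f["ZF"] == 1,
    "not_zero":      lambda f: f["ZF"] == 0,
    "overflow":      lambda f: f["OF"] == 1,
    "no_overflow":   lambda f: f["OF"] == 0,
    "greater":       lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "less_or_equal": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],
}


def next_address(condition: str, flags: dict, target: int, fallthrough: int) -> int:
    """Return the new command counter value; the flags are only read, never written."""
    return target if CONDITIONS[condition](flags) else fallthrough


if __name__ == "__main__":
    # Flags as they would be left by comparing a negative value with zero.
    flags = {"ZF": 0, "SF": 1, "OF": 0}
    # Several jump commands in a row test the same, unchanged flags.
    print(next_address("zero", flags, target=100, fallthrough=11))
    print(next_address("greater", flags, target=200, fallthrough=12))
    print(next_address("less_or_equal", flags, target=300, fallthrough=13))
```

Because no entry modifies the flags dictionary, the order of the three tests can be changed freely, which mirrors the remark that several jump commands can follow one comparison.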

Interrupt commands occupy a special place among the jump commands with return. These instructions require an interrupt number (vector address) as an input operand.

Conclusion:

Assembly language is a symbolic representation of a machine language. Assembly language for each type of computer is different. An assembly language program is a collection of blocks of memory called memory segments. Each segment contains a set of language sentences, each of which occupies a separate line of program code. Assembly sentences are of four types: commands or instructions, macros, directives, comment lines.

The following characters are valid when writing program text: all Latin letters A-Z, a-z (uppercase and lowercase letters are considered equivalent); digits from 0 to 9; the signs ?, @, $, _, &; and the separators , . ( ) < > { } + / * % ! " ? = # ^.

The following types of assembler operators are used in forming assembler expressions: arithmetic operators, shift operators, comparison operators, logical operators, index operator, type override operator, segment override operator, structure type naming operator, operator for obtaining the segment component of an expression address, operator for obtaining the offset of an expression.

The command system is divided into 8 main groups.

Control questions:

1 What is assembly language?

2 What symbols can be used to write commands in assembly language?

3 What are labels and their purpose?

4 Explain the structure of assembler commands.

5 List 4 types of assembler sentences.

Instruction structure in assembly language

Programming at the machine instruction level is the lowest level at which computer programming is possible. The machine instruction set must be sufficient to carry out the required actions by issuing instructions to the machine's hardware. Each machine instruction consists of two parts: the operation part, which defines "what to do", and the operand part, which defines the objects of processing, that is, "what to do it to". A microprocessor machine instruction written in assembly language is one line of the following form:

label instruction/directive operand(s) ; comment

The label, the instruction/directive and the operands are separated by at least one space or tab character. Instruction operands are separated by commas.
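The line format given above can be illustrated by splitting a source line into its four fields. The Python function below is a simplified sketch, not a real assembler front end: it assumes that a label is recognized only when it ends with a colon and that a semicolon never appears inside a string literal. The function name and the dictionary keys are hypothetical.

```python
def parse_line(line: str) -> dict:
    """Split 'label: mnemonic operand, operand ; comment' into its fields."""
    code, _, comment = line.partition(";")      # everything after ';' is a comment
    tokens = code.split(None, 1)                # first word and the rest
    label = None
    if tokens and tokens[0].endswith(":"):
        label = tokens[0][:-1]
        tokens = tokens[1].split(None, 1) if len(tokens) > 1 else []
    mnemonic = tokens[0] if tokens else None
    operands = [op.strip() for op in tokens[1].split(",")] if len(tokens) > 1 else []
    return {"label": label, "mnemonic": mnemonic,
            "operands": operands, "comment": comment.strip()}


if __name__ == "__main__":
    print(parse_line("start: mov eax, 0 ; clear the accumulator"))
    print(parse_line("ret"))
```

A data definition such as "Count db 1" has a name without a colon, so this sketch would report Count as the mnemonic; handling that case requires knowing the set of directives.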

An assembly language instruction tells the translator what action the microprocessor should take. Assembler directives are parameters specified in the program text that affect the assembly process or the properties of the output file. The operand defines the initial value of the data (in the data segment) or the elements to be acted upon by the instruction (in the code segment). An instruction can have one or two operands, or no operands. The number of operands is implicitly determined by the instruction code. If an instruction or directive needs to be continued on the next line, the backslash character "\" is used. By default, the assembler does not distinguish between uppercase and lowercase letters in instructions and directives. Examples of a directive and an instruction:

    Count db 1      ; name, directive, one operand
    mov eax, 0      ; instruction, two operands

Identifiers are sequences of valid characters used to denote variable names and label names. An identifier can consist of one or more of the following characters: all letters of the Latin alphabet; digits from 0 to 9; the special characters _, @, $, ?. A dot can be used as the first character of a label. Reserved assembler names (directives, operators, instruction names) cannot be used as identifiers. The first character of an identifier must be a letter or a special character. The maximum length of an identifier is 255 characters, but the translator takes into account only the first 32 characters and ignores the rest.
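The identifier rules above translate almost directly into a checking function. The sketch below is written in Python as a runnable model; the reserved-name set is a tiny illustrative sample rather than the real list, and the treatment of what may follow a leading dot in a label is an assumption, since the text does not spell it out.

```python
import re

# First character: a letter or one of _ @ $ ? ; further characters may also be digits.
_IDENTIFIER = re.compile(r"[A-Za-z_@$?][A-Za-z0-9_@$?]*")

MAX_LENGTH = 255          # longest identifier accepted
SIGNIFICANT_LENGTH = 32   # characters the translator actually distinguishes
RESERVED = {"mov", "add", "db", "dd", "end", "segment"}  # sample only, not the full list


def is_valid_identifier(name: str, is_label: bool = False) -> bool:
    """Check a name against the rules listed in the text."""
    if not name or len(name) > MAX_LENGTH:
        return False
    # Only a label may start with a dot; the rest must still be a normal identifier.
    body = name[1:] if is_label and name.startswith(".") else name
    if _IDENTIFIER.fullmatch(body) is None:
        return False
    return name.lower() not in RESERVED   # case is not significant


def significant_part(name: str) -> str:
    """Return the part of the name the translator does not ignore."""
    return name[:SIGNIFICANT_LENGTH]


if __name__ == "__main__":
    for candidate in ["count_1", "1count", "@buffer", "mov"]:
        print(candidate, is_valid_identifier(candidate))
```

Two long names that share their first 32 characters would be treated as the same identifier, which significant_part makes easy to detect.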

Labels

All labels written on a line that does not contain an assembler directive must end with a colon ":". The label, instruction (directive) and operand do not have to start at any particular position in the line. It is recommended to align them in columns for better readability of the program.

Comments

Using comments in a program improves clarity, especially where the purpose of a group of instructions is not obvious. A comment begins with a semicolon (;) on any line of the source module. All characters to the right of ";" up to the end of the line are a comment. A comment can contain any printable characters, including spaces. A comment can occupy the entire line or follow an instruction on the same line.

The structure of an assembly language program

A program written in assembly language can consist of several parts, called modules, in each of which one or more data, stack, and code segments can be defined. Any complete assembly language program must include one main (head) module, from which its execution begins. A module can contain code segments, data segments, and a stack, declared using the appropriate directives.

Memory models

Before declaring segments, the memory model must be specified with the directive

.MODEL modifier memory_model, call_convention, OS_type, stack_parameter

Basic memory models of assembly language:

Memory model | Code addressing | Data addressing | Operating system | Code and data interleaving
TINY | NEAR | NEAR | MS-DOS | Allowed
SMALL | NEAR | NEAR | MS-DOS, Windows | No
MEDIUM | FAR | NEAR | MS-DOS, Windows | No
COMPACT | NEAR | FAR | MS-DOS, Windows | No
LARGE | FAR | FAR | MS-DOS, Windows | No
HUGE | FAR | FAR | MS-DOS, Windows | No
FLAT | NEAR | NEAR | Windows 2000, Windows XP, Windows NT | Allowed

The tiny model only works in 16-bit MS-DOS applications. In this model, all data and code are located in one physical segment, and the size of the program file does not exceed 64 KB. The small model supports one code segment and one data segment; data and code are addressed as near. The medium model supports multiple code segments and one data segment, with all references in code segments considered far by default and references in the data segment considered near. The compact model supports multiple data segments with far data addressing and one code segment with near addressing. The large model supports multiple code segments and multiple data segments; by default, all code and data references are considered far. The huge model is almost equivalent to the large memory model.

The flat model assumes a non-segmented program configuration and is used only on 32-bit operating systems. This model is similar to the tiny model in that the data and code are contained in a single segment, except that the segment is 32-bit. To develop a program for the flat model, one of the directives .386, .486, .586 or .686 must be placed before the .MODEL FLAT directive. The processor selection directive determines the set of instructions available when writing programs. The letter P after the processor selection directive denotes the protected mode of operation. Data and code addressing is near, with all addresses and pointers being 32-bit.

The modifier parameter of the .MODEL directive is used to define the types of segments and can take the following values: USE16 (segments of the selected model are used as 16-bit) or USE32 (segments of the selected model are used as 32-bit). The call_convention parameter determines how parameters are passed when a procedure is called from other languages, including high-level languages (C++, Pascal). The parameter can take the following values: C, BASIC, FORTRAN, PASCAL, SYSCALL, STDCALL.

The OS_type parameter of the .MODEL directive defaults to OS_DOS, which is currently the only supported value for this parameter. The stack_parameter takes one of the following values: NEARSTACK (the SS register equals DS; the data and stack areas are located in the same physical segment) or FARSTACK (the SS register is not equal to DS; the data and stack areas are located in different physical segments). The default is NEARSTACK.

An example of a program that does nothing:

    .686P
    .MODEL FLAT, STDCALL
    .DATA
    .CODE
    START:
        RET
    END START

RET is a microprocessor instruction. It ensures the correct termination of the program. The rest of the program relates to the operation of the translator.

.686P allows the protected-mode instructions of the P6 family (Pentium II). This directive selects the supported assembler instruction set by specifying the processor model.

.MODEL FLAT, STDCALL selects the flat memory model. This memory model is used in the Windows operating system. STDCALL is the procedure calling convention used.

.DATA is the program segment containing data. This program does not use the stack, so the .STACK segment is absent. .CODE is the program segment containing the code. START is a label. END START marks the end of the program and tells the translator that execution should begin at the START label. Each program must contain an END directive to mark the end of the program's source code. All lines following the END directive are ignored. The label specified after the END directive tells the translator the entry point of the main module, from which execution of the program starts. If the program contains one module, the label after the END directive can be omitted.

Assembly language translators

A translator is a program or a technical device that converts a program in one programming language into a program in the target language, called object code. In addition to supporting the mnemonics of machine instructions, each translator has its own set of directives and macro tools, which are often incompatible with those of any other translator. The main assembly language translators are: MASM (Microsoft Assembler); TASM (Borland Turbo Assembler); FASM (Flat Assembler), a free multi-pass assembler written by Tomasz Grysztar (Poland); and NASM (Netwide Assembler), a free assembler for the Intel x86 architecture, created by Simon Tatham in collaboration with Julian Hall and currently developed by a small team on SourceForge.net.

Translating the program in Microsoft Visual Studio 2005

1) Create a project by choosing File->New->Project and specifying the project name (hello.prj) and the project type: Win32 Project. In the additional options of the project wizard, select "Empty Project".

2) In the project tree (View->Solution Explorer), add the file that will contain the program text: Source Files->Add->New Item.

3) Select the Code C++ file type, but specify a name with the .asm extension.

5) Set the compiler options. Right-click the project file, choose the Custom Build Rules... menu, and select Microsoft Macro Assembler in the window that appears. To check, right-click hello.asm in the project tree, choose Properties, and set General->Tool: Microsoft Macro Assembler.

6) Compile the file by choosing Build->Build hello.prj.

7) Run the program by pressing F5 or choosing Debug->Start Debugging.

Programming in Windows OS

Programming in Windows is based on the use of API (Application Program Interface) functions, of which there are about 2000. A Windows program consists largely of such calls. All interaction with external devices and operating system resources occurs, as a rule, through such functions. The Windows operating system uses a flat memory model: the address of any memory location is determined by the contents of one 32-bit register. There are three types of program structure for Windows: dialog (the main window is a dialog), console or windowless, and classical (windowed, frame-based).

Calling Windows API functions

In the help file, any API function is presented as

type function_name(FA1, FA2, FA3)

where type is the type of the return value and FAx is the list of formal arguments in the order they appear. For example,

int MessageBox(HWND hWnd, LPCTSTR lpText, LPCTSTR lpCaption, UINT uType);

This function displays a window with a message and an exit button (or buttons). Meaning of the parameters: hWnd is the handle of the window in which the message window will appear, lpText is the text that will appear in the window, lpCaption is the text in the caption of the window, and uType is the type of the window; in particular, it defines the number of exit buttons.

Almost all parameters of API functions are in fact 32-bit integers: HWND is a 32-bit integer, LPCTSTR is a 32-bit pointer to a string, UINT is a 32-bit integer. The suffix "A" appended to a function name denotes the ANSI version of the function, which takes strings of 8-bit characters (the suffix "W" denotes the Unicode version).

When using MASM, @N must be added to the end of the function name, where N is the number of bytes that the passed arguments occupy on the stack. For Win32 API functions, this number can be computed as the number of arguments n times 4 (bytes per argument): N = 4 * n. The CALL assembler instruction is used to call the function. All the arguments of the function are passed to it via the stack (PUSH instruction). The arguments are pushed from right to left, so that on the stack they lie in left-to-right order from the bottom (lower addresses) up. The first argument to be pushed onto the stack is uType. The call to the specified function looks like this: CALL MessageBoxA@16
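The naming rule N = 4 * n and the push order can be checked with a small generator of call sequences. The Python sketch below only builds text lines; it assumes, as the section does, that every argument occupies 4 bytes, and the helper names are hypothetical.

```python
def decorated_name(function: str, arg_count: int) -> str:
    """Append @N, where N = 4 * number of arguments (4 bytes per argument assumed)."""
    return f"{function}@{4 * arg_count}"


def call_sequence(function: str, args: list) -> list:
    """Emit PUSH lines in right-to-left order, followed by the CALL line."""
    lines = [f"PUSH {arg}" for arg in reversed(args)]
    lines.append(f"CALL {decorated_name(function, len(args))}")
    return lines


if __name__ == "__main__":
    for line in call_sequence("MessageBoxA", ["hWnd", "lpText", "lpCaption", "uType"]):
        print(line)
```

Reversing the argument list is the whole trick: the last formal argument is pushed first, so the first one ends up closest to the top of the stack.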

The result of executing any API function is, as a rule, an integer returned in the EAX register. The OFFSET operator yields the "offset within the segment" or, in high-level terms, a "pointer" to the beginning of a string. The EQU directive, like #define in the C language, defines a constant. The EXTERN directive tells the translator that the function or identifier is external to the given module.

An example of the program "Hello everyone!":

    .686P
    .MODEL FLAT, STDCALL
    .STACK 4096
    .DATA
    MB_OK EQU 0
    STR1  DB "My first program", 0   ; window caption
    STR2  DB "Hello everyone!", 0    ; message text
    HW    DD ?                       ; window handle
    EXTERN MessageBoxA@16:NEAR
    .CODE
    START:
        PUSH MB_OK                   ; uType
        PUSH OFFSET STR1             ; lpCaption
        PUSH OFFSET STR2             ; lpText
        PUSH HW                      ; hWnd
        CALL MessageBoxA@16
        RET
    END START

The INVOKE directive

The MASM translator also makes it possible to simplify function calls with a macro tool, the INVOKE directive:

INVOKE function, parameter1, parameter2, ...

In this case, there is no need to add @16 to the function name; the parameters are written in exactly the order in which they appear in the function description, and the translator's macro tools push them onto the stack. To use the INVOKE directive, the function prototype must be declared with the PROTO directive in the form:

MessageBoxA PROTO :DWORD, :DWORD, :DWORD, :DWORD

If a program uses many Win32 API functions, it is advisable to use the directive

include C:\masm32\include\user32.inc

Course work

In the discipline "System programming"

Topic number 4: "Solving problems for procedures"

Option 2

EASTERN SIBERIAN STATE UNIVERSITY OF TECHNOLOGY AND CONTROL


TECHNOLOGY COLLEGE

ASSIGNMENT

for the term paper

Discipline:

Topic: Solving problems for procedures

Student: Glavinskaya Arina Aleksandrovna

Supervisor: Dambaeva Sesegma Viktorovna

Summary of the work: the study of subroutines in assembly language, problem solving using subroutines.

1. Theoretical part: basic information about the assembly language (command set, etc.), organization of subroutines, methods of passing parameters to subroutines.

2. Practical part: develop two subroutines, one of which converts any given letter to uppercase (including Russian letters), and the other converts the letter to lowercase.

Project deadlines on schedule:

1. Theoretical part - 30% by week 7.

2. Practical part - 70% by week 11.

3. Defense - 100% by week 14.

Formatting requirements:

1. The explanatory note of the course project must be submitted in electronic and hard copy.

2. The volume of the report must be at least 20 typewritten pages excluding attachments.

3. The explanatory note is drawn up in accordance with GOST 7.32-91 and signed by the supervisor.

Supervisor __________________

Student __________________

Date of issue: 26 September 2017

Introduction

1.1 Basic information about the Assembly language

1.1.1 Command set

1.2 Organization of subroutines in assembly language

1.3 Methods of passing parameters in subroutines

1.3.1 Passing parameters through registers

1.3.2 Passing parameters through the stack

2 PRACTICAL SECTION

2.1 Statement of the problem

2.2 Description of the solution to the problem

2.3 Testing the program

Conclusion

References

Introduction

It is common knowledge that programming in Assembler is difficult. There are now many different high-level languages that allow much less effort to be spent on writing programs. Naturally, the question arises of when a programmer may need to use Assembler. Currently, there are two areas in which the use of assembly language is justified, and often necessary.

First, there are the so-called machine-dependent system programs, which usually control various devices of the computer (such programs are called drivers). These system programs use special machine instructions that are not needed in ordinary (or, as they say, application) programs. These instructions are impossible or very difficult to express in a high-level language.

The second area of application of the Assembler is related to the optimization of program execution. Very often, translation programs (compilers) from high-level languages produce a very inefficient machine language program. This usually applies to programs of a computational nature, in which a very small (about 3-5%) section of the program (main loop) is executed most of the time. To solve this problem, so-called multilingual programming systems can be used, which allow you to write parts of the program in different languages. Usually, the main part of the program is written in a high-level programming language (Fortran, Pascal, C, etc.), and the time-critical sections of the program are written in Assembler. In this case, the speed of the entire program can be significantly increased. This is often the only way to get the program to produce a result in a reasonable amount of time.

The purpose of this course work is to obtain practical skills in programming in assembly language.

Work tasks:

1. To study the basic information about the Assembler language (structure and components of the Assembler program, the format of the commands, the organization of subroutines, etc.);

2. To study the types of bit operations, the format and logic of the logical instructions of the Assembler;

3. Solve an individual problem on the use of subroutines in Assembler;

4. Formulate a conclusion about the work done.

1 THEORETICAL SECTION

Understanding Assembly Language

Assembler is a low-level programming language that is a human-readable format for writing machine instructions.

Assembly language instructions correspond one-to-one with processor instructions and, in fact, represent a convenient symbolic form of notation (mnemonic code) of commands and their arguments. Also, assembly language provides basic software abstractions: linking parts of the program and data through labels with symbolic names and directives.

Assembler directives allow you to include data blocks (described explicitly or read from a file) into the program; repeat a certain fragment a specified number of times; compile a fragment conditionally; set the execution address of the fragment, change the values of the labels during the compilation process; use macros with parameters, etc.

Advantages and disadvantages

· The minimum amount of redundant code (using fewer instructions and memory accesses). As a result - higher speed and smaller size of the program;

· Large amounts of code, a large number of additional small tasks;

· Poor readability of the code, difficulty in maintaining (debugging, adding features);

· The difficulty of implementing programming paradigms and any other more complex conventions, the complexity of joint development;

· Fewer available libraries, their low compatibility;

· Direct access to hardware: input-output ports, special registers of the processor;

· Maximum "fit" for the desired platform (use of special instructions, technical features of "hardware");

· Non-portability to other platforms (except binary-compatible ones).

In addition to instructions, a program can contain directives: commands that do not translate directly into machine instructions, but control the operation of the compiler. Their set and syntax vary considerably and depend not on the hardware platform, but on the compiler used (giving rise to dialects of languages within the same family of architectures). Typical groups of directives are:

· Data definition (constants and variables);

· Management of the organization of the program in memory and the parameters of the output file;

· Setting the operating mode of the compiler;

· All kinds of abstractions (i.e. elements of high-level languages) - from the design of procedures and functions (to simplify the implementation of the procedural programming paradigm) to conditional structures and loops (for the structured programming paradigm);

· Macros.

Command set

Typical assembly language commands are:

· Data transfer commands (mov, etc.);

· Arithmetic commands (add, sub, imul, etc.);

· Logical and bitwise operations (or, and, xor, shr, etc.);

· Commands to control program execution (jmp, loop, ret, etc.);

· Interrupt call commands (sometimes referred to as control commands): int;

· I/O commands for ports (in, out).

For microcontrollers and microcomputers, commands that perform a check and a conditional jump are also typical, for example:

· jne - jump if not equal;

· jge - jump if greater or equal.
