Thoughts on assembly language programming. Dive into assembler

16.06.2019 Windows 10

Today there are a huge number of high-level programming languages. Against their background, programming in a low-level language - assembly language - may at first glance seem to be something outdated and irrational. However, this only seems to be. It should be recognized that assembler is actually a processor language, which means that it cannot be dispensed with as long as processors exist. The main advantages of programming in assembler are the maximum speed and the minimum size of the resulting programs.

Disadvantages are often due only to the tendency of the modern market to prefer quantity to quality. Modern computers are able to easily cope with the heap of commands of high-level functions, and if it is not easy - please upgrade the hardware of your machine! This is the law of commercial programming. If we are talking about programming for the soul, then a compact and nimble program written in assembler will leave a much more pleasant impression than a high-level hulk burdened with a bunch of unnecessary operations. There is an opinion that only a select few can program in assembler. It is not true. Of course, talented assembler programmers can be counted on the fingers, but this is the case in almost any field of human activity. There are not so many aces drivers, but everyone will be able to learn how to drive a car - there would be a desire. After reading this series of articles, you will not become a cool hacker. However, you will get an overview and learn simple ways to program in assembly language for Windows using its built-in functions and compiler macro instructions. Naturally, in order to learn Windows programming, you need to have Windows skills and experience. At first, a lot of things will not be clear to you, but do not be discouraged by this and read on: over time, everything will fall into place.

So, in order to start programming, we will at least need a compiler. A compiler is a program that translates source code written by a programmer into machine code executable by a processor. The bulk of assembler textbooks focus on using the MASM32 (Microsoft Macro Assembler) package. But in the form of variety and for a number of other reasons, I will introduce you to the young FASM (Flat Assembler) compiler, which is rapidly gaining popularity. This compiler is quite easy to install and use, is compact and fast, has a rich and capacious macro syntax that allows you to automate many routine tasks. You can download its latest version at: site by choosing flat assembler for Windows. To install FASM, create a folder, for example, "D:\FASM" and extract the contents of the downloaded zip archive into it. Run FASMW.EXE and close without changing anything. By the way, if you are using a standard explorer and the file extension is not displayed (for example, .EXE), I recommend that you go to Tools -> Folder Options -> View and uncheck the Hide extensions for registered file types checkbox. After the first launch of the compiler, a configuration file should appear in our folder - FASMW.INI. Open it with a standard notepad and add 3 lines at the very bottom:

Fasminc=D:\FASM\INCLUDE
Include=D:\FASM\INCLUDE

If you unpacked FASM to another location, replace "D:\FASM\" with your path. Save and close FASMW.INI. Looking ahead, I will briefly explain how we will use the compiler:
1. We write the text of the program, or open the previously written text saved in the .asm file, or paste the text of the program from the clipboard with a combination.
2. Press F9 to compile and run the program, or Ctrl+F9 to compile only. If the text of the program has not yet been saved, the compiler will ask you to save it before compiling.
3. If the program has started, we test it for correct operation, if not, we look for errors, the compiler will point out the grossest of which or subtly hint to us.
Well, now we can start the long-awaited practice. We launch our FASMW.EXE and type in it the code of our first program:

Include "%fasminc%/win32ax.inc"

Data
Caption db "My first program.",0
Text db "Hello everyone!",0

Code
start:

invoke ExitProcess,0

Press Run -> Run, or F9 on your keyboard. In the save window, specify the file name and folder to save. It is advisable to get used to saving each program in a separate folder so as not to get confused in the future, when each program may have a bunch of files: pictures, icons, music, and so on. If the compiler generated an error, carefully double-check the line it specified - maybe a comma was missed or a space. You also need to be aware that the compiler is case sensitive, so .data and .data are treated as two different instructions. If you did everything right, then the result will be a simple MessageBox (Fig. 1). Now let's figure out what we wrote in the text of the program. In the first line, with the include directive, we included a large text from several files in our program. Remember, during installation, we wrote 3 lines in the Fasmov ini-file? Now %fasminc% in the program text means D:\FASM\INCLUDE or the path you specified. The include directive, as it were, inserts text from another file at the specified location. Open the WIN32AX.INC file in the include folder using notepad or in the fasma itself and make sure that we have automatically included (attached) to our program also the text from win32a.inc, macro/if.inc, a bunch of incomprehensible (so far) macro instructions and common set of Windows function libraries. In turn, each of the included files may contain several more included files, and this chain may go beyond the horizon. With the help of include files, we organize a kind of high-level language: in order to avoid the routine of describing each function manually, we include entire libraries for describing standard Windows functions. Is all this really necessary for such a small program? No, this is something like a "gentleman's set for all occasions." Real hackers, of course, do not connect everything in a row, but we are only learning, so this is forgivable for us for the first time.

Next, we have a data section - .data. In this section, we declare two variables - Caption and Text. These are not special commands, so their names can be changed as you like, even a and b, as long as there are no spaces and not in Russian. Well, you can’t call variables with reserved words, for example, code or data, but you can code_ or data1. The db command means "define byte". Of course, all this text will not fit in one byte, because each individual character occupies a whole byte. But in this case, with this command, we define only a pointer variable. It will contain the address where the first character of the string is stored. The text of the string is indicated in quotation marks, and you can optionally put quotation marks "such" and "such" - as long as the initial quote is the same as the final one. The zero after the comma adds a null byte to the end of the string, which marks the end of the string (null-terminator). Try to remove this zero along with the comma in the first line and see what you get. In the second line in this particular example, you can do without a zero (we delete it along with a comma - otherwise the compiler will indicate an error), but this will work only because in our example, the next section begins immediately after the second line, and before it begins, the compiler will automatically enter a bunch of zeros that justify the previous section. In general, zeros at the end of text strings are required! The next section is the section of the program's executable code - .code. At the beginning of the section is the label start:. It means that it is from this place that our program will begin to be executed. The first command is the invoke macro instruction. It calls the built-in Windows API function MessageBox. API functions (application programming interface) significantly simplify the work in the operating system. We kind of ask the operating system to perform some standard action, and it performs and returns to us the result of the work done. The function name is followed by its parameters separated by commas. The MessageBox function has the following parameters:

The 1st parameter must contain the handle of the owner window. A handle is something like a personal number that is issued by the operating system to each object (process, window, etc.). 0 in our example means that the window has no owner, it is on its own and does not depend on any other windows.
The 2nd parameter is a pointer to the address of the first letter of the message text ending with the aforementioned null terminator. To clearly understand that this is just an address, let's shift this address by 2 bytes directly in the function call: invoke MessageBox,0,Text+2,Caption,MB_OK and make sure that now the text will be displayed without the first two letters.
3rd - pointer to the address of the first letter of the message header.
4th - message style. You can find a list of these styles, for example, in INCLUDE\EQUATES\ USER32.INC. To do this, you'd better use Notepad's search to quickly find MB_OK and the rest. There, unfortunately, there is no description, but from the name of the style you can usually guess about its purpose. By the way, all these styles can be replaced by a number, meaning one or another style or a combination of them, for example: MB_OK + MB_ICONEXCLAMATION. USER32.INC contains hexadecimal values. You can use them as such or convert them to decimal in the engineering mode of the standard Windows Calculator. If you are not familiar with number systems and do not know how decimal differs from hexadecimal, then you have 2 options: either familiarize yourself with this matter on the Internet / textbook / ask a friend, or leave this idea until better times and try to do without this information. Here I will not give even brief information on number systems, in view of the fact that even without me a huge number of articles and pages of any conceivable level have been written about them.

Let's get back to our sheep. Some styles cannot be used at the same time - for example, MB_OKCANCEL and MB_YESNO. The reason is that the sum of their numerical values (1+4=5) will correspond to the value of another style - MB_RETRYCANCEL. Now play around with the function parameters to practice the stuff, and we're moving on. The MessageBox function pauses program execution and waits for user action. When completed, the function returns the result of the user's action to the program, and the program continues to run. Calling the ExitProcess function terminates the process of our program. This function has only one parameter - the exit code. Usually, if the program exits normally, this code is zero. To better understand the last line of our code - .end start - take a close look at the equivalent code: format PE GUI 4.0

include "%fasminc%/win32a.inc"

section ".data" data readable writeable

Caption db "Our first program.",0
Text db "FASM assembler is easy!",0

section ".code" code readable executable
start:
invoke MessageBox,0,Text,Caption,MB_OK
invoke ExitProcess,0

section ".idata" import data readable writeable
library KERNEL32, "KERNEL32.DLL",\
USER32, "USER32.DLL"

import KERNEL32,\
ExitProcess, "ExitProcess"

import USER32,\
MessageBox, "MessageBoxA"

For the compiler, it is almost identical to the previous example, but for us this text looks like a different program. I specifically gave this second example so that you get an idea of the use of macro instructions at the very beginning and henceforth, moving from one included file to another, independently get to the true code of the program hidden under the cover of macros. Let's try to understand the differences. The very first, not very conspicuous, but worthy of special attention, is that we include not win32ax, but only win32a in the program text. We have abandoned the large set and limit ourselves to the small. We will try to do without connecting everything from win32ax, although we still need some of it for now. Therefore, in accordance with the macros from win32ax, we manually write down some definitions. For example, a macro from the win32ax file:
macro .data ( section ".data" data readable writeable )

automatically replaces .data with section ".data" data readable writeable at compile time. Since we have not included this macro in the program text, we need to write the detailed definition of the section ourselves. By analogy, you can find the reasons for other modifications of the program text in the second example. Macros help you avoid routine when writing large programs. Therefore, you just need to immediately get used to them, and you will love them later =). Try to figure out the differences between the first and second examples yourself, using the text of the macros used in the win32ax. Let me just say that in quotes you can specify any other name of the data or code section - for example: section "virus" code readable executable. This is just a section name and is not a command or statement. If you understand everything, then you can already write your own virus. Believe me, it's very easy. Just change the title and text of the message:
Caption db "Dangerous Virus.",0

Text db "Hello, I am a particularly dangerous trojan virus and spread over the Internet.",13,\
"Since my author can't write malicious viruses, you must help me.",13,\
"Please do the following:",13,\
"1.Erase the directories C:\Windows and C:\Program files",13,\ on your disk
"2.Send this file to all your friends",13,\
"Thanks in advance.",0

The number 13 is the code for the carriage return character in Microsoft systems. The \ sign is used in the FASM syntax to combine several lines into one, without it the line would be too long, extending beyond the edge of the screen. For example, we can write start: or we can also write st\
ar\
t:

The compiler will not notice the difference between the first and second options.
Well, for more courage in our "virus", you can replace MB_OK with MB_ICONHAND or simply with the number 16. In this case, the window will have the style of an error message and will have a more impressive effect on the victim of the "infection" (Fig. 2).

That's all for today. I wish you success and see you soon!
All examples provided have been tested to work correctly under Windows XP and will most likely work under other versions of Windows, however I make no guarantee that they will work correctly on your computer. You can find the source texts of the programs on the forum.

Assembly Language Programming

This part of the course covers the basics of assembly language programming for the Win32 architecture.

All processes in the machine at the lowest, hardware level are driven only by commands (instructions) of the machine language. Assembly language is a symbolic representation of machine language.. Assembler allows you to write short and fast programs. However, this process is extremely labor intensive. To write the most efficient program, you need a good knowledge of the features of assembly language commands, attention and accuracy. Therefore, in reality, programs are written in assembly language, which should ensure efficient work with the hardware. Also, in assembly language, sections of the program that are critical in terms of execution time or memory consumption are written. Subsequently, they are made in the form of subroutines and combined with code in a high-level language.

1. Registers

Registers are special memory locations located directly in the processor. Working with registers is much faster than working with RAM cells, so registers are actively used both in assembly language programs and high-level language compilers.

Registers can be divided into general purpose registers,command pointer,flag register and segment registers.

1.1. General purpose registers

To General purpose registers are a group of 8 registers that can be used in an assembly language program. All registers are 32 bits in size and can be divided into 2 or more parts.

As can be seen from the figure, the ESI, EDI, ESP and EBP registers allow you to access the lower 16 bits by the names SI, DI, SP and BP, respectively, and the EAX, EBX, ECX and EDX registers allow you to access both the lower 16 bits (by the names AX , BX, CX and DX), and to the two least significant bytes separately (by the names AH/AL, BH/BL, CH/CL and

The names of the registers come from their purpose:

EAX/AX/AH/AL (accumulator register) – accumulator;

EBX/BX/BH/BL (base register) - base register;

ECX/CX/CH/CL (counter register) – counter;

EDX / DX / DH / DL (data register) - data register;

ESI/SI (source index register) – source index;

EDI / DI (destination index register) - index of the receiver (recipient);

ESP / SP ( stack pointer register ) - stack pointer register;

EBP/BP (base pointer register) – stack frame base pointer register.

Despite the existing specialization, all registers can be used in any machine operations. However, one must take into account the fact that some commands work only with certain registers. For example, the multiplication and division instructions use the EAX and EDX registers to store the original data and the result of the operation. The loop control instructions use the ECX register as the loop counter.

Another nuance is the use of registers as a base, i.e. memory address storage. Any registers can be used as base registers, but it is desirable to use EBX, ESI, EDI or EBP registers. In this case, the size of the machine instruction is usually smaller.

Unfortunately, the number of registers is catastrophically small, and it is often difficult to find a way to use them optimally.

1.2. Instruction pointer

EIP register ( command pointer) contains the offset of the next instruction to be executed. This register is not directly accessible to the programmer, but its value is loaded and changed by various control commands, which include commands for conditional and unconditional jumps, calling procedures, and returning from procedures.

1.3. Flag register

A flag is a bit that is set to 1 ("flag is set") if some condition is met and 0 ("flag is cleared") otherwise. The processor has a flags register containing a set of flags that reflect the current state of the processor.

Designation

Name

Relocated flag

								Reserved
								Reserved


								Parity Flag


								Reserved
								Reserved


					Auxiliary Carry Flag			Auxiliary


								Reserved
								Reserved


								Zero Flag


								sign flag


								tracer flag


					Interrupt Enable Flag			Permission flag


								Flag to the right


								repo flag



					I/O Privilege Level			Level at





								Flag nested


								Reserved
								Reserved





								flag resume


					Virtual-8086 Mode			virtual mode


								Checking you


					Virtual Interrupt Flag			Virtual


					Virtual Interrupt Pending			pending


								Check for





								Reserved

The value of the flags CF, DF, and IF can be changed directly in the flag register using special instructions (for example, CLD to reset the direction flag), but there are no instructions that allow you to access the flag register as a normal register. However, you can save

flag register to the stack or register AH and restore the flag register from them using the instructions LAHF , SAHF , PUSHF , PUSHFD , POPF and POPFD .

1.3.1. Status flags

The status flags (bits 0, 2, 4, 6, 7 and 11) reflect the result of executing arithmetic instructions such as ADD ,SUB ,MUL ,DIV .

The carry flag CF is set on a carry from the most significant bit/borrow to the most significant bit and indicates the presence of an overflow in unsigned integer arithmetic. Also used in long arithmetic.

The parity flag PF is set if the least significant byte of the result contains an even number of 1 bits. Initially, this flag was oriented to use in communication programs: when transmitting data over communication lines, a parity bit could also be transmitted for control, and instructions for checking the parity flag made it easier to check the integrity of the data.

Auxiliary carry flag AF is set on carry from bit 3rd result/loan in 3rd result bit. This flag is intended for use in binary decimal (binary coded decimal, BCD) arithmetic.

The zero flag ZF is set if the result is zero.

The sign flag SF is equal to the value of the most significant bit of the result, which is the sign bit in signed arithmetic.

overflow flag OF is set if the integer result is too long to fit in the target operand (register or memory location). This flag indicates the presence of an overflow in signed integer arithmetic.

Of the listed flags, only the CF flag can be changed directly using the STC, CLC, and CMC instructions.

Status flags allow the same arithmetic instruction to produce a result of three different types: unsigned, signed, and binary coded decimal (BCD) integer. If the result is an unsigned number, then the CF flag shows the overflow condition (carry or borrow), for a signed result, carry or borrow shows the OF flag, and for a BCD result, carry/borrow shows the AF flag. The SF flag reflects the sign of a signed result, the ZF flag reflects both an unsigned and a signed null result.

In long integer arithmetic, the CF flag is used in conjunction with the add-and-carry (ADC) and subtract-borrow (SBB) instructions to propagate a carry or borrow from one computed bit of a long number to another.


Instructions	conditional	Jcc transition (transition
condition cc ),SETcc (set		meaning	result-byte
dependencies	conditions cc ),LOOPcc (organization
and CMOVcc (conditional	copy)	use	one or more

status flags to test the condition. For example, the JLE jump instruction (jump if less or equal) tests the condition "ZF = 1 or SF ≠ OF".

The PF flag was introduced for compatibility with other microprocessor architectures and is rarely used for its intended purpose. It is more common to use it in conjunction with the rest of the status flags in floating point arithmetic: the comparison instructions (FCOM , FCOMP , etc.) in the math coprocessor set the condition flags C0, C1, C2 and C3 in it, and these flags can be copied to flag register. To do this, it is recommended to use the FSTSW AX instruction to store the coprocessor status word in the AX register and the SAHF instruction to subsequently copy the contents of the AH register into the lower 8 bits of the flag register, while C0 goes to the CF flag, C2 to PF, and C3 to ZF. The C2 flag is set, for example, in the case of incomparable arguments (NaN or unsupported format) in the FUCOM comparison instruction.

1.3.2. control flag

Direction Flag DF (bit 10 in the flag register) controls string instructions (MOVS , CMPS , SCAS , LODS and STOS ) - setting the flag causes addresses to decrease (process lines from high addresses to low ones), zeroing makes addresses increase. The STD and CLD instructions set and clear the DF flag, respectively.

1.3.3. System flags and IOPL field

The system flags and the IOPL field control the operating environment and are not intended to be used in application programs.

Interrupt enable flag IF - clearing this flag disables responses to maskable interrupt requests.

Trace Flag TF - setting this flag enables step-by-step debugging mode, when after each executed

instructions, the program is interrupted and a special interrupt handler is called.

The IOPL field indicates the I/O priority level of an executing program or task: in order for a program or task to execute I/O instructions or change the IF flag, its current priority level (CPL) must be ≤ IOPL.

Task nesting flag NT - This flag is set when the current task is nested within another interrupted task, and the current task's TSS status segment provides feedback to the previous task's TSS. The NT flag is checked by the IRET instruction to determine if the return type is intertask or intratask.

Resume Flag RF is used to mask debug errors.

VM - setting this flag in protected mode causes a switch to virtual 8086 mode.

Alignment check flag AC - setting this flag together with the AM bit in the CR0 register enables operand alignment control during memory accesses: accessing an unaligned operand causes an exception.

VIF is a virtual copy of the IF flag; used in conjunction with the VIP flag.

VIP - set to indicate the presence of a pending interrupt.

ID - the ability to programmatically change this flag in the flag register indicates support for the CPUID instruction.

1.4. segment registers

The processor has 6 so-called segment registers: CS, DS, SS, ES, FS and GS. Their existence is due to the specifics of the organization and use of RAM.

16-bit registers could only address 64 KB of RAM, which is clearly not enough for a more or less decent program. Therefore, the program memory was allocated in the form of several segments, which had a size of 64 KB. At the same time, absolute addresses were 20-bit, which made it possible to address already 1 MB of RAM. The question arises - how, having 16-bit registers, to store 20-bit addresses? To solve this problem, the address was broken down into the offset base. The base is the address of the beginning of the segment, and the offset is the byte number within the segment. A restriction was imposed on the address of the beginning of the segment - it had to be a multiple of 16. In this case, the last 4 bits were equal to 0 and were not stored, but implied. Thus, two 16-bit parts of the address were obtained. For getting

absolute address, four zero bits were added to the base, and the resulting value was added with an offset.

Segment registers were used to store the address of the beginning of the code segment (CS - code segment), data segment (DS - data segment) and stack segment (SS - stack segment). Registers ES, FS and GS were added later. There were several memory models, each of which implied the allocation of one or more code segments and one or more data segments to the program: tiny , small , medium , compact , large and huge . For assembly language instructions, there were certain conventions: jump addresses were segmented by the CS register, data accesses were segmented by the DS register, and stack accesses were segmented by the SS register. If the program was allocated several segments for code or data, then the values in the CS and DS registers had to be changed to access another segment. There were so-called "near" and "distant" transitions. If the command to jump to was in the same segment, then only the value of the IP register needed to be changed to jump. Such a transition was called a near one. If the command to which the transition must be made was in another segment, then for the transition it was necessary to change both the value of the CS register and the value of the IP register. Such a transition was called far and was carried out longer.

32-bit registers allow you to address 4 GB of memory, which is already enough for any program. Windows runs each Win32 program in a separate virtual space. This means that each Win32 program will have 4 gigabytes of address space, but it does not mean that each program has 4 gigabytes of physical memory, only that the program can access any address within that range. And Windows will do everything necessary to ensure that the memory that the program accesses "exists." Of course, the program must adhere to the rules set by

Windows, otherwise a General Protection Fault error occurs.

Under the Win32 architecture, there was no need to separate the address into base and offset, and no need for memory models. At 32-

are used differently. Previously, it was necessary to associate individual parts of the program with one or another segment register and save / restore the DS register when moving to another data segment or explicitly segment the data according to another register. With a 32-bit architecture, this is no longer necessary, and in the simplest case, you can forget about segment registers.

1.5. Stack usage

Every program has an area of memory called a stack. The stack is used to pass parameters to procedures and to store local data for procedures. As you know, the stack is a memory area, when working with which it is necessary to follow certain rules, namely: the data that hit the stack first is retrieved from there last. On the other hand, if some memory is allocated to the program, then there are no physical restrictions on reading and writing. How do these two conflicting principles fit together?

Let's say we have a function f1 that calls a function f2 , and a function f2 in turn calls a function f3 . When the function f1 is called, it is allocated a certain place on the stack for local data. This space is allocated by subtracting from the ESP register a value equal to the size of the required memory. The minimum size of allocated memory is 4 bytes, i.e. even if the procedure needs 1 byte, it must take 4 bytes.

Function f1 performs some actions and then calls

function f2 calls function f3 , which also allocates itself on the stack. The f3 function does not call other functions, and at the end of its work it must free up space on the stack by adding to the ESP register the value that was subtracted when the function was called. If the f3 function does not restore the value of the ESP register, then the f2 function, continuing its work, will not access its own data, because She is searching

which was before her call.

Thus, at the level of procedures, it is necessary to follow the rules for working with the stack - the procedure that took the last place on the stack must free it first. If this rule is not observed, the program will not work correctly. But each procedure can access its stack area in an arbitrary way. If we were forced to follow the rules for working with the stack inside each procedure, we would have to shift the data from the stack to another memory area, and this would be extremely inconvenient and would extremely slow down the execution of the program.

Every program has a data area where global variables are placed. Why is local data stored on the stack? This is done to reduce the amount of memory occupied by the program. If the program will successively call several procedures, then at each moment of time there will be space only for the data of one procedure, because the stack is occupied and deallocated. The data area exists all the time the program is running. If local data were placed in the data area, there would be room for local data for all the procedures in the program.

Local data is not automatically initialized. If in the above example, the function f2 after the function f3 calls the function f4 , then the function f4 will take the place on the stack that was previously occupied by the function f3 , thus, the function f4 will “inherit” the data of the function f3 . Therefore, each procedure must take care of the initialization of its local data.

2. Basic concepts of assembly language

2.1. Identifiers

The concept of an identifier in assembly language is no different from the concept of an identifier in other languages. You can use latin letters, numbers and signs _ . ? @ $ , where the dot can only be the first character of the identifier. Uppercase and lowercase letters are considered equivalent.

2.2. Whole numbers

AT In an assembly language program, integers can be written in binary, octal, decimal, and hexadecimal number systems. To set the number system at the end of a number

After many years of doing nothing, I decided to return to the roots. To programming. Again, in view of the many "modern achievements" in this area, it was difficult to decide what was really missing, what to take in order to be both pleasant and useful. After trying a little bit of a lot of things, I decided to return to where I was drawn from the first days of my acquaintance with a computer (still with a copy of Sir Sinclair's creation) - to programming in assembler. In fact, at one time I knew Assembler quite well (in this case, I'm talking about x86), but I haven't written anything in it for almost 15 years. Thus, this is a kind of return of the "prodigal son."
But here the first disappointment awaited. The books, manuals and other reference books on assembler found on the Internet, to my deep regret, contain a minimum of information on how to program in assembler, why it is so, and what it gives.

An example from another area

If we take boxing as an example, then all such manuals teach how to execute a blow, move while standing on the floor, but there is absolutely no thing that makes boxing - boxing, and not "allowed punching". That is, combinational work, features of using the ring, defensive actions, tactical construction of the battle and, moreover, the battle strategy are not considered at all. They taught a person to hit the "pear" and immediately into the ring. This is fundamentally wrong. But this is how almost all "textbooks" and "manuals" on programming in assembler are built.

However, normal books should be, most likely under the mountain of "slag" I simply did not find them. Therefore, before replenishing knowledge with a global description of architecture, mnemonics and all sorts of tricks “how to make a figure out of 2 fingers”, let's approach the issue of programming in assembler from an “ideological” point of view.

Idyll?

A small note, further in the text a classification that differs from the currently common one will be used. However, this is not a reason for "disputes about the color of truth", it's just that in this form it is easier to explain the author's point of view on programming.

So, today, it would seem, the era of happiness has come for programmers. A huge selection of funds for all occasions and wishes. Here you have millions of “frameworks” / “patterns” / “templates” / “libraries” and thousands of tools that “facilitate” programming, hundreds of languages and dialects, dozens of methodologies and various approaches to programming. Take it - I don't want to. But it is not "taken". And it's not about religious beliefs, but that it all looks like an attempt to eat something tasteless. With desire and diligence, you can adapt to this, of course. But, returning to programming, in most of what is proposed, technical beauty is not visible - only a lot of “crutches” are visible. As a result, when using these “achievements”, instead of bewitching landscapes, instead of bewitching landscapes, a solid “abstraction” or popular prints come out from under the “brush of artists” - if you're lucky. Are most programmers such mediocrity, ignoramuses and have problems at the level of genetics? No I do not think so. So what's the reason?
Today there are many ideas and ways of programming. Consider the most "fashionable" of them.

Imperative programming - in this approach, the programmer sets the sequence of actions leading to the solution of the problem. It is based on the division of the program into parts that perform logically independent operations (modules, functions, procedures). But unlike the typed approach (see below), there is an important feature here - the lack of “typing” of variables. In other words, there is no concept of “variable type”, instead, the understanding is used that the values of the same variable can have a different type. A prominent representative of this approach are Basic, REXX, MUMPS.
Typed programming is a modification of imperative programming, when the programmer and the system limit the possible values of variables. The most famous languages are Pascal, C.
Functional programming is a more mathematical way of solving a problem, where the solution is to "construct" a hierarchy of functions (and therefore create missing ones) leading to a solution to the problem. As examples: Lisp, Forth.
Automated programming is an approach where a programmer builds a model/network consisting of objects/executive elements exchanging messages, both changing/storing their internal “state” and able to interact with the outside world. In other words, this is what is usually called "object programming" (not object-oriented). This way of programming is represented in Smalltalk.

What about many other languages? As a rule, these are already “mutants”. For example, mixing the typed and automaton approach gave "object-oriented programming".

As you can see, each of the approaches (even without taking into account the limitations of specific implementations) imposes its own restrictions on the programming technique itself. But it cannot be otherwise. Unfortunately, these restrictions are often artificially created to "maintain the purity of the idea." As a result, the programmer has to “pervert” the initially found solution into a form that somehow corresponds to the ideology of the language used or the “template” used. This is even without taking into account newfangled methods and methods of design and development.

It would seem that when programming in assembler, we are free to do everything and in such a way as we wish and the hardware allows us. But as soon as we want to use a "universal driver" for some type of equipment, we are forced to change the freedom of "creativity" to prescribed (standardized) approaches and ways to use the driver. As soon as we need the opportunity to use the achievements of other colleagues or give them the opportunity to do the same with the fruits of our labor, we are forced to change the freedom of choice of interaction between parts of the program to some negotiated / standardized ways.

Thus, the “freedom” for which assembler is often torn is often a “myth”. And this (understanding the limitations, and ways to organize them), in my opinion, should be given increased attention. The programmer must understand the reason for the restrictions being introduced, and, which distinguishes assembler from many high-level languages, be able to change them if the need arises. However, today the assembler programmer is forced to put up with the limitations imposed by high-level languages, not having the "carrots" available to programmers in them. On the one hand, operating systems provide a lot of already implemented functions, there are ready-made libraries and much more. But the ways of using them, as specially, are implemented without taking into account calling them from programs written in assembler, or even contrary to the programming logic for the x86 architecture. As a result, assembly language programming with calls to OS functions or external libraries of high-level languages is now “fear” and “horror”.

The further into the forest, the thicker

So, we realized that although the assembler is very simple, you need to be able to use it. And the main coherence is the need to interact with the execution environment where our program is launched. If programmers in high-level languages already have access to the necessary libraries, functions, subroutines for many occasions, and they have access to ways of interacting with the outside world in a form consistent with the idea of a language, then an assembler programmer has to wade through a thicket of various obstacles placed on empty place. When you look at what high-level languages generate when compiled, you get the feeling that those who wrote compilers either have no idea how an x86 processor works, “or one of the two” (c).

So, let's go in order. Programming is, first of all, engineering, that is, scientific creativity aimed at an effective (in terms of reliability, use of available resources, implementation time and ease of use) solution of practical problems. And, at the heart of any engineering is a systematic approach. That is, you can not consider any solution as a kind of "non-separable" black box, functioning in a complete and ideal vacuum.

Another example from another area

As a vivid example of a systematic approach, one can cite the production of trucks in the USA. In this case, the truck manufacturer is simply the frame and cab maker + the constructor assembler. Everything else (engine, transmission, suspension, electrical equipment, etc.) is taken based on the wishes of the customer. One customer wanted to get himself a kind of Kenworth with a Detroit Diesel engine, a Fuller manual gearbox, a spring suspension from some Dana - please. A friend of this customer needed the same Kenworth model, but with the “native” Paccar engine, Allison automatic gearbox and air suspension from another manufacturer - easy! And so do all truck assemblers in the US. That is, a truck is a system in which each module can be replaced by another, of the same purpose and seamlessly docked with existing ones. Moreover, the method of docking modules is made with the maximum available versatility and convenience of further expansion of functionality. This is what an engineer should aim for.

Unfortunately, we will have to live with what we have, but in the future this should be avoided. So, a program is, in fact, a set of modules (it doesn’t matter what they are called and how they “behave”), by composing which we achieve a solution to the problem at hand. For efficiency, it is highly desirable that these modules be reusable. And not just use it at any cost, but use it in a convenient way. And here we are waiting for another unpleasant "surprise". Most high-level languages operate with such structural units as "function" and "procedure". And, as a way to interact with them, "parameter passing" is used. This is quite logical, and there are no questions here. But as always, “it is not what is done that is important – it is important how it is done” (c). And here the most incomprehensible begins. Today, there are 3 common ways to organize the transfer of parameters: cdecl, stdcall, fast call. So, none of these methods are "native" for x86. Moreover, all of them are flawed in terms of expanding the functionality of the called subroutines. That is, by increasing the number of parameters passed, we are forced to change all the call points of this function / subroutine, or produce a new subroutine with similar functionality, which will be called in a slightly different way.

The above methods of passing parameters work relatively well on processors with 2 separate stacks (data stack and address/control stack) and advanced stack manipulation instructions (at least index access to stack elements). But when programming on x86, you first have to pervert when passing / receiving parameters, and then do not forget their “structural” removal from the stack. Along the way, trying to guess / calculate the maximum depth of the stack. Recall that x86 (16/32 bit mode) is a processor that has:

specialized registers (RONs - general-purpose registers - as such are absent: that is, we cannot multiply the contents of the GS register by the value from EDI with one command and get the result in the EDX: ECX pair, or divide the value from the EDI: ESI register pair by the contents register EAX);
few registers;
one stack;
a memory location does not give any information about the type of the value stored there.

In other words, programming methods used for processors with a large register file, support for multiple independent stacks, etc. mostly not applicable when programming on x86.

The next feature of interaction with ready-made modules written in "high-level languages" is the "struggle" with "variable types". On the one hand, the reason for the appearance of variable types is clear - the programmer knows what values are used inside his subroutine / module. Based on this, it seems quite logical that by setting the type of the values of the variable, we can "simplify" the writing of the program by entrusting the control of types/value limits to the language translator. But even here the baby was thrown out with the water. Because any program is written not to generate spherical horses in a vacuum, but for practical work with user data. That is an obvious violation of the systems approach - as if the developers of high-level languages considered their systems without considering interaction with the outside world. As a result, when programming in a typed language, the developer must anticipate all possible types of “wrong” input data, and look for ways to work around uncertainties. And this is where monstrous regular expression support systems, exception handling, method/procedure signatures for different value types, and other crutch generation come into play.

As mentioned above, for the x86 architecture, the value itself stored in a memory cell does not have any type. The assembler programmer gets the privilege and a responsibility for determining how this very value is handled. And how to determine the type of value and how to process it - there are many options to choose from. But, we emphasize again, all of them concern only the values received from the user. As the developers of typed languages rightly noted: the types of values of internal and auxiliary variables are almost always known in advance.

This reason (the perverted passing of parameters to modules written in high-level languages and the need to strictly monitor the types of parameters passed to the same modules) seems to be the main one, due to which programming in assembler is unreasonably difficult. And the majority prefers to understand the wilds of "high-level languages" in order to take advantage of what others have already developed, rather than suffer, inserting the same "typical" crutches to correct what they did not do. And a rare assembler translator somehow “unloads” the programmer from this routine.

What to do?

Preliminary conclusions, taking into account the 15-year break in assembly language programming.
First, about the modules or parts of the program. In the general case, it is worth distinguishing two types of executive modules of an assembly language program - “operation” and “subroutine”.

An "operation" is a module that performs an "atomic" action and does not require many parameters for its execution (for example, the operation of clearing the entire screen, or the operation of calculating the median of a number series, etc.).
A “subprogram” should be called a functional module that requires, for correct functioning, a lot of input parameters (more than 2x-3x).

And here it is worth evaluating the experience of imperative and functional languages. They gave us 2 valuable tools that are worth using: “data structure” (or, in the example of REXX, compound / complemented variables) and “data immutability”.

It is also useful to follow the rule of immutability - that is, the immutability of the parameters passed. The subroutine cannot (should not) change the values in the structure passed to it, and the result is returned either in registers (no more than 2-3 parameters), or also in a new, created structure. Thus, we are relieved of the need to make copies of structures in case of a “forgotten” data change by subroutines, and we can use the entire structure already created or its main part to call several subroutines operating with the same / similar set of parameters. Moreover, almost "automatically" we come to the next "functional" rule - the internal context-independence of subroutines and operations. In other words, to the separation of the state / data from the method / subroutine of their processing (in contrast to the automaton model). In cases of parallel programming, as well as sharing one subroutine, we get rid of both the need to produce many execution contexts and monitor their “non-intersection”, and from creating many instances of one subroutine with different “states”, in case of several of its calls.

As for the “types” of data, here you can either leave “everything as it is”, or you can also not reinvent the wheel and use what developers of imperative languages translators have been using for a long time - “value type identifier”. That is, all data coming from the outside world is analyzed and each received value is assigned an identifier of the processed type (integer, floating point, packed BCD, character code, etc.) and the size of the field / value. Having this information, the programmer, on the one hand, does not drive the user into an unnecessarily narrow framework of the "rules" for entering values, and on the other hand, he has the opportunity to choose the most efficient way of processing user data in the course of work. But, I repeat once again, this only applies to working with user data.

These were general considerations about assembly language programming, not related to design, debugging, and error handling. I hope that OS developers who write them from scratch (and even more so in assembler) will have something to think about and they will choose (even if not described above, but any other) ways to make assembly language programming more systematized, convenient and enjoyable, and will not blindly copy someone else's, often hopelessly "crooked" options.

Assembler (Assembly) - a programming language, the concepts of which reflect the architecture of an electronic computer. Assembly language is a symbolic form of writing machine code, the use of which simplifies the writing of machine programs. For the same computer, different assembly languages can be developed. Unlike high-level abstraction languages, in which many algorithm implementation problems are hidden from developers, assembly language is closely related to the microprocessor instruction set. For an ideal microprocessor, in which the instruction set exactly corresponds to the programming language, the assembler produces one machine code for each statement of the language. In practice, real microprocessors may require multiple machine instructions to implement a single language statement.

The assembly language provides access to registers, specifying addressing methods, and describing operations in terms of processor instructions. The assembly language may contain tools of a higher level of abstraction: built-in and defined macros corresponding to several machine instructions, automatic instruction selection depending on the types of operands, tools for describing data structures. The main advantage of the assembly language is its “proximity” to the processor, which is the basis of the computer used by the programmer, and the main inconvenience is the too small division of typical operations, which is difficult for most users to perceive. However, assembly language reflects the very functioning of the computer to a much greater extent than all other languages.

And although drivers and operating systems are now written in C, C, with all its advantages, is a high-level abstraction language that hides various subtleties and nuances of hardware from the programmer, and assembler is a low-level abstraction language that directly reflects all these subtleties and nuances.

To successfully use assembler, three things are necessary at once:

knowledge of the syntax of the assembler translator that is used (for example, the syntax of MASM, FASM and GAS is different), the assignment of assembly language directives (operators processed by the compiler during the translation of the source code of the program);
understanding of the machine instructions executed by the processor during the execution of the program;
ability to work with services provided by the operating system - in this case, this means knowledge of the Win32 API functions. When working with high-level languages, very often the programmer does not directly access the system API; he may not even be aware of its existence, since the language library hides system-specific details from the programmer. For example, in Linux, and in Windows, and in any other system in a C / C ++ program, you can print a string to the console using the printf () function or the cout stream, that is, for a programmer using these tools, there is no difference under which system the program is made, although the implementation of these functions will be different in different systems, because the API of the systems varies greatly. But if a person writes in assembler, he no longer has ready-made functions like printf (), in which it is thought out for him how to “communicate” with the system, and must do it himself.

As a result, it turns out that writing even a simple program in assembler requires a very large amount of prior knowledge - the "entry threshold" here is much higher than for high-level languages.

Optimal can be considered a program that runs correctly, as fast as possible, and takes up as little memory as possible. In addition, it is easy to read and understand; it is easy to change; its creation requires little time and little expense. Ideally, an assembly language should have a set of characteristics that would make it possible to obtain programs that satisfy as many of the listed qualities as possible.

Programs or their fragments are written in assembly language in cases where they are critical:

the amount of memory used (loader programs, embedded software, programs for microcontrollers and processors with limited resources, viruses, software protection, etc.);
performance (programs written in assembly language run much faster than analog programs written in high-level programming languages. In this case, performance depends on understanding how a particular processor model works, the actual pipeline on the processor, cache size, subtleties of work operating system, resulting in a program that runs faster but loses portability and versatility).

In addition, knowledge of assembly language makes it easier to understand the architecture of a computer and the operation of its hardware, something that knowledge cannot give high-level abstraction languages(YVU). Nowadays, most programmers develop programs in rapid design environments(Rapid Application Development) when all the necessary design and control elements are created using ready-made visual components. This greatly simplifies the programming process. However, one often has to deal with such situations when the most powerful and efficient functioning of individual program modules is possible only if they are written in assembly language (assembler inserts). In particular, in any program associated with the execution of repetitive cyclic procedures, be it cycles of mathematical calculations or the output of graphic images, it is advisable to group the most time-consuming operations into submodules programmed in assembly language. All packages of modern programming languages of a high level of abstraction allow this, and the result is always a significant increase in the speed of programs.

Programming languages of a high level of abstraction were developed with the aim of bringing the method of writing programs as close as possible to the usual for computer users of various forms of writing, in particular mathematical expressions, and also in order not to take into account the specific technical features of individual computers in programs. Assembly language is developed taking into account the specifics of the processor, therefore, in order to competently write a program in assembly language, it is required, in general, to know the architecture of the processor of the computer used. However, bearing in mind the predominant distribution of PC-compatible personal computers and ready-made software packages for them, you can not think about this, since such concerns are taken over by developers of specialized and universal software.

2. About compilers

Which assembler is better?

For the x86-x64 processor, there are over a dozen different assembler compilers. They differ in different feature sets and syntax. Some compilers are more suitable for beginners, some for experienced programmers. Some compilers are fairly well documented, others are not documented at all. For some compilers, many programming examples have been developed. Some assemblers have tutorials and books that cover the syntax in detail, others have none. Which assembler is better?

Given the many dialects of assemblers for x86-x64 and the limited amount of time to study them, we restrict ourselves to a brief overview of the following compilers: MASM, TASM, NASM, FASM, GoASM, Gas, RosAsm, HLA.

What operating system would you like to use?

This is the question you must answer first. The most feature rich assembler won't do you any good if it's not designed to run on the operating system you plan to use.

	Windows	DOS	linux	BSD	QNX	macOS running on Intel/AMD processor
FASM	x	x	x	x
GAS	x	x	x	x	x	x
GoAsm	x
HLA	x		x
MASM	x	x
NASM	x	x	x	x	x	x
RosAsm	x
TASM	x	x

Support 16 bit

If the assembler supports DOS, then it also supports 16-bit instructions. All assemblers provide the ability to write code that uses 16-bit operands. 16-bit support refers to the ability to write code that runs on a 16-bit segmented memory model (compared to the 32-bit flat memory model used by most modern operating systems).

64 bit support

With the exception of TASM, which Borland cooled off in the mid-2000s, and which doesn't even fully support 32-bit programs, all other dialects support 64-bit application development.

Program portability

Obviously, you are not going to write x86-x64 assembler code that would run on some other processor. However, even on a single processor, you may run into portability issues. For example, if you intend to compile and use your assembler programs under different operating systems. NASM and FASM can be used on the operating systems they support.

Do you intend to write an application in assembler and then port this application from one OS to another with a "recompilation" of the source code? This feature is supported by the HLA dialect. Do you expect to be able to build Windows and Linux applications in assembler with minimal effort to do so? Although, if you work with one operating system and absolutely do not plan to work in any other OS, then this problem does not concern you.

Support for high-level language constructs

Some assemblers provide an extended syntax that provides high-level language control structures (such as IF, WHILE, FOR, and so on). Such constructs can make assembly easier to learn and help write more readable code. Some assemblers have built-in "high-level constructs" with limited capabilities. Others provide high-level constructs at the macro level.

No assembler forces you to use any high-level control structures or data types if you prefer to work at the machine instruction encoding level. High-level constructs are an extension of the basic machine language that you can use if you find them convenient.

Documentation quality

The usability of an assembler is directly related to the quality of its documentation. Considering the amount of work that is spent to create an assembler dialect, compiler authors practically do not bother creating documentation for this dialect. Authors, when extending their language, forget to document these extensions.

The following table describes the quality of the assembler reference manual that comes with the product:

	Documentation	Comments
FASM	Good	The author devotes most of his free time to the development of the innovative FASMG. However, the author provides support for FASM from time to time updates manuals, and describes new features on his own forum. The documentation can be considered quite good. Documentation web page.
Gas	bad	is poorly documented and the documentation rather has a "general view". gas is an assembler that was designed to make it easy to write code for different processors. The documentation that exists mainly describes pseudo codes and assembly directives. The "intel_syntax" mode of operation has little to no documentation. Books that use "AT&T" syntax: "Programming from Scratch" by Jonathon Bartlett and "Professional Assembly Language" by Richard Blum, Konstantin Boldyshev.
GoAsm	Weak	Most of the syntax is covered in the manual, and the advanced user will find what they are looking for. Many manuals are posted on the website (http://www.godevtool.com/). A few GoAsm tutorials: Patrick Ruiz's guide
HLA	Expanding	HLA has a 500 page reference manual. The site contains dozens of articles and documentation on HLA.
MASM	Good	Microsoft has written a significant amount of documentation for MASM, there are a large number of reference books written for this dialect.
NASM	Good	The authors of NASM write more software for this dialect, leaving the writing of the manual to “later”. NASM has been around long enough that several authors have written a manual for NASM Jeff Duntemann "Assembly Language Step-by-Step: Programming with Linux", Jonathan Leto "Writing A Useful Program With NASM", there is Stolyarov's book in Russian ( Site of A.V. Stolyarov).
RosAsm	Weak	not very interesting "online tutorials".
TASM	Good	Borland used to produce excellent reference manuals, and reference manuals for TASM were written by enthusiastic authors not affiliated with Borland. But Borland no longer supports TASM, so much of the documentation for TASM is out of print and becoming harder and harder to find.

Textbooks and teaching materials

The documentation in the assembler itself is, of course, very important. Of even greater interest to beginners and others learning assembly language (or advanced features of this assembly language) is the availability of documentation outside of the reference manual for the language. Most people want a textbook explaining how to program in assembly language that doesn't just provide the syntax of machine instructions and expect the reader to be taught how to put these instructions together to solve real problems.

MASM is the leader among the huge volume of books describing how to program in this dialect. There are dozens of books that use MASM as their assembler for teaching assembly.

Most MASM/TASM assembler textbooks continue to teach MS-DOS programming. Although gradually there are textbooks that teach programming in Windows and Linux.

	Comments
FASM	Several tutorials that describe FASM programming: Norseev S.A. "Development of window applications on FASMe" Ruslan Ablyazov "Programming in assembler on x86-64 platform"
Gas	Tutorial using AT&T syntax Tutorial Assembler in Linux for C Programmers
HLA	The Art of Assembly Language Programming 32-bit (available in both electronic and printed form), Windows or Linux programming
MASM	a large number of books on teaching programming under DOS. Not many books about programming under Win32/64 Pirogov, Yurov, Zubkov, Flenov
NASM	many books on programming in DOS, Linux, Windows. Jeff Dunteman's book "Assembly Language Step-by-Step: Programming with Linux" uses NASM for Linux and DOS. Paul Carter's tutorial uses NASM (DOS, Linux).
TASM	Like MASM, a large number of DOS-based books have been written for TASM. But since Borland no longer supports this product, books on the use of TASM have stopped being written. Tom Swan wrote a TASM tutorial that included several chapters on Windows programming.

3. Literature and web resources

beginners

Abel P. Assembly language for IBM PC and programming. - M.: Higher School, 1992. - 447 p.
Bradley D. Programming in assembly language for an IBM personal computer. - M .: Radio and communication, 1988. - 448 p.
Galiseev G.V. IBM PC assembler. Self-instruction manual.: - M .: Williams Publishing House, 2004. - 304 p.: ill.
Dao L. Programming the microprocessor 8088. - M .: Mir, 1988. - 357 p.
Zhukov A.V., Avdyukhin A.A. assembler. - St. Petersburg: BHV-Petersburg, 2003. - 448 p.: ill.
Zubkov S.V., Assembler for DOS, Windows and UNIX. – M.: DMK Press, 2000. – 608 p.: ill. (Series "For programmers").
Irvin K. Assembly language for Intel processors, 4th edition.: Per. from English. – M.: Williams Publishing House, 2005. – 912 p.: ill. - Paral. tit. English (see also the latest 7th edition in the original)
Norton P., Souhe D. Assembly language for IBM PC.– M.: Computer, 1992.– 352 p.
Pilshchikov V.N. Programming in assembly language IBM PC.– M.: DIALOG-MEPhI, 1994–2014 288 p.
Sklyarov I.S. Learning assembler in 7 days www.sklyaroff.ru

Advanced

Kaspersky K. Fundamentals of hacking. The art of disassembly. – M.: SOLON-Press, 2004. 448 p. – (Series “Code digger”)
Kaspersky K. Technique for debugging programs without source texts. - St. Petersburg: BHV-Petersburg, 2005. - 832 p.: ill.
Kaspersky K. Computer viruses inside and out. - St. Petersburg: Peter, 2006. - 527 p.: ill.
Kaspersky K. Notes of a Computer Virus Researcher. - St. Petersburg: Peter, 2006. - 316 p.: ill.
Knut D. The Art of Programming, Volume 3. Sorting and Search, 2nd ed.: Per. from English. - M.: Williams Publishing House, 2003. - 832 p.: ill. - Paral. tit. English
Kolisnichenko D.N. Rootkits for Windows. The theory and practice of programming "invisibility caps" that allow you to hide data, processes, network connections from the system. - St. Petersburg: Science and Technology, 2006. - 320 p.: ill.
Lyamin L.V. Macroassembler MASM.– M.: Radio and communication, 1994.– 320 p.: ill.
Magda Yu. Assembler for Intel Pentium processors. - St. Petersburg: Peter, 2006. - 410 p.: ill.
Maiko G.V. Assembler for IBM PC.– M.: Business-Inform, Sirin, 1997.– 212 p.
Warren G. Algorithmic tricks for programmers, 2nd ed.: per. from English. – M.: Williams Publishing House, 2004. – 512 p.: ill. - Paral. tit. English
Sklyarov I.S. The art of protecting and hacking information. - St. Petersburg: BHV-Petersburg, 2004. - 288 p.: ill.
Weatherell C. Etudes for programmers: Per. from English. - M.: Mir, 1982. - 288 p., ill.
Electronic library of the Frolov brothers www.frolov-lib.ru
Chekatov A.A. The use of Turbo Assembler in the development of programs. - Kyiv: Dialectics, 1995. - 288 p.
Yurov V. Assembler: a special guide. - St. Petersburg: Peter, 2001. - 496 p.: ill.
Yurov V. Assembler. Workshop. 2nd ed. - St. Petersburg: Peter, 2006. - 399 p.: ill.
Yurov V. Assembler. Textbook for high schools. 2nd ed. - St. Petersburg: Peter, 2007. - 637 p.: ill.
Pirogov V. Assembler training course. 2001 Knowledge
Pirogov V. ASSEMBLER training course 2003 Knowledge-BHV
Pirogov V. Assembler for windows
1st edition - M.: publishing house Molgachev S.V., 2002
2nd edition - St. Petersburg:. BHV-Petersburg, 2003 - 684 p.: ill.
3rd edition - St. Petersburg:. BHV-Petersburg, 2005 - 864 p.: ill.
4th edition - St. Petersburg:. BHV-Petersburg, 2012 - 896 p.: ill.
Pirogov V. Assembler with examples. - St. Petersburg:. BHV-Petersburg, 2012 - 416 p.: ill.
Pirogov V. ASSEMBLER and disassembly. - St. Petersburg:. BHV-Petersburg, 2006. - 464 p.: ill.
Pirogov V. work on the book "64-bit assembly language programming (Windows, Unix)". The book covers fasm programming on 64-bit Windows and Unix.
Yurov V., Khoroshenko S. Assembler: training course. - St. Petersburg: Peter, 1999. - 672 p.
Yu-Zheng Liu, Gibson G. Microprocessors of the 8086/8088 family. Architecture, programming and design of microcomputer systems. - M .: Radio and communication, 1987. - 512 p.
Agner Fog: Software optimization resources (assembly/c++) 1996 – 2017.



	Polyakov Andrey Valerievich
	http://info-master.su
	[email protected]
	av-inf.blogspot.ru
In contact with:	vk.com/id185471101
	facebook.com/100008480927503
A page of a book:	http://av-assembler.ru/asm/afd/assembler-for-dummy.htm

ATTENTION!

All rights to this book belong to Polyakov Andrey Valerievich. No part of this book may be reproduced in any form without the permission of the author.

The information contained in this book has been obtained from sources believed by the author to be reliable. However, due to the possibility of human or technical errors, the author cannot guarantee the absolute accuracy and completeness of the information provided and cannot be held liable for any errors or damages arising from the use of this document.

1. PERMISSIONS

Permission is granted to use the book for informational and educational purposes (for personal use only). Free distribution of the book is allowed.

2. LIMITATIONS

It is forbidden to use the book for commercial purposes (sale, placement on resources with paid access, etc.). It is forbidden to make changes to the text of the book. It is forbidden to assign authorship.

see also LICENSE AGREEMENT.

Polyakov A.V.

Assembly language for dummies

Polyakov A.V. Assembler for dummies.


FOREWORD .............................................................. ................................................. ................................................. ...........
INTRODUCTION .................................................. ................................................. ................................................. .................
H A LITTLE ABOUT PROCESSORS.......................................................................................................................................................
1. QUICK START ............................................... ................................................. ................................................. .......
1.1. FIRST PROGRAM ............................................... ................................................. ................................................. .....
1.1.1. Emu8086.............................................. ................................................. ................................................. ...........
1.1.2. Debug ................................................... ................................................. ................................................. .............
1.1.3. MASM, TASM and WASM............................................... ................................................. .........................................
1.1.3.1. Assembly in TASM .......................................................... ................................................. ...............................................
1.1.3.2. Assembly in MASM............................................................... ................................................. ............................................
1.1.3.3. Assembly in WASM.................................................................... ................................................. ............................................
1.1.3.4. Program execution .................................................................. ................................................. ...............................................
1.1.3.5. Using BAT files.................................................................... ................................................. .........................................
1.1.4. Hex editor .................................................................. ................................................. .................
Summary................................................. ................................................. ................................................. ...................
2. INTRODUCTION TO ASSEMBLER .............................................. ................................................. ............................................
2.1. To HOW THE COMPUTER IS SET...............................................................................................................................................
2.1.1. Processor structure .............................................................. ................................................. .................................
2.1.2. Processor registers .................................................................. ................................................. ................................................
2.1.3. Command Execution Loop .................................................................. ................................................. ...............................
2.1.4. Memory organization .............................................................. ................................................. ......................................
2.1.5. Real Mode ................................................................ ................................................. ...............................................
2.1.6. Protected Mode .................................................................. ................................................. ...............................................
2.2. FROM NUMBER SYSTEMS.....................................................................................................................................................
2.2.1. Binary number system .............................................................. ................................................. .........................
2.2.2. Hexadecimal number system .......................................................... ................................................. ....
2.2.3. Other systems ................................................................ ................................................. ...............................................
2.3. P REPRESENTATION OF DATA IN COMPUTER MEMORY.............................................................................................................
2.3.1. Positive numbers .................................................................. ................................................. ...................................
2.3.2. Negative numbers................................................ ................................................. ...................................
2.3.3. What is overflow .................................................................. ................................................. ................................
2.3.4. Flag register .......................................................... ................................................. ...............................................
2.3.5. Character codes ................................................................ ................................................. ..................................................
2.3.6. Real numbers ................................................................ ................................................. ......................................
2.3.6.1. First try................................................ ................................................. ................................................. ..........
2.3.6.2. Normalized number notation .............................................................. ................................................. ...................................
2.3.6.3. Converting a fractional part to binary form............................................................... ................................................. ....
2.3.6.4. Representation of real numbers in computer memory .............................................................. .........................................
2.3.6.5. Fixed Point Numbers............................................................... ................................................. ................................................
2.3.6.6. Floating point numbers .................................................................. ................................................. ...............................................
LICENSE AGREEMENT................................................ ................................................. .................................

FOREWORD

Assembler - this magic word calls out in awe of novice programmers. When communicating with each other, they always talk about the fact that somewhere someone has a familiar “dude” who can read source codes in assembly language like a book text. In this case, as a rule, assembly language is perceived as something inaccessible to mere mortals.

This is partly true. You can learn a few simple commands and even write some kind of program, but you can become a real guru (in any business) only when a person knows the theoretical foundations very well and understands what and why he is doing.

There is another extreme - experienced programmers in high-level languages are convinced that assembly language is a relic of the past. Yes, development tools have come a long way in the last 20 years. Now you can write a simple program without knowing any programming language at all. However, do not forget about such things as, for example, microcontrollers. And in computer programming, some problems are easier and faster to solve using assembly language.

This book is intended for those who already have programming skills in a high-level language, but would like to go “closer to the hardware” and figure out how processor commands are executed, how memory is allocated, how various hardware such as disk drives are managed, etc. .P.

The book is divided into several sections. The first section is a quick start. Here, the basic principles of assembly language programming, the assemblers themselves (compilers) and methods of working with assemblers are very briefly described. If you are comfortable with high-level programming, but would like to learn the basics of low-level programming, then this section alone may be enough for you.

The second section describes such things as calculus systems, representations of data in computer memory, etc., that is, things that are not directly related to programming, but without which professional programming is impossible. Also in the second section, the general principles of assembly language programming are discussed in more detail.

The remaining sections describe some specific examples of assembly language programming, contain reference materials, and so on.

The basics of programming are generally not described in this book, so for beginners I strongly recommend that you read the book. How to become a programmer, where the general principles of programming are explained “on the fingers” and examples of creating simple programs from programs for computers to programs for CNC machines are considered in detail.

INTRODUCTION

First, let's deal with the terminology.

Machine code is a system of instructions for a specific computer (processor), which is interpreted directly by the processor. An instruction is usually an integer that is written to a processor register. The processor reads this number and performs the operation that corresponds to this instruction. It is popularly described in the book How to become a programmer.

Low level programming language(low-level programming language) is a programming language that is as close as possible to programming in machine codes. Unlike machine codes, in a low-level language, each command does not correspond to a number, but to an abbreviated name of the command (mnemonic). For example, the ADD command is short for ADDITION. Therefore, the use of a low-level language greatly simplifies writing and reading programs (compared to programming in machine codes). The low level language is processor specific. For example, if you wrote a program in a low level language for a PIC processor, you can be sure that it will not work with an AVR processor.

High level programming language is a programming language that is as close as possible to the human language (usually English, but there are programming languages in national languages, for example, the 1C language is based on Russian). The high-level language is practically not tied to a particular processor or operating system (unless specific directives are used).

Assembly language is a low-level programming language in which you write your programs. Each processor has its own assembly language.

An assembler is a special program that converts (assembles, that is, collects) the source code of your program written in assembly language into an executable file (file with the EXE or COM extension). To be precise, additional programs are required to create an executable file, and not just assembler. But more on that later...

In most cases, they say "assembler", but they mean "assembly language". Now you know that these are different things and it is not entirely correct to say so. Although all programmers will understand you.

Unlike high-level languages such as Pascal, BASIC, etc., EVERY ASSEMBLER has its own ASSEMBLY LANGUAGE. This rule fundamentally distinguishes assembly language from high-level languages. The source code of a program (or simply "source code") written in a high-level language can in most cases be compiled with different compilers for different processors and different operating systems. With assembler sources, this will be much more difficult to do. Of course, this difference is almost imperceptible for different assemblers that are designed for the same processors. But the fact of the matter is that for EACH PROCESSOR there is OWN ASSEMBLER and OWN ASSEMBLY LANGUAGE. In this sense, programming in high-level languages is much easier. However, you have to pay for all the pleasures. In the case of high-level languages, we may encounter such things as a larger executable file size, worse performance, and so on.

In this book, we will only talk about programming for computers with Intel (or compatible) processors. In order to test the above

in book examples, you will need the following programs (or at least some of them):

1. Emu8086. Good program, especially for beginners. Includes a source code editor and some other useful stuff. Runs on Windows, although programs are written under DOS. Unfortunately, the program costs money (but it's worth it))). See http://www.emu8086.com for details.

2. TASM - Turbo Assembler from Borland. You can create programs for both DOS and Windows. It also costs money and is no longer supported at the moment (and Borland no longer exists). In general, a good thing.

3. MASM - Assembler from Microsoft (stands for MACRO assembler, not Microsoft Assembler, as many uninitiated people think). Perhaps the most popular assembler for Intel processors. It is still supported. Freeware program. That is, if you buy it separately, it will cost money. But it's available for free to MSDN subscribers and is part of Microsoft's Visual Studio software package.

4. WASM - assembler from Watcom. Like all others, it has advantages and disadvantages.

5. Debug - has modest capabilities, but has a big plus - it is included in the standard set of Windows. Look for it in the WINDOWS\COMMAND or WINDOWS\SYSTEM32 folder. If you do not find it, then in other folders of the WINDOWS directory.

6. It is also desirable to have some hex editor. Dosovsky file manager, such as Volkov Commander (VC) or Norton Commander (NC) will not hurt either. With their help, you can also view the hexadecimal codes of the file, but you cannot edit it. There are quite a few free hex editors on the Internet. Here's one: McAfee FileInsight v2.1 . The same editor can be used to work with the source code of programs. However, I prefer to do it with the following editor:

7. Text editor. Necessary for writing the source code of your programs. I can recommend the free PSPad editor, which supports many programming languages, including Assembly language.

All programs (and sample programs) presented in this book have been tested to work. And it is these programs that are used to implement the sample programs in this book.

And one more thing - the source code written, for example, for Emu8086, will be slightly different from the code written, for example, for TASM. These differences will be specified.

Most of the programs in this book are written for MASM. First, because this assembler is the most popular and still supported. Second, because it comes with MSDN and Microsoft's Visual Studio software package. And thirdly, because I am the proud owner of a licensed copy of MASM.

If you already have any assembler that is not included in the above list, then you will have to deal with its syntax yourself and read the user manual to learn how to work with it correctly. But the general recommendations given in this book will be valid for any (well, almost any) assemblers.

A little about processors

The processor is the brain of the computer. Physically, this is a special microcircuit with several hundred pins, which is inserted into the motherboard. If you can hardly imagine what it is, I recommend that you read the article Dummies about computers.

There are quite a lot of processors even in the world of computers. But in addition to computers, there are also televisions, washing machines, air conditioners, internal combustion engine control systems, etc., where processors (microprocessors, microcontrollers) are also very widely used.

Each processor has its own set of registers. Processor registers are special memory cells that are located directly on the processor chip. Registers are used for different purposes (registers will be discussed in more detail below).

Each processor has its own instruction set. The processor instruction is written to a certain register, and then the processor executes this instruction. We will talk about processor instructions and registers a lot and often throughout the book. I recommend this book for beginners How to become a programmer, where in the most general terms, but in an understandable language, it is told about the principles of executing a program by a computer.

What is an instruction in terms of the processor? It's just a number. However, modern processors can have several hundred instructions. Remembering all of them will be difficult. How then to write programs? To simplify the work of the programmer, the assembler language was invented, where each command corresponds to a mnemonic code. For example, the number 4 corresponds to the mnemonic ADD . Sometimes assembly language is also called the language of mnemonic instructions.

1. QUICK START

1.1. First program

The usual first example is a program that prints the string "Hello World!" to the screen. However, for a person who has just started learning Assembler, such a program will be too complicated (you will laugh, but this is true - especially in the absence of intelligible information). Therefore, our first program will be even simpler - we will display only one character - the English letter "A". And in general - if you have already decided to become a programmer - urgently set the default English keyboard layout. Moreover, some assemblers and compilers do not accept Russian letters. So, our first program will display the English letter "A". Next, we will consider the creation of such a program using various assemblers.

If you have downloaded and installed the 8086 emulator (see the INTRODUCTION section), you can use it to create your first assembly language programs. Currently (November 2011) version 4.08 is available. Help in Russian can be found here: http://www.avprog.narod.ru/progs/emu8086/help.html.

The Emu8086 program is paid. However, within 30 days you can use it for review for free.

So, you have downloaded and installed the Emu8086 program on your computer. Run it and create a new file through the menu FILE - NEW - COM TEMPLATE (File - New - COM File Template). In the source code editor after that we will see the following:

Rice. 1.1. Creating a new file in Emu8086.

It should be noted here that programs created using Assemblers for computers running Windows are of two types: COM and EXE. We will look at the differences between these files later, but for now it is enough for you to know that for the first time we will create executable files with the COM extension, since they are simpler.

After creating the file in Emu8086 as described above, in the source code editor you will see the line "add your code hear" - "add your code here" (Fig. 1.1). We delete this line and insert the following text instead:

Thus, the full text of the program will look like this:

In addition, there are still comments in the upper part (in Fig. 1.1 - this is the text in green). A comment in assembly language starts with a ; (semicolon) and continues to the end of the line. If you do not know what comments are and why they are needed, see the book How to become a programmer. As I said, here we will not explain the basics of programming, since the book you are reading now is designed for people who are familiar with the basics of programming.

Also note that the case of characters in assembly language does not play a role. You can write RET , ret or Ret - it will be the same command.

You can save this file somewhere on disk. But you can't save. To execute the program, press the EMULATE button (with a green triangle) or the F5 key. Two windows will open: the emulator window and the source code window (Fig. 1.2).

AT The emulator window displays registers and contains program control buttons. The source code window displays the source code of your program, highlighting the line that is currently executing. All this is very convenient for studying and debugging programs. But we don't need it yet.

AT In the emulator window, you can run your program for execution as a whole (the RUN button) or in step-by-step mode (the SINGLE STEP button). Stepping mode is convenient for debugging. Well, now we will launch the program for execution with the RUN button. After that (if you didn't make any mistakes in the program text) you will see a message about program termination (Fig. 1.3). Here you are informed that the program has transferred control to the operating system, that is, the program has been successfully completed. Press the OK button in this window and you will finally see the result of running your first assembly language program (Fig.

Rice. 1.2. Emu8086 emulator window.

Rice. 1.3. Program end message.

Rice. 1.4. Your first program is done.