What language is artificial intelligence written in? An introduction to AI and little-known languages for creating artificial intelligence

As long as programmers can make money by writing programs, the existing "AIs" are not AI, no matter what candy wrapper they come in. The approach I propose addresses this issue.

As a result of my research, I stopped using the phrase "artificial intelligence" as too vague and came to a different formulation: an algorithm for self-learning, for research, and for applying the results it finds to solve any realizable task.

Much has already been written about what AI is. I pose the question differently: not "what is AI" but "why is AI needed". I need it to earn a lot of money, then to have the computer do everything I don't want to do myself, and after that to build a spaceship and fly to the stars.

So here I will describe how to make the computer fulfill our desires. If you expect to see a description or mention of how consciousness works, what self-awareness is, or what it means to think or reason - that is not here. Thinking is not what computers do. Computers compute: they calculate and execute programs. So let's think about how to make a program that can calculate the sequence of actions needed to realize our desires.

The form in which our task reaches the computer - through the keyboard, through a microphone, or from sensors implanted in the brain - is not important; it is a secondary matter. If we can make a computer fulfill desires written as text, we can then set it the task of writing a program that fulfills desires arriving through a microphone. Image analysis is likewise secondary.

Saying that image and sound recognition algorithms must be built into an AI from the start is like saying that everyone who has ever written such programs knew from birth how they work.

Let's formulate the axioms:
1. Everything in the world can be calculated according to some rules. (More on errors later.)
2. Calculation according to a rule is an unambiguous dependence of the result on the initial data.
3. Any unambiguous dependence can be found statistically.
And now the statements:
4. There is a function for converting text descriptions into rules - so that knowledge found long ago does not have to be searched for again.
5. There is a function for converting tasks into solutions (this is the fulfillment of our desires).
6. The rule for predicting arbitrary data subsumes all other rules and functions.

Let's translate this into a programmer's language:
1. Everything in the world can be calculated by some algorithms.
2. An algorithm always gives the same result for the same initial data.
3. Given many examples of initial data and the corresponding results, with infinite search time one can find the whole set of possible algorithms that implement this dependence between initial data and result.
4. There are algorithms for converting text descriptions into algorithms (or any other informational data) - so that the required algorithms need not be searched for statistically when someone has already found and described them.
5. It is possible to create a program that fulfills our desires, whether in text or voice form, provided those desires are physically realizable within the required time frame.
6. If you manage to create a program that can predict, and keeps learning to predict as new data arrive, then after infinite time such a program will include all algorithms possible in our world. For practical use, with finite time and some error, it can be made to execute the algorithms of point 5, or any others.

Also, IMHO:
7. There is no way of learning that is completely independent of a person other than enumerating rules and statistically checking them by forecasting. One only needs to learn how to use this property. This property is part of how the brain works.

What needs to be predicted? From birth, a stream of information flows into the human brain - from the eyes, the ears, the sense of touch, and so on - and all of its decisions are made on the basis of previously received data. By analogy, we build a program whose input is a stream of new information, one number at a time. Everything received earlier is kept as one continuous list. Values from 0 to 255 carry external information, and values above 255 serve as special control markers; i.e., the input is wide enough to write numbers up to, say, 0xFFFF. It is this stream - more precisely, each next piece of information appended to it - that the program must learn to predict, based on the data received before. I.e., the program should try to guess what the next number will be.

Of course, other ways of presenting the data are possible, but for our purposes - where input arrives in a variety of formats and we simply stuff assorted HTML pages with descriptions in first - this one is the most suitable. The markers could be replaced with escape sequences for optimization, but they are less convenient for explanations. (Also, assume everything is in ASCII, not UTF.)

So, first, as at birth, we shove in all the Internet pages with descriptions, separating them with a "new text" marker, so that this black box learns everything in a row. I will denote markers with tags, but it is understood that each is just some unique number. After a certain amount of data has passed through, we begin to manipulate the incoming information using control markers.
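To make the encoding concrete, here is a minimal sketch in C of the stream just described; the marker names and the buffer size are illustrative assumptions of mine, not something the scheme prescribes:

#include <stddef.h>
#include <stdint.h>

typedef uint16_t token;             /* 0..255 - raw bytes, 256..0xFFFF - control markers */

enum {
    MARKER_NEW_TEXT = 256,          /* hypothetical: separates one document from the next */
    MARKER_Q        = 257,          /* hypothetical: opens a question/object field  */
    MARKER_A        = 258           /* hypothetical: opens an answer/property field */
};

#define STREAM_MAX (1u << 20)
static token  stream[STREAM_MAX];   /* everything ever received: one continuous list */
static size_t stream_len = 0;

static void stream_push(token t) {
    if (stream_len < STREAM_MAX)
        stream[stream_len++] = t;
}

static void stream_push_text(const char *s) {
    while (*s)
        stream_push((token)(unsigned char)*s++);
}

Feeding a downloaded page then amounts to stream_push(MARKER_NEW_TEXT) followed by stream_push_text(page).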

By forecasting I mean an algorithm that not only knows what patterns have already occurred, but constantly looks for new ones. Therefore, if a sequence like
<q> sky <a> blue
<q> grass <a> green
<q> ceiling <a> ...
is fed to the input of such a program (the tag names here are arbitrary markers), it must figure out that after the second marker comes the color of the previously named object, and in place of the ellipsis it will predict the most probable color of the ceiling.

We repeat several examples so that it understands which function is to be applied within these tags. The color itself, of course, it should not invent; it should already know it, having learned to calculate patterns in the course of forecasting.

When an answer is required from the algorithm, the prediction of the previous step is fed as input to the subsequent steps - a kind of auto-prediction (by analogy with the word autocorrelation). At the same time, the search for new sequences is switched off.
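As a sketch of that mode (reusing token, stream, stream_len, and stream_push from the listing above, with predict_next standing in for the entire prediction machinery):

token predict_next(const token *s, size_t len);   /* placeholder: the whole predictor */

static void generate_answer(size_t max_tokens) {
    /* While generating, the search for new rules is assumed off;
       only the rules already found are applied. */
    for (size_t i = 0; i < max_tokens; i++) {
        token t = predict_next(stream, stream_len);
        if (t > 255)                  /* a control marker: treat as "answer finished" */
            break;
        stream_push(t);               /* the prediction becomes the next input */
    }
}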

Another example: place a question after the first marker and the answer after the second, and then, if this algorithm is super-mega-cool, it should start giving answers even to the most difficult questions - again, within the limits of the facts it has already studied.

You can come up with many different tricks involving control markers fed to the input of the predictive mechanism, and obtain any desired functions. If reading the algorithmic rationale for this property gets boring, skip ahead to the further examples with control markers.

What is this black box made of? First, it is worth saying that one hundred percent forecasting is not possible always and in every situation. On the other hand, a result that is always zero is also a prediction, albeit with 100% error. Now let us calculate with what probability which number follows which: for each number, the most probable successor can be determined. I.e., we can already predict a little. This is the first step of a very long journey.
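Here is a self-contained sketch of this very first step: for each byte value, count which byte follows it most often and use that as the prediction.

#include <stddef.h>

static long next_count[256][256];     /* next_count[a][b]: how often b followed a */

static void train_bigram(const unsigned char *data, size_t n) {
    for (size_t i = 0; i + 1 < n; i++)
        next_count[data[i]][data[i + 1]]++;
}

static int predict_after(unsigned char byte) {
    int best = 0;                     /* index of the most frequent successor */
    for (int b = 1; b < 256; b++)
        if (next_count[byte][b] > next_count[byte][best])
            best = b;
    return best;
}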

An unambiguous mapping of input data to a result by an algorithm corresponds to the mathematical definition of a function, except that the definition of an algorithm does not fix the number and placement of inputs and outputs. As an example, take a small object-color table and enter many rows into it: the sky is blue, the grass is green, the ceiling is white. That gives a small, local, unambiguous mapping function. It does not matter that in reality the colors often differ - those will be other tables of their own. And any database that stores properties of things is a set of functions mapping object identifiers to their properties.

To simplify, in many places below I will use the term function instead of algorithm, meaning a one-parameter function unless otherwise specified; every such mention should be understood as extending to algorithms.

And the description I give is approximate, because in reality I have yet to implement all of this... but it is all logical. It should also be borne in mind that all calculations are carried out with coefficients rather than with true/false (even where true and false are stated explicitly).

Any algorithm, especially one that operates on integers, can be decomposed into a set of conditions and transitions between them. The operations of addition, multiplication, and so on decompose into sub-algorithms of conditions and transitions as well, and so does the result operator (which is not a return statement). The condition operator takes a value from somewhere and compares it with a constant. The result operator stores a constant value somewhere. The location to take from or store to is calculated relative either to a base point or to the previous steps of the algorithm.

struct t_node {
    int type;                      // 0 - condition, 1 - result
    union {
        struct {                   // condition operator
            t_node*  source_get;   // where to take the value from
            t_value* compare_value;
            t_node*  next_if_then;
            t_node*  next_if_else;
        };
        struct {                   // result operator
            t_node*  dest_set;     // where to put the value
            t_value* result_value;
        };
    };
};
Offhand, something like this. The algorithm is built from such elements. The full chain of reasoning leads to a more complex structure; this one is just for the initial picture.

Each predicted point is calculated by some function. A condition is attached to the function that tests whether the function is applicable to this point. The overall chain returns either false (not applicable) or the result of the function's calculation. Continuous stream forecasting then means checking, one by one, the applicability of every function found so far and calculating those that apply - and so on for each point.
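In code, a found rule can be pictured as an applicability test plus a computation, and forecasting a point as a scan over all known rules; this is a sketch under my own assumptions about the shapes involved (token as in the earlier listings):

#include <stddef.h>

typedef struct {
    int   (*applies)(const token *s, size_t pos);  /* the attached condition */
    token (*compute)(const token *s, size_t pos);  /* the function itself    */
} t_rule;

static t_rule rules[1024];
static size_t n_rules;

static int forecast_point(const token *s, size_t pos, token *out) {
    for (size_t i = 0; i < n_rules; i++)
        if (rules[i].applies(s, pos)) {            /* applicable here?        */
            *out = rules[i].compute(s, pos);       /* yes: compute the result */
            return 1;
        }
    return 0;                                      /* no known rule applies   */
}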

In addition to the applicability condition, there are also distances: between the initial data and the result, and this distance differs even for the same function, depending on the condition. (There is a similar distance from the condition to the initial or predicted value; we will imply it but omit it from the explanations. The distances are dynamic.)

As a large number of functions accumulates, the number of conditions testing their applicability grows too. But in many cases these conditions can be arranged as trees, and the pruning of the sets of functions then proceeds roughly logarithmically.

When a function is first created and measured, a distribution of actual results is accumulated instead of a result statement. Once statistics have accumulated, the distribution is replaced by the most probable result, and the function is prefixed with a condition, which is likewise tested for maximizing the probability of the result.
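A sketch of that transition; the thresholds are invented for illustration, since the text does not fix any:

#include <stddef.h>

typedef struct {
    long outcome_count[256];   /* accumulated distribution of actual results */
    long total;                /* how many times the condition has matched   */
} t_candidate;

/* Returns 1 and sets *result once the statistics justify freezing the rule. */
static int freeze_candidate(const t_candidate *c, unsigned char *result) {
    int  best_b = 0;
    long best   = 0;
    for (int b = 0; b < 256; b++)
        if (c->outcome_count[b] > best) {
            best   = c->outcome_count[b];
            best_b = b;
        }
    /* Hypothetical criterion: at least 10 observations, 90% of them agreeing. */
    if (c->total < 10 || best * 10 < c->total * 9)
        return 0;
    *result = (unsigned char)best_b;   /* the distribution collapses to its mode */
    return 1;
}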

The search first finds single correlating facts. Having accumulated a pile of such singles, we try to combine them into groups, looking for a common condition and a common distance from the initial value to the result. We also check that under such conditions and distances, in other cases where the original value repeats, the result is not widely scattered; i.e., in certain frequent usages it is highly identical.

Identity coefficient. (This is a bidirectional identity; more often it is unidirectional. I will rethink the formula later.)
Let n_xy be the number of occurrences of the pair (X, Y), n_x the number of occurrences of the value X, and n_y the number of occurrences of the value Y. Square the count of each pair and sum these; divide by the sum of the squared counts of each X value, plus the sum of the squared counts of each Y value, minus the dividend:
K = SUM(n_xy^2) / (SUM(n_x^2) + SUM(n_y^2) - SUM(n_xy^2))
This coefficient ranges from 0 to 1.
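A direct transcription of the formula into C, assuming value alphabets small enough for a plain count matrix:

#include <stddef.h>

#define NX 256
#define NY 256

/* pair_count[x][y] holds n_xy, the number of observed (x, y) pairs. */
static double identity_coefficient(const long pair_count[NX][NY]) {
    long   nx[NX] = {0}, ny[NY] = {0};
    double sum_xy2 = 0.0, sum_x2 = 0.0, sum_y2 = 0.0;

    for (int x = 0; x < NX; x++)
        for (int y = 0; y < NY; y++) {
            long n = pair_count[x][y];
            nx[x] += n;                /* marginal count of x */
            ny[y] += n;                /* marginal count of y */
            sum_xy2 += (double)n * n;
        }
    for (int x = 0; x < NX; x++) sum_x2 += (double)nx[x] * nx[x];
    for (int y = 0; y < NY; y++) sum_y2 += (double)ny[y] * ny[y];

    double denom = sum_x2 + sum_y2 - sum_xy2;
    return denom > 0.0 ? sum_xy2 / denom : 0.0;   /* 1.0 = perfect one-to-one mapping */
}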

And what does this give us? High-frequency facts have convinced us that under these conditions and distances, the facts are unambiguous. The remaining rare facts - of which, in total, there are far more than frequent ones - carry the same error as the frequent facts under these conditions. I.e., we can build up a forecasting base even from single occurrences of facts under these conditions.

Let there be a knowledge base. The sky is often blue, and some rare tropical oddity was once seen to be gray-brown-crimson - and we remember it, because we have checked the rule and it is reliable. The principle does not depend on the language, be it Chinese or an alien one. And later, once the translation rules are understood, it will become clear that the same function can be assembled from different languages. Note also that the knowledge base can itself be represented as algorithms: if the initial value is such-and-such, then the resulting value is such-and-such.

Further, by enumerating other rules, we find that the identity already seen arises under other arrangements and conditions. Now we no longer need a large base to confirm the identity: it is enough to collect a dozen single facts and see that within this dozen the mapping goes to the same values as in the previous function; i.e., the same function is being used in other conditions. This property is what allows the same property to be described by different expressions - sometimes they are simply listed in tables on Internet pages. Further collection of facts for this function can then proceed across several of its use cases at once.

Possible conditions and arrangements accumulate relative to the functions, and one can look for patterns among them as well. Not infrequently the selection rules are similar across different functions, differing only in some feature (for example, a word identifying a property, or a heading in a table).

In short, we have found a pile of one-parameter functions. Now, just as single facts were formed into one-parameter functions, we try to group the one-parameter functions by part of the condition and part of the distance. The common part becomes the new condition, and the differing part becomes the second parameter of a new, two-parameter function, whose first parameter is the parameter of the one-parameter function.

It turns out that each new parameter of a multi-parameter function is found with the same (or almost the same) linearity as when single facts were formed into one-parameter functions; i.e., finding an N-parameter function is proportional to N. In the limit of many parameters, this becomes almost a neural network. (Whoever wants to will understand.)

Conversion functions.

Of course, it is great when we are given many corresponding examples - say, short texts translated from Russian into English - and can start looking for patterns between them. In reality, though, it is all mixed together in the input stream of information.

Suppose we have found one function and the path between its data; then a second and a third. Now let us see whether a common part of the paths can be found among them - structures of the form X-P1-(P2)-P3-Y. Then we find more similar structures, with similar X-P1 and P3-Y but a different P2, and conclude that we are dealing with a compound structure with dependencies inside it. The set of rules found, minus the middle part, is combined into groups and called a conversion function. This is how translation, compilation, and other complex entities are formed.

Take a sheet with a Russian text and its translation into an unfamiliar language. Without a textbook it is extremely difficult to extract the translation rules from these sheets - but it is possible. And roughly the way you would do it is the way it must be turned into a search algorithm.

Once I have worked out the simple functions, I will keep thinking about the search for conversions; for now, the sketch and the understanding that this too is possible will have to do.

Besides the statistical search for functions, they can also be formed from descriptions, via a conversion function from descriptions into rules - a reading function. Statistics for initially creating a reading function can be found in abundance on the Internet, in textbooks: correlations between descriptions and the rules applied to the examples in those descriptions. This means the search algorithm must see the initial data and the rules applied to them in the same way, i.e. everything must live in a data graph that is homogeneous with respect to access types. By the same principle in reverse, rules can be found for converting internal rules back into external descriptions or external programs. The system can also form an idea of what it knows and what it does not: before requesting an answer, you can ask whether it knows the answer - yes or no.

The functions I have been talking about are not really single found pieces of algorithm; they can consist of sequences of other functions. Such a sequence is not a chain of procedure calls but a chain of transformations, like working with a pipe in Linux. For example, I roughly described predicting words and whole phrases at once; to get a prediction of a single character, the function that takes one character must be applied to that phrase. Or suppose a function has learned to understand tasks in English while the task statement is in Russian; then: RussianTask -> TranslateIntoEnglish -> ExecuteInEnglish -> Result.
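A sketch of such a pipeline in C, with each stage a buffer-to-buffer transformation; the stage names from the example are hypothetical placeholders, not defined functions:

#include <stddef.h>

typedef struct { token *data; size_t len; } t_buf;   /* token as defined earlier */
typedef t_buf (*t_stage)(t_buf in);

t_buf translate_into_english(t_buf in);   /* hypothetical stage */
t_buf execute_in_english(t_buf in);       /* hypothetical stage */

static t_buf run_pipeline(t_buf in, const t_stage *stages, size_t n) {
    for (size_t i = 0; i < n; i++)
        in = stages[i](in);               /* output of one stage feeds the next */
    return in;
}

/* Usage, mirroring RussianTask -> TranslateIntoEnglish -> ExecuteInEnglish -> Result:
       t_stage steps[] = { translate_into_english, execute_in_english };
       t_buf result = run_pipeline(russian_task, steps, 2);                       */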

Functions need not be fixed once defined; they may be refined or redefined as additional information arrives or when the conditions change altogether - the translation function is not final, and moreover it may change over time.

The repeatability of one set in different functions also affects the assessment of probabilities - it forms or confirms types.

It should also be mentioned that many sets in the real world, unlike Internet pages, are ordered and possibly continuous, or have other set-theoretic characteristics, which somewhat improves the probability calculations.

In addition to directly measuring the found rule on examples, I assume the existence of other methods of evaluation, something like a rule classifier. And perhaps the classifier of these classifiers.

More nuances. Forecasting consists of two levels: the level of found rules and the level of searching for new rules. But the search for new rules is essentially the same program with its own criteria, and I admit (though I have not thought it through yet) that everything may be simpler: what is needed is a zero level that searches for possible search algorithms in all their diversity, and these in turn create the final rules. Or perhaps it is a multi-level recursion, or a fractal.

Let's get back to control markers. All this reasoning about the algorithm shows that through them we ask this black box to continue a sequence and to evaluate a function determined by similarity - in effect, to do as it was shown before.

There is another way to define a function in this mechanism - to issue the function through a definition. For example (with arbitrary tag names standing for the markers):
Translate into English <q> table <a> table
Answer the question <q> sky color <a> blue
Create a program from a specification <q> I want artificial intelligence ... <a>

Using this system to solve our problems comes down to the following algorithm. We write a description defining a special identifier for describing tasks. Then we write a description of the task and assign it a new identifier. We write a description of the permissible actions - for example (impractical, but illustrative), processor commands taken straight from descriptions on the Internet, with manipulators connected to the computer and controlled through ports. After that, we can ask the system what next action should be performed to bring the task, referred to by its identifier, closer to solution. We can also ask from time to time whether it needs additional information for further calculation of actions - general knowledge, or the current state of solving the task - and we loop these requests for actions and for information in some external cycle. This whole scheme is built on textual definitions, and can therefore be driven by functions obtained through definitions. The output consists only of commands, so the multi-variance of texts is not an issue there. The question of the scale of forecasting required is not discussed here: if the necessary and sufficient forecasting functionality exists, the scheme should logically work.
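The outer loop of that algorithm might be sketched like this; every function here is an assumed wrapper around marker-formatted queries to the predictor, none of them defined by the article:

#include <stddef.h>

typedef int t_task_id;

int         ask_next_action(t_task_id task, char *action, size_t cap);  /* assumed */
int         needs_more_info(t_task_id task, char *query, size_t cap);   /* assumed */
const char *lookup_state(const char *query);                            /* assumed */
void        provide_info(t_task_id task, const char *answer);           /* assumed */
void        execute_action(const char *action);                         /* assumed */

static void solve_task(t_task_id task) {
    char action[256], query[256];
    while (ask_next_action(task, action, sizeof action)) {  /* next step toward the goal */
        execute_action(action);
        while (needs_more_info(task, query, sizeof query))  /* the system asks questions */
            provide_info(task, lookup_state(query));
    }
}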

If someone sees AI not as a way to solve problems but as a set of human characteristics, then note that human behavior and qualities are also calculable and predictable, and the literature contains plenty of descriptions of this or that property. So if we describe to the system which properties we want, it will emulate them to the best of its knowledge, reproducing either abstract average behavior or behavior tied to a specific person. And if you like, you can try to launch a superintelligence - if you can give it a definition.

You can predict things that happen over time: objects move with velocities and accelerations, and all sorts of other changes unfold in time. Space can be predicted too. For example, you enter an unfamiliar room containing a table, one corner of which is covered by a sheet of paper. You cannot see that corner, yet you can mentally predict that it is most likely rectangular like the other corners (rather than rounded), and the same color as the rest. Of course, prediction of space comes with errors - perhaps that corner of the table is gnawed off or has a paint stain on it - but prediction of temporal processes carries errors too. The acceleration of free fall at the Earth's surface is not always 9.81 m/s²; it depends on altitude and on nearby mountains. And measuring instruments can never be made absolutely accurate. I.e., prediction of space and of processes in time always comes with errors, and different predicted entities have different errors, but the essence is the same: algorithms found statistically.

It turns out that predicting our byte stream amounts to predicting an information space that encodes both space and time. Suppose some structure is found in it - say, a piece of a program. That piece of software is a predictable space, just like the table. The set of rules predicting this structure forms the rules of the structure - something like regular expressions. To determine such structures, the prediction computes not a single value but a set of admissible values. When I first described the algorithm I was not yet aware of the separate role structures play in it, so it did not get in; adding this property completes the picture, and over time I will try to rewrite it. Keep in mind that structures are conditionally extensible: if such-and-such a property has such-and-such a value, then another bundle of properties is added.

In general, everything possible in our world is described by types, structures, conversions, and processes, and all of these obey rules that are the result of prediction. The brain does the same thing, only not by exact methods, because it is an analog device.

Will it pursue research purposefully without being set such a task? No, because it has no desires of its own, only the tasks it is given. What in us is responsible for realizing our own desires and interests is what we call personality. One could program a personality on a computer too - whether similar to a human one or some computer analogue - but it would still remain just a task.

And our creative activity in art is the same research, only the entities sought are those that affect our emotions, feelings, and mind.

There are no definitive instructions for building such a program yet. Many questions remain, both about the algorithm itself and about its use (and about the multi-variance of texts). Over time I will refine and detail the description further.

An alternative direction for implementing forecasting is the use of recurrent neural networks (say, an Elman network). In that direction you need not think about the nature of forecasting, but there are many difficulties and nuances of its own. If that direction is implemented, though, the rest of the scheme stays the same.

Conclusions of the article:
1. Prediction is a way to find all possible algorithms.
2. By manipulating the prediction input, these algorithms can be pulled out of it.
3. This property can be used to talk to the computer.
4. This property can be used to solve any problem.
5. AI will be whatever you define it to be, and once defined, it can be solved as a problem.

Some will say that finding any pattern by brute force will take too long. Against that I note that a child takes several years to learn to speak. How many variants can we enumerate in a few years? Found, ready-made rules are applied quickly - much faster on computers than in humans. The search for new ones is slow in both cases, and whether a computer will take longer than a person we will not know until we build such an algorithm. I also note that brute force parallelizes perfectly, and there are millions of enthusiasts willing to switch on their home PCs for the purpose - so those few years can be divided by a million. And rules found by other computers will be learned instantly, unlike the analogous process in humans.

Others will argue that the brain has billions of cells working in parallel. Then the question is: how are those billions used when someone tries to learn a foreign language from examples, without a textbook? The person will sit over the printouts for a long time writing out correlating words, while a single computer will do it in batches in a split second.

And image analysis: move a dozen billiard balls and count how many collisions there will be (hiding from the sound). Now two or three dozen... What do billions of cells have to do with it?

In general, the speed of the brain and its massive parallelism is a very debatable question.

When people think about creating a thinking computer, they copy into it what a person has learned over a lifetime, without trying to understand the mechanisms that allow all this to accumulate starting from the initial program - eat and sleep. And those mechanisms rest not on the axioms of formal logic, but on mathematics and statistics.

PPS: In my opinion, there is no scientific definition of the term "artificial intelligence" - only science fiction. If you need reality, see point 5 in the conclusions of the article.

PPPS: Many different interpretations came to me much later, well after the article was written. For instance, that the search for a question-answer relationship is an approximation; or what more precise scientific formulations exist for pulling the desired function out of the variety of prediction functions found during the search. It is impossible to write a separate article for every small moment of understanding, and impossible to write about all of them at once, because they do not fit under one heading. All these understandings answer the question of how to get, from computing power, answers to questions whose answers cannot always simply be read in existing descriptions - as is said of the Watson project; how to create a program that, from a single mention or a movement of a finger, tries to understand and do what is wanted of it.

Someday such a program will be made. And it will be called just another gadget - not AI.

****
Sources on this topic, as well as further development of the view can be found on the site

At first glance, creating artificial intelligence seems a rather difficult task. Looking at beautiful examples of AI, you can see that interesting programs with AI can be written. Depending on the goal, different levels of knowledge are needed: some projects require deep knowledge of AI, others only knowledge of a programming language. The main question facing the programmer is: which language to choose for programming artificial intelligence? Here is a list of AI languages that may prove helpful.

LISP


The first computer language used to create artificial intelligence was LISP. This language is quite flexible and extensible; features such as rapid prototyping and macros are very useful in building AI. LISP turns complex tasks into simpler ones, and its powerful object-oriented system makes it one of the most popular programming languages for artificial intelligence.

Java

The main advantages of this feature-rich language are transparency, portability, and maintainability. Another advantage of Java is its versatility. If you are a beginner, you will be glad to know that hundreds of video tutorials on the Internet will make your learning easier and more efficient.

The main features of Java are easy debugging, good user experience, and ease of working with large projects. Projects created in Java tend to have an attractive and simple interface.

Prolog

This interactive symbolic programming language is popular for projects that require logic. With a powerful and flexible foundation, it is widely used for non-numerical programming, theorem proving, natural language processing, expert systems, and artificial intelligence in general.

Prolog is a declarative language with formal logic. AI developers value it for its high level of abstraction, built-in search engine, non-determinism, and so on.

Python

Python is widely used by programmers for its clean grammar and syntax and pleasant design. Varied data structures, a host of testing frameworks, and the combination of high-level and low-level programming make Python one of the most popular programming languages for artificial intelligence.

History of AI development

To see the connection between AI and programming languages, let's look at the most important events in the history of AI. It all started in 1939, when the Elektro robot was introduced at the World's Fair. The next robot was built in 1951 by Edmund Berkeley.

The robot Robby was built in 1956. Unfortunately, there is no information about how it was developed. In 1958, the LISP programming language was invented; although it was developed 60 years ago, it is still the main language of many artificial intelligence programs.

In 1961, UNIMATE was built - the first mass-produced industrial robot, used by General Motors on the production line. To program UNIMATE, its creators used VAL, a "variable assembly language" consisting of simple phrases, monitor commands, and self-explanatory instructions.

The artificial intelligence system Dendral was built in 1965. It helped to easily determine the molecular structure of organic compounds. This system was written in Lisp.

In 1966, Weizenbaum created ELIZA, the first virtual companion. The most famous of its scripts was called DOCTOR, and it answered questions in the style of a psychotherapist. The bot was implemented by means of pattern matching. The first version of ELIZA was written in SLIP, a list-processing language developed by Weizenbaum; later one of its versions was rewritten in Lisp.

The first mobile robot programmed in Lisp was Shakey. With the help of a problem-solving program and its sensors, Shakey moved around, turned lights on and off, climbed up and down, opened and closed doors, pushed objects, and moved things - at a speed of about 5 km per hour.

In the next 15 years, the world saw many amazing inventions: Denning's sentinel robot, LMI Lambda, Omnibot 2000, MQ-1 Predator drone, Furby, AIBO robot dog, and Honda ASIMO.

In 2003, iRobot introduced the Roomba robot vacuum cleaner. Developed in Lisp, this autonomous vacuum cleans floors using specific algorithms: it detects obstacles and goes around them.


What programming language do you use to develop AI programs? Write about your work in the comments or in our VKontakte group.

How does it happen that artificial intelligence is successfully developing, but there is still no “correct” definition for it? Why did the hopes pinned on neurocomputers not come true, and what are the three main tasks facing the creator of artificial intelligence?

You will find the answer to these and other questions in the article under the cut, written on the basis of a speech by Konstantin Anisimovich, Director of the Technology Development Department at ABBYY, one of the country's leading experts in the field of artificial intelligence.
With his personal participation, document recognition technologies were created, which are used in ABBYY FineReader and ABBYY FormReader products. Konstantin spoke about the history and basics of AI development at one of the master classes for students of Technopark Mail.Ru. The material of the master class became the basis for a series of articles.

There will be three posts in total:
Artificial intelligence for programmers

Knowledge acquisition: knowledge engineering and machine learning

Ups and downs of approaches in AI

Since the 1950s, two approaches have emerged in the field of artificial intelligence - symbolic computing and connectionism. Symbolic computing is a direction based on modeling human thinking, and connectionism is based on modeling the structure of the brain.

The first advances in symbolic computing were the Lisp language, created in the 1950s, and J. Robinson's work on logical inference. In connectionism, it was the creation of the perceptron - a self-learning linear classifier modeling the operation of a neuron. Further outstanding achievements came mainly within the symbolic paradigm: in particular, the work of Seymour Papert and Patrick Winston on the psychology of perception and, of course, Marvin Minsky's frames.

In the 70s, the first applied systems using elements of artificial intelligence appeared - expert systems. Then came a certain renaissance of connectionism with the advent of multilayer neural networks and the backpropagation algorithm for training them. In the 80s the fascination with neural networks was simply rampant: proponents of the approach promised to create neurocomputers that would work almost like the human brain.

But nothing much came of it, because real neurons are far more complex than the formal neurons on which multilayer neural networks are based, and the number of neurons in the human brain is also far larger than any network could afford. The main thing multilayer neural networks proved suitable for is solving the classification problem.

The next popular paradigm in artificial intelligence was machine learning. The approach began developing rapidly in the late 80s and has not lost popularity to this day. A significant impetus came with the advent of the Internet and large amounts of varied, easily accessible data that can be used to train algorithms.

The main tasks in the design of artificial intelligence

It is worth asking what unites the tasks referred to artificial intelligence. It is easy to see what they have in common: the absence of a known, well-defined solution procedure. This, in fact, is what distinguishes AI problems from problems of compilation theory or computational mathematics. Intelligent systems search for suboptimal solutions: it is impossible to prove or guarantee that a solution found by artificial intelligence is strictly optimal. Nevertheless, in most practical problems suboptimal solutions suit everyone. Moreover, it must be remembered that a person almost never solves a problem optimally either - rather the contrary.

A very important question arises: how can AI solve a problem for which there is no solution algorithm? The point is to do it in the same way as a person - to put forward and test plausible hypotheses. Naturally, knowledge is needed to put forward and test hypotheses.

Knowledge is a description of the subject area in which an intelligent system operates. If it is a character recognition system for natural-language text, the knowledge includes descriptions of how the characters are structured, the structure of the text, and certain properties of the language. If it is a customer credit scoring system, it must have knowledge of customer types and of how a customer's profile relates to potential insolvency. Knowledge is of two kinds: about the subject area, and about finding solutions (meta-knowledge).

The main tasks of designing an intelligent system are reduced to the choice of methods for representing knowledge, methods for obtaining knowledge and methods for applying knowledge.

Knowledge Representation

There are two main ways of representing knowledge: declarative and procedural. Declarative knowledge may be presented in structured or unstructured form. Structured representations are one or another variant of the frame approach - semantic networks or formal grammars, which can also be considered varieties of frames. Knowledge in these formalisms is represented as a set of objects and relations between them.


Unstructured representations are usually used in areas associated with classification problems. These are typically vectors of weights, probabilities, and the like.

Almost all methods of structured knowledge representation are based on the formalism of frames, which Marvin Minsky of MIT introduced in the 1970s to describe the knowledge structure used in perceiving spatial scenes. As it turned out, this approach suits almost any task.

A frame consists of a name and individual units called slots. The value of a slot can in turn be a reference to another frame... A frame can be a descendant of another frame, inheriting its slot values. The descendant can override the ancestor's slot values and add new ones. Inheritance is used to make descriptions more compact and to avoid duplication.

It is easy to see the similarity between frames and object-oriented programming, where an object corresponds to a frame and a field to a slot. The similarity is not accidental: frames were one of the origins of OOP. In particular, one of the first object-oriented languages, Smalltalk, implemented frame representations of objects and classes almost exactly.
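A minimal sketch of frames with slots and inheritance, mirroring that object/field analogy (the representation is my own simplification, not Minsky's formalism):

#include <stddef.h>
#include <string.h>

typedef struct { const char *name; const char *value; } t_slot;

typedef struct t_frame {
    const char     *name;
    struct t_frame *parent;      /* ancestor frame, or NULL */
    t_slot          slots[16];
    size_t          n_slots;
} t_frame;

/* Look a slot up in the frame itself, then in its ancestors: a descendant
   overrides an inherited value simply by defining the slot locally. */
static const char *slot_value(const t_frame *f, const char *slot) {
    for (; f != NULL; f = f->parent)
        for (size_t i = 0; i < f->n_slots; i++)
            if (strcmp(f->slots[i].name, slot) == 0)
                return f->slots[i].value;
    return NULL;
}

So a "kitchen" frame whose parent is "room" answers a query about a "ceiling" slot with the inherited value unless it defines its own.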

For the procedural representation of knowledge, productions, or production rules, are used. A production model is a rule-based model that represents knowledge as "condition-action" sentences. This approach used to be popular in various diagnostic systems, where it is natural to describe symptoms, problems, or malfunctions as the condition, and the possible fault that leads to those symptoms as the action.
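Here is a toy production system in that diagnostic spirit: each rule is a condition-action pair, and inference is a scan over the rule base (the rules themselves are invented for illustration):

#include <stddef.h>
#include <string.h>

typedef struct {
    const char *condition;   /* observed symptom     */
    const char *action;      /* conclusion to assert */
} t_production;

static const t_production rule_base[] = {
    { "engine cranks but does not start", "suspect no fuel or no spark" },
    { "headlights dim while cranking",    "suspect a weak battery"      },
};

static const char *diagnose(const char *symptom) {
    size_t n = sizeof rule_base / sizeof rule_base[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(rule_base[i].condition, symptom) == 0)
            return rule_base[i].action;   /* condition matched: fire the rule */
    return "no rule fired";
}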

In the next article, we will talk about ways to apply knowledge.

Bibliography.

  1. John Alan Robinson. A Machine-Oriented Logic Based on the Resolution Principle. Journal of the ACM, 12(1):23-41, 1965.
  2. Marvin Minsky, Seymour Papert. Perceptrons. MIT Press, 1969.
  3. Stuart Russell, Peter Norvig. Artificial Intelligence: A Modern Approach.
  4. Simon Haykin. Neural Networks: A Comprehensive Foundation.
  5. Nils J. Nilsson. Artificial Intelligence: A New Synthesis.

Language-understanding machines would be very useful. But we don't know how to build them.

About the illustrations for this article: one of the difficulties computers face in understanding language is that the meaning of words often depends on context and even on the appearance of the letters and words. In the images presented in the article, several artists demonstrate the use of various visual cues that convey meaning beyond the letters themselves.

In the midst of a tense game of Go in Seoul, South Korea between Lee Sedol, one of the best players of all time, and AlphaGo, an AI created by Google, the program made a mysterious move that demonstrated its mind-boggling superiority over a human opponent.

On move 37, AlphaGo chose to put a black stone in what looked, at first glance, like a strange position. By all appearances it was about to lose a significant piece of territory - a beginner's mistake in a game built around control of space on the board. Two TV commentators debated whether they had understood the move correctly and whether the program had broken. It turned out that, despite contradicting common sense, move 37 allowed AlphaGo to build an insurmountable structure in the center of the board. Google's program had essentially won the game with a move no human would have thought of.

It's also impressive because the ancient game of Go was often seen as a test of intuitive intelligence. Its rules are simple. Two players take turns placing black or white stones at the intersections of the horizontal and vertical lines of the board, trying to surround the opponent's stones and remove them from the board. But it's incredibly difficult to play it well.

If chess players can calculate a game several moves ahead, in Go this quickly becomes an unimaginably difficult task, and there are no classic gambits in the game either. Nor is there an easy way to measure advantage, and even an experienced player can find it hard to explain why he made a particular move. Because of this, it is impossible to write a simple set of rules that a program playing at expert level could follow.

AlphaGo was not taught to play Go: the program analyzed hundreds of thousands of games and played millions of matches against itself. Among various AI techniques, it used an increasingly popular method known as deep learning, which relies on mathematical calculations whose structure is inspired by the way interconnected layers of neurons in the brain activate when processing new information. The program taught itself through many hours of practice, gradually honing an intuitive sense of strategy. That it was then able to beat one of the best Go players in the world is a new milestone in machine intelligence and AI.

A few hours after move 37, AlphaGo won the game and took a 2-0 lead in the five-game match. Afterwards, Sedol stood before a crowd of journalists and photographers and politely apologized for letting humanity down. "I'm speechless," he said, blinking under the bursts of camera flashes.

AlphaGo's astonishing success shows how much progress has been made in AI over the past few years, after decades of desperation and setbacks described as the "AI winter". Deep learning allows machines to learn on their own to perform complex tasks that even a few years ago were unimaginable without human intelligence. Self-driving cars are already looming on the horizon, and in the near future deep-learning systems will help diagnose diseases and recommend treatments.

But despite these impressive advances, one of the core capabilities of AI is still elusive: language. Systems like Siri and IBM Watson can recognize simple verbal and written commands and answer simple questions, but they are unable to carry on a conversation or actually understand the words used. For AI to change our world, this has to change.

Although AlphaGo does not speak, it has technology that can give a better understanding of the language. At Google, Facebook, Amazon, and in science labs, researchers are trying to solve this stubborn problem using the same AI tools — including deep learning — that are responsible for AlphaGo's success and AI resurgence. Their success will determine the scope and characteristics of what is already beginning to turn into an AI revolution. This will determine our future – whether we will have machines with which it will be easy to communicate, or whether AI systems will remain mysterious black boxes, albeit more autonomous ones. “There is no way you can create a humanoid system with AI if it is not based on a language,” says Josh Tenenbaum, professor of cognitive science and computing at MIT. "It's one of the most obvious things that define human intelligence."

Perhaps the same technologies that allowed AlphaGo to conquer Go will let computers master language, or perhaps something more will be required. But without language understanding, the impact of AI will be different. Of course, we will still have unrealistically powerful and intelligent programs like AlphaGo, but our relationship with AI will not be as close, and probably not as friendly. "The big question from the beginning of the research was: what if you got devices that are intelligent in terms of efficiency, but not like us in their lack of empathy for who we are?" says Terry Winograd, emeritus professor at Stanford University. "You can imagine machines based on non-human intelligence, working with big data and running the world."

Machine Talkers

A couple of months after AlphaGo's triumph, I went to Silicon Valley, the heart of the AI boom. I wanted to meet researchers who have made significant progress in the practical applications of AI and who are trying to give machines an understanding of language.

I started with Winograd, who lives in a suburb on the southern edge of Stanford's Palo Alto campus, not far from the headquarters of Google, Facebook, and Apple. His curly gray hair and bushy mustache give him the air of a venerable scholar, and his enthusiasm is infectious.

In 1968, Winograd made one of the earliest attempts to teach machines to talk. A math prodigy with a passion for language, he had come to MIT's new AI lab for his Ph.D. and decided to create a program that would communicate with people through text input in everyday language. At the time this did not seem such an audacious goal: very big strides were being made in AI, and other teams at MIT were building sophisticated computer-vision systems and robotic arms. "There was a sense of unknown and unlimited possibilities," he recalls.

But not everyone believed language would be so easy to conquer. Some critics, including the influential linguist and MIT professor Noam Chomsky, thought it would be very difficult for AI researchers to teach machines understanding, because the mechanics of language in humans were themselves so poorly understood. Winograd recalls a party at which a student of Chomsky's walked away from him upon hearing that he worked in an AI lab.

But there were also reasons for optimism. A couple of years earlier, Joseph Weizenbaum, a German-born MIT professor, had made the first chatbot program. It was called ELIZA and was programmed to respond like a caricature of a psychotherapist, repeating key parts of statements or asking questions that encouraged conversation. If you told it you were angry with your mother, the program might answer, "What else comes to your mind when you think about your mother?" A cheap trick that worked surprisingly well. Weizenbaum was shocked when some of his test subjects confided their dark secrets to the machine.

Winograd wanted to make something that could convincingly pretend to understand language. He began by narrowing the scope of the problem: he created a simple virtual environment, a "blocks world", consisting of a collection of imaginary objects on an imaginary table. He then created a program, SHRDLU, that could parse all the nouns, verbs, and simple grammar rules needed to communicate in this simplified virtual world. SHRDLU (a nonsense word formed by a row of keys on a Linotype keyboard) could describe objects, answer questions about their relationships, and change the blocks world in response to typed commands. It even had a kind of memory: if you asked it to move "the red cone" and later wrote about a cone, it assumed you meant that red cone rather than some other.

SHRDLU became a banner of great progress in AI. But it was only an illusion. When Winograd tried to expand the program's blocks world, the rules needed to handle additional words and grammatical complexity became unmanageable. After only a few more years he gave up and left the field of AI to concentrate on other research. "The limitations turned out to be much stronger than they seemed at the time," he says.

Winograd concluded that, with the tools available at the time, it was impossible to teach a machine to truly understand language. The problem, as Hubert Dreyfus, professor of philosophy at the University of California, Berkeley, argued in his 1972 book What Computers Can't Do, is that much of human action requires an instinctive understanding that cannot be captured by a set of simple rules. That is why, before the match between Sedol and AlphaGo began, many experts doubted that machines would master the game of Go.

But while Dreyfus was making his point, several researchers were developing an approach that would eventually give machines the kind of intelligence he had in mind. Inspired by neuroscience, they experimented with artificial neural networks - layers of mathematical simulations of neurons that can be trained to fire in response to certain inputs. At first these systems were impossibly slow, and the approach was dismissed as impractical for logic and reasoning. But the key capability of neural networks is learning what has not been hand-programmed, and it later proved useful for simple tasks like handwriting recognition - a skill that found commercial use in the 1990s for reading the numbers on checks. Proponents of the method were confident that in time neural networks would let machines do much more; they claimed that someday the technology would even recognize language.

Over the past few years, neural networks have become more complex and powerful. The approach has flourished thanks to key mathematical improvements and, more importantly, faster computer hardware and the emergence of vast amounts of data. By 2009, researchers at the University of Toronto had shown that multilayer deep-learning networks could recognize speech with record-breaking accuracy. And in 2012 the same group won a machine-vision competition with a deep-learning algorithm of astonishing accuracy.

A deep-learning neural network recognizes objects in pictures by a simple trick. A layer of simulated neurons receives the picture as input, with some neurons firing in response to the intensities of individual pixels. The resulting signal passes through many layers of interconnected neurons before reaching the output layer, which signals the observation of an object. A mathematical technique called backpropagation is used to adjust the sensitivity of the network's neurons so that the correct response is produced; it is this step that lets the system learn. Different layers in the network respond to properties such as edges, colors, or texture. Such systems can now recognize objects, animals, or faces with an accuracy rivaling that of humans.
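To make the mechanism tangible, here is a toy C program (my illustration, not taken from the article): a 2-2-1 sigmoid network learns XOR by backpropagation, nudging every weight against the gradient of the output error; real deep networks differ mainly in scale. Compile with -lm; the outputs typically approach 0 and 1 after a few thousand epochs.

#include <math.h>
#include <stdio.h>

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

int main(void) {
    /* small asymmetric starting weights, learning rate 0.5 */
    double w1[2][2] = {{0.5, -0.4}, {-0.3, 0.6}}, b1[2] = {0.1, -0.1};
    double w2[2] = {0.7, -0.2}, b2 = 0.05, lr = 0.5;
    const double in[4][2] = {{0,0},{0,1},{1,0},{1,1}}, target[4] = {0,1,1,0};

    for (int epoch = 0; epoch < 20000; epoch++)
        for (int s = 0; s < 4; s++) {
            double h[2], out, d_out, d_h[2];
            for (int j = 0; j < 2; j++)                  /* forward pass */
                h[j] = sigmoid(w1[j][0]*in[s][0] + w1[j][1]*in[s][1] + b1[j]);
            out = sigmoid(w2[0]*h[0] + w2[1]*h[1] + b2);

            d_out = (out - target[s]) * out * (1 - out); /* backward pass */
            for (int j = 0; j < 2; j++)
                d_h[j] = d_out * w2[j] * h[j] * (1 - h[j]);

            for (int j = 0; j < 2; j++) {                /* gradient updates */
                w2[j] -= lr * d_out * h[j];
                b1[j] -= lr * d_h[j];
                for (int k = 0; k < 2; k++)
                    w1[j][k] -= lr * d_h[j] * in[s][k];
            }
            b2 -= lr * d_out;
        }

    for (int s = 0; s < 4; s++) {
        double h0 = sigmoid(w1[0][0]*in[s][0] + w1[0][1]*in[s][1] + b1[0]);
        double h1 = sigmoid(w1[1][0]*in[s][0] + w1[1][1]*in[s][1] + b1[1]);
        printf("%.0f XOR %.0f -> %.2f\n", in[s][0], in[s][1],
               sigmoid(w2[0]*h0 + w2[1]*h1 + b2));
    }
    return 0;
}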

There is an obvious problem in applying deep-learning technology to language. Words are arbitrary symbols, and in this they are essentially different from images: two words can have similar meanings while containing completely different letters, and the same word can mean different things depending on the context.

In the 1980s, researchers came up with the clever idea of turning language into the type of problem a neural network can handle. They showed that words can be represented as mathematical vectors, which allows the similarity of related words to be calculated. For example, "boat" and "water" are close in vector space even though they look entirely different. Researchers at the University of Montreal led by Yoshua Bengio, and another group at Google, have used this idea to build networks in which each word of a sentence is used to construct a more complex representation. Geoffrey Hinton, a professor at the University of Toronto and a prominent deep-learning researcher who also works for Google, calls this a "thought vector."
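The idea in miniature: similarity of meaning becomes cosine similarity of vectors. The three-dimensional vectors below are invented for illustration; real systems learn hundreds of dimensions from text.

#include <math.h>
#include <stdio.h>

#define DIM 3

static double cosine(const double a[DIM], const double b[DIM]) {
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < DIM; i++) {
        dot += a[i] * b[i];     /* projection of one vector onto the other */
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (sqrt(na) * sqrt(nb));
}

int main(void) {
    double boat[DIM]  = {0.9, 0.8, 0.1};   /* toy embedding */
    double water[DIM] = {0.8, 0.9, 0.2};   /* toy embedding */
    double piano[DIM] = {0.1, 0.0, 0.9};   /* toy embedding */
    printf("boat~water: %.2f\n", cosine(boat, water));  /* high: related meanings  */
    printf("boat~piano: %.2f\n", cosine(boat, piano));  /* low: unrelated meanings */
    return 0;
}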

Using two such networks, it is possible to translate from one language to another with excellent accuracy. And by combining these types of networks with one that recognizes objects in pictures, you can produce astonishingly accurate image captions.

Meaning of life

Sitting in a conference room in the heart of Google's bustling headquarters in Mountain View, California, Quoc Le, one of the company's researchers who developed this approach, talks about the idea of a machine capable of holding a real conversation. Le's ambition explains how talking machines could be useful. "I need a way to simulate thought in a machine," he says. "And if you want to simulate thought, you can ask the machine what it is thinking about."

Google is already teaching its computers the basics of language. In May, the company unveiled Parsey McParseface, a system capable of recognizing the syntax, nouns, verbs, and other elements of text. It is easy to see how language understanding helps the company: Google's search algorithm once simply tracked keywords and links between web pages, whereas now the RankBrain system reads the text of pages to grasp its meaning and improve search results. Le wants to take this idea even further. Adapting a system that proved useful for translation and image captions, he and his colleagues created Smart Reply, which reads the contents of Gmail messages and suggests possible replies. They also created a program that learned from Google's support chats to answer simple technical questions.

Le recently created a program that can generate tolerable answers to difficult questions. It was trained on dialogue from 18,900 films. Some of its answers are eerily on target. For example, Le asked "What is the meaning of life?" and the program answered "In the service of the greater good." "Not a bad answer," he recalls with a smirk. "Perhaps better than I would have answered myself."

There is just one problem, which becomes apparent when you look at more of the system's responses. When Le asked, "How many legs does a cat have?", the system replied, "I think four." Then he asked, "How many legs does a centipede have?" and got the strange answer "Eight." In fact, Le's program does not understand what it is talking about. It understands that certain combinations of symbols go together, but it does not understand the real world. It does not know what a centipede looks like or how it moves. It is still an illusion of intelligence, without the common sense people take for granted. Deep-learning systems are rather shaky in this respect: Google's image-captioning system sometimes makes weird mistakes, such as describing a road sign as a refrigerator full of food.

By a strange coincidence, Terry Winograd's neighbor in Palo Alto turned out to be someone who may help computers better grasp the real meaning of words. Fei-Fei Li, director of the Stanford AI Lab, was on maternity leave when I visited, but she invited me to her home and proudly introduced me to her three-month-old daughter, Phoenix. "Notice that she's looking at you more than at me," Li said as Phoenix stared at me. "That's because you're new; it's early face recognition."

Li has spent much of her career researching machine learning and computer vision. A few years ago, she led an effort to create a database of millions of images of objects, each labeled with appropriate keywords. But Li believes machines need a more sophisticated understanding of what is happening in the world, so this year her team released another image database with far richer annotations. For each picture, people wrote dozens of captions: "A dog on a skateboard", "The dog has thick fluttering fur", "A road with cracks", and so on. The hope is that machine-learning systems trained on such data will learn to understand the physical world. "The linguistic part of the brain receives a lot of information, including from the visual system," says Li. "An important part of AI will be the integration of these systems."

This process is closer to how children learn, by associating words with objects, relationships, and actions. But the analogy with human learning only goes so far: children do not need to see a dog on a skateboard to imagine one or describe it in words. Li believes that today's tools for AI and machine learning will not be enough to create real AI. “It won't just be deep learning with a large data set,” she says. “We humans are very bad at big data calculations, but very good at abstraction and creativity.”

No one knows how to endow machines with these human qualities, or whether it is possible at all. Is there something exclusively human about these qualities that prevents AI from ever having them?

Cognitive scientists such as MIT's Tenenbaum believe that today's neural networks, no matter how large, lack critical components of the mind. Humans are able to learn relatively quickly from relatively small amounts of data, and they have a built-in ability to efficiently model the three-dimensional world. “Language is built on other capabilities that probably lie deeper and are present in babies even before they begin to learn language: visual perception of the world, control of our motor apparatus, an understanding of the physics of the world and of the intentions of other creatures,” says Tenenbaum.

If he is right, then it will be very difficult to recreate language understanding in AI without simulating the human learning process, mental models, and psychology.

Explain yourself

Noah Goodman's office in the Stanford psychology department is almost empty except for a couple of abstract paintings on one wall and a few overgrown plants. When I arrived, Goodman was typing away on a laptop with his bare feet on the table. We walked around the sun-drenched campus to buy iced coffee. “The peculiarity of language is that it relies not only on a large amount of information about the language itself, but also on a shared understanding of the world around us, and these two areas of knowledge are implicitly connected to each other,” he explains.

Goodman and his students developed the WebPPL probabilistic programming language, which can be used to endow computers with probabilistic common sense, something that turns out to be quite important in conversation. One experimental version is able to recognize puns, and another is able to recognize hyperbole. If it is told that some people have to spend “an eternity” waiting for a table in a restaurant, it automatically concludes that the literal meaning of the word is unlikely in this context, and that the people probably waited a long time and are annoyed. The system cannot yet be called true intelligence, but it shows how new approaches can help AI programs converse a little more naturally.
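
The underlying idea can be sketched in a few lines: treat the listener as inverting a model of a speaker who sometimes exaggerates to convey emotion. Below is a minimal toy version in plain Python rather than WebPPL; the wait times, priors, and scores are all invented for illustration and are not Goodman's actual model.

```python
# A toy model of hyperbole understanding: a listener infers the real
# wait time and the speaker's mood from the utterance "an eternity".
# All probabilities here are invented for illustration.
from itertools import product

WAITS = [10, 30, 60]                        # plausible wait times (minutes)
PRIOR_WAIT = {10: 0.5, 30: 0.35, 60: 0.15}  # prior over wait times
P_ANNOYED = {10: 0.1, 30: 0.4, 60: 0.8}     # longer waits annoy more

def literal(utterance, wait):
    """How literally true the utterance is of a given wait time."""
    if utterance == "an eternity":
        return 0.01                         # literally (almost) never true
    return 1.0 if utterance == f"{wait} minutes" else 0.0

def speaker_score(utterance, wait, annoyed):
    """An annoyed speaker gains extra utility from exaggeration."""
    score = literal(utterance, wait)
    if annoyed and utterance == "an eternity":
        score += 1.0
    return score

def listener(utterance):
    """Posterior over (wait, annoyed) given what the speaker said."""
    posterior = {}
    for wait, annoyed in product(WAITS, [True, False]):
        prior = PRIOR_WAIT[wait] * (P_ANNOYED[wait] if annoyed else 1 - P_ANNOYED[wait])
        posterior[(wait, annoyed)] = prior * speaker_score(utterance, wait, annoyed)
    total = sum(posterior.values())
    return {state: p / total for state, p in posterior.items()}

for state, p in sorted(listener("an eternity").items(), key=lambda kv: -kv[1]):
    print(state, round(p, 3))
```

Running this, the top inferences for “an eternity” are annoyed speakers with waits of 30 to 60 minutes, even though short waits are more likely a priori, which is exactly the non-literal reading described above.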

Goodman's example also shows how difficult it will be to teach language to machines. Understanding what the word “eternity” means in a given context is the kind of thing AI systems will have to learn, and yet for people it is a rather simple, rudimentary skill.

However, despite the complexity and intricacy of the task, the initial successes of researchers applying deep learning to pattern recognition and to the game of Go give hope that we may be on the verge of a breakthrough in language as well. Such a breakthrough would arrive just in time. If AI is to become a universal tool, to help humans supplement and amplify their own intelligence, and to perform tasks in seamless symbiosis with us, then language is the key to reaching that state, especially as AI systems increasingly use deep learning and other techniques to program themselves.

“In general, deep learning systems are awe-inspiring,” says John Leonard, a professor who studies robotic vehicles at MIT. “On the other hand, their work is quite difficult to understand.”

Toyota, which is developing autonomous driving technology, has launched a research project at MIT led by Gerald Sussman, an expert in AI and programming languages, to build an autonomous driving system that can explain why it took a particular action at a given moment. The obvious way to give such an explanation is verbally. “Building knowledge-aware systems is a very difficult task,” says Leonard, who leads another Toyota project at MIT. “But, yes, ideally they should give not just an answer but an explanation.”

A few weeks after returning from California, I met with David Silver, the Google DeepMind researcher who developed AlphaGo. He spoke about the match against Sedol at a scientific conference in New York. Silver explained that when the program made its decisive move in the second game, his team was as surprised as everyone else. All they could see was that AlphaGo was predicting its odds of winning, and that prediction changed little after move 37. Only a few days later, after carefully analyzing the game, did the team make a discovery: by digesting previous games, the program had calculated that a human player would make such a move with a probability of 1 in 10,000, and its training games showed that the maneuver provides an unusually strong positional advantage.

So, in a way, the machine knew that this move would hit Sedol's weak spot.

Silver said Google is looking at several ways to commercialize the technology, including smart assistants and healthcare tools. After the lecture, I asked him about the importance of being able to communicate with the AI that controls such systems. “Interesting question,” he said after a pause. “For some applications this may be useful. For example, in healthcare it can be important to know why a particular decision was made.”

Indeed, AIs are becoming ever more complex and intricate, and it is hard to imagine how we would work with them without language, without the ability to ask them, “Why?” Moreover, the ability to communicate easily with computers would make them more useful, and it would feel like magic. After all, language is the best way we have to understand and interact with the world. It is time for the machines to catch up with us.

The process of creating artificial intelligence seems, at first glance, a rather difficult task. But looking at impressive examples of AI in action, you can see that it is quite possible to build interesting AI programs. Different goals require different levels of knowledge: some projects require deep knowledge of AI, others only knowledge of a programming language. But the main question facing the programmer is: which language should you choose for programming artificial intelligence? Here is a list of AI languages that may be helpful.

LISP

The first computer language used to create artificial intelligence was LISP. The language is quite flexible and extensible; features such as rapid prototyping and macros are very useful for building AI, and they let LISP turn complex tasks into simpler ones. Its powerful object-oriented system (CLOS, the Common Lisp Object System) makes LISP one of the most popular programming languages for artificial intelligence.

Java

The main advantages of this feature-rich language are transparency, portability, and maintainability. Another advantage of Java is its versatility. If you are a beginner, you will be glad to know that there are hundreds of video tutorials online that will make your learning easier and more efficient.

The main features of Java are easy debugging, a good user experience, and ease of working on large projects. Projects created in Java tend to have an attractive, simple interface.

Prolog

This interactive symbolic programming language is popular for projects that require logic. With a powerful and flexible foundation, it is widely used for non-numerical programming, theorem proving, natural language processing, expert systems, and artificial intelligence in general.

Prolog is a declarative language built on formal logic. AI developers value it for its high level of abstraction, its built-in backtracking search, its non-determinism, and so on.
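
To make “built-in search” and “non-determinism” concrete: in Prolog, a rule such as `ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).` is enough for the engine to enumerate every ancestor chain by backtracking. Here is a rough imitation of that mechanism sketched in Python, with generators standing in for Prolog's choice points; the family facts are invented for illustration.

```python
# A toy imitation of Prolog-style backtracking search in Python.
# Generators stand in for Prolog's non-deterministic choice points;
# the parent facts below are invented for illustration.

PARENT = {("tom", "bob"), ("bob", "ann"), ("bob", "pat")}

def ancestor(x, y):
    """Yield one proof path for every way x is an ancestor of y."""
    if (x, y) in PARENT:                  # base case: direct parent
        yield [x, y]
    for (p, c) in PARENT:                 # recursive case: go via a child
        if p == x:
            for path in ancestor(c, y):
                yield [x] + path

for path in ancestor("tom", "pat"):
    print(" -> ".join(path))              # prints: tom -> bob -> pat
```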

Python

Python is widely used by programmers thanks to its clean grammar and syntax and its pleasant design. A variety of data structures, a wealth of testing frameworks, and support for both high-level and low-level programming styles make Python one of the most popular programming languages for artificial intelligence.
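
As a small illustration of that compactness, here is a complete perceptron, one of the oldest machine learning algorithms, written in plain Python with no external libraries. It is a minimal sketch rather than a production implementation; the learning rate and epoch count are arbitrary choices.

```python
# A minimal perceptron in plain Python: learns the logical AND function.
import random

random.seed(0)  # make the tiny experiment repeatable

def predict(w, b, x):
    """Threshold activation: fire if the weighted sum exceeds zero."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train(samples, epochs=20, lr=0.1):
    """samples: list of (inputs, target) pairs with targets in {0, 1}."""
    n = len(samples[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            err = target - predict(w, b, x)   # perceptron update rule
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(data)
for x, target in data:
    print(x, "->", predict(w, b, x))          # matches the AND truth table
```

With these settings the weights settle after a few epochs and the printed outputs match the AND truth table.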

History of AI development

To see the connection between AI and programming languages, let's look at the most important events in the history of AI. It all started in 1939, when the Elektro robot was introduced at the World's Fair. The next notable robot was built in 1951 by Edmund Berkeley.

Robby the Robot appeared in 1956; unfortunately, there is little information about how it was developed. In 1958 the LISP programming language was invented, and although it was developed some 60 years ago, it is still the main language of many artificial intelligence programs.

In 1961, UNIMATE was built, the first mass-produced industrial robot. General Motors used it on its production line. To program UNIMATE, engineers used VAL, a variable assembly language consisting of simple phrases, monitor commands, and self-explanatory instructions.

The artificial intelligence system Dendral was built in 1965. It helped chemists determine the molecular structure of organic compounds, and it was written in Lisp.

In 1966, Weizenbaum created ELIZA, the first virtual companion. The most famous of its scripts was called DOCTOR, and it answered questions in the style of a psychotherapist. The bot worked by pattern matching. The first version of ELIZA was written in SLIP, a list-processing language developed by Weizenbaum; later, one of its versions was rewritten in Lisp.

The first mobile robot programmed in Lisp was Shakey. With the help of a problem-solving program, cameras, and sensors, Shakey moved around, turned lights on and off, climbed up and down, opened and closed doors, pushed objects, and moved things. Shakey moved at a speed of about 5 km per hour.

Over the next 15 years, the world saw many amazing inventions: Denning's sentry robot, the LMI Lambda, Omnibot 2000, the MQ-1 Predator drone, Furby, the AIBO robot dog, and Honda's ASIMO.

In 2002, iRobot introduced the Roomba robot vacuum cleaner. Developed in Lisp, this autonomous vacuum cleans floors using specific algorithms, detecting obstacles and going around them.


What programming language do you use to develop AI programs? Write about your work in the comments or in our VKontakte group.
