
In simple words about the complex: what are neural networks? An introduction to deep learning.

This guide is intended for anyone who is interested in machine learning but doesn't know where to start. The content is aimed at a wide audience and will be fairly superficial. But who cares? The more people become interested in machine learning, the better.

Object recognition using deep learning

You may have already seen this famous xkcd comic. The joke is that any 3-year-old child can recognize a photo of a bird, but getting a computer to do it has taken the very best computer scientists over 50 years. In the last few years, we have finally found a good approach to object recognition using deep convolutional neural networks. That sounds like a string of made-up words from a William Gibson science-fiction novel, but it will all become clear once we break them down one by one. So let's do it: let's write a program that recognizes birds!

Let's start simple

Before we learn how to recognize images of birds, let's learn how to recognize something much simpler - the handwritten number "8".
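To make this concrete, here is a minimal sketch of an "is this an 8?" recognizer, assuming TensorFlow/Keras and the built-in MNIST dataset of handwritten digits; the layer sizes and epoch count are illustrative, not a recipe from the original article.

```python
# Minimal "is this an 8?" sketch, assuming TensorFlow/Keras and MNIST.
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Flatten 28x28 images to 784 values in [0, 1] and relabel: 1 if the digit is an 8, else 0.
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
y_train = (y_train == 8).astype("float32")
y_test = (y_test == 8).astype("float32")

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(784,)),
    keras.layers.Dense(1, activation="sigmoid"),  # probability that the image is an "8"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=128, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```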

What is deep learning? (March 3rd, 2016)

These days, people talk about the fashionable technology of deep learning as if it were manna from heaven. But do those who talk about it understand what it really is? The concept has no formal definition, and it unites a whole stack of technologies. In this post, I want to explain, as accessibly as possible, what lies behind this term, why it is so popular, and what these technologies give us.


In short, this newfangled term (deep learning) is about assembling a more complex, deeper abstraction (representation) out of simpler abstractions. Moreover, even the simplest abstractions must be assembled by the computer itself, not by a person. That is, it is no longer just about learning, but about meta-learning: figuratively speaking, the computer must learn on its own how best to learn. And, in fact, that is exactly what the term "deep" implies. The term is almost always applied to artificial neural networks with more than one hidden layer, so formally "deep" also means a deeper network architecture.

The slide on the development of the field shows clearly how deep learning differs from ordinary machine learning. To repeat, what is unique to deep learning is that the machine finds the features itself (the key characteristics of something that make it easy to separate one class of objects from another), and these features are structured hierarchically: more complex ones are built from simpler ones. Below we will work through this with an example.

Let's take the image recognition problem as an example: previously, a huge picture (1024 × 768, about 800,000 numerical values) was stuffed into an ordinary neural network with one layer, and people watched the computer slowly die, suffocating from lack of memory and from its inability to understand which pixels matter for recognition and which do not, to say nothing of the method's effectiveness. Here is the architecture of such an ordinary (shallow) neural network.
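A rough back-of-the-envelope sketch of why this "shallow" approach scales so badly: flattening the whole 1024 × 768 image gives roughly 800,000 input values, and even one modest fully connected layer then needs hundreds of millions of weights. The hidden-layer size below is an illustrative assumption.

```python
# Parameter count of a single fully connected layer on a flattened 1024x768 image.
width, height = 1024, 768
inputs = width * height            # 786,432 pixel values per image
hidden_units = 1000                # a modest hidden layer (illustrative)
classes = 10

hidden_weights = inputs * hidden_units      # weights in the first layer alone
output_weights = hidden_units * classes
print(f"input values per image: {inputs}")
print(f"weights in a single hidden layer: {hidden_weights:,}")
print(f"total trainable weights: {hidden_weights + output_weights:,}")
```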

Then researchers paid attention to how the brain distinguishes features, which it does strictly hierarchically, and decided to extract a hierarchical structure from pictures as well. To do this, more hidden layers (layers between the input and output; roughly speaking, stages of information transformation) had to be added to the neural network. Although this idea came up almost as soon as neural networks were invented, at that time only networks with a single hidden layer could be trained successfully. That is, deep networks have in principle existed about as long as ordinary ones; we simply could not train them. What changed?

In 2006, several independent researchers solved this problem at once (and by then hardware capacity had developed far enough: quite powerful video cards had appeared). These researchers were Geoffrey Hinton (with his colleague Ruslan Salakhutdinov), with the technique of pre-training each layer of the neural network using a restricted Boltzmann machine (forgive me these terms...), Yann LeCun with convolutional neural networks, and Yoshua Bengio with cascaded autoencoders. The first two were immediately recruited by Google and Facebook, respectively. Here are two lectures, one by Hinton and one by LeCun, in which they explain what deep learning is; no one will tell you about it better than they do. There is also an excellent lecture by Schmidhuber, another of the pillars of this science, on the development of deep learning. Hinton also has an excellent Coursera course on neural networks.

What are deep neural networks capable of now? They can recognize and describe objects; one might say they "understand" what they are. This is about recognizing meaning.

Just watch this video of real-time recognition of what the camera sees.

As I said, deep learning is a whole group of technologies and solutions. I have already listed several of them in the paragraph above; another example is recurrent networks, which are used in the video above to describe what the network sees. But the most popular representative of this class of technologies is still LeCun's convolutional neural networks. They are built by analogy with the principles of the visual cortex of the cat's brain, in which so-called simple cells were discovered that respond to straight lines at different angles, along with complex cells, whose response is associated with the activation of a certain set of simple cells. Although, to be honest, LeCun himself was not guided by biology; he was solving a specific problem (see his lectures), and the correspondence emerged afterwards.

Put quite simply, convolutional networks are networks in which the main structural element of learning is a group (combination) of neurons (usually a 3 × 3 or 10 × 10 square, and so on), rather than a single neuron. And at each level of the network, dozens of such groups are trained. The network finds the combinations of neurons that maximize the information about the image. At the first level, the network extracts the most basic, structurally simple elements of the picture, the building blocks, one might say: borders, strokes, segments, contrasts. Higher up are stable combinations of first-level elements, and so on up the chain. I want to highlight once again the main feature of deep learning: the network forms these elements itself and decides which of them matter and which do not. This is important, because in machine learning, feature engineering is key, and now we are moving to a stage where the computer itself learns to create and select features. The machine itself identifies the hierarchy of informative features.
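Here is a minimal convolutional-network sketch, assuming Keras; the input shape, filter counts and depth are illustrative. Each Conv2D layer learns dozens of small 3×3 "groups of neurons" (filters), and stacking such layers builds the hierarchy described above: edges and contrasts first, then combinations of them.

```python
# Minimal CNN sketch, assuming Keras; sizes are illustrative.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 1)),  # level 1: edges, strokes, contrasts
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation="relu"),   # level 2: combinations of level-1 elements
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(128, (3, 3), activation="relu"),  # level 3: still more complex parts
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),         # final classification
])
model.summary()
```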

So, in the process of training (viewing hundreds of pictures), the convolutional network forms a hierarchy of features at different levels of depth. At the first level, it can pick out, for example, elements like these (reflecting contrast, an angle, a border, and so on).


At the second level, these will already be elements built from first-level elements; at the third, from second-level ones. Keep in mind that this picture is just a demonstration. In industrial use, such networks now have from 10 to 30 layers (levels).

After such a network has been trained, we can use it for classification. Given an image as input, groups of neurons in the first layer run over the image, activating in those places where there is a picture element corresponding to a specific feature. That is, the network parses the picture into parts, first into dashes, strokes and tilt angles, then into more complex parts, and in the end concludes that a picture made from this kind of combination of basic elements is a face.

More about convolutional networks -


"(Manning Publications).

This article is intended for people who already have significant experience with deep learning (for example, those who have already read Chapters 1-8 of this book). A lot of knowledge is assumed.

Deep Learning: Geometric View

The most amazing thing about deep learning is how simple it is. Ten years ago, no one could have imagined the amazing results we would achieve on machine perception problems using simple parametric models trained with gradient descent. Now it turns out that all we need are sufficiently large parametric models trained on a sufficiently large number of samples. As Feynman once said about the universe: "It is not complicated, there is just a lot of it."

In deep learning, everything is a vector, that is, a point in a geometric space. The model's input data (which may be text, images, and so on) and its targets are first "vectorized", that is, translated into an initial input vector space and a target vector space at the output. Each layer in a deep learning model performs one simple geometric transformation on the data that flows through it. Together, the chain of layers forms one very complex geometric transformation, broken down into a series of simple ones. This complex transformation attempts to map the input data space to the target space, point by point. The transformation's parameters are the layer weights, which are updated continually based on how well the model is currently performing. The key characteristic of this geometric transformation is that it must be differentiable, so that we can learn its parameters through gradient descent. Intuitively, this means that the geometric morphing must be smooth and continuous, an important constraint.
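As a toy illustration of this idea, here is a minimal NumPy sketch, under the assumption that a single "layer" is just an affine transformation of the data: because the transformation is differentiable, its weights can be learned by plain gradient descent. The data and sizes are made up.

```python
# One differentiable "layer" (an affine map) trained by gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # input points in a 3-D space
true_W, true_b = rng.normal(size=(3, 2)), rng.normal(size=2)
Y = X @ true_W + true_b                        # target points in a 2-D space

W = np.zeros((3, 2))
b = np.zeros(2)
lr = 0.1
for step in range(500):
    pred = X @ W + b                           # the layer's geometric transformation
    error = pred - Y
    grad_W = X.T @ error / len(X)              # gradients exist because the map is smooth
    grad_b = error.mean(axis=0)
    W -= lr * grad_W                           # gradient descent on the layer weights
    b -= lr * grad_b
print("final mean squared error:", float((error ** 2).mean()))
```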

The whole process of applying this complex geometric transformation to the input data can be visualized in 3D as a person trying to unfold a paper ball: the crumpled paper ball is the manifold of input data that the model starts with. Each movement of the person's hands is like one simple geometric transformation performed by one layer. The complete sequence of unfolding gestures is the complex transformation of the entire model. Deep learning models are mathematical machines for untangling the crumpled manifold of high-dimensional data.

This is the magic of deep learning: turning meaning into vectors, into geometric spaces, and then incrementally learning complex geometric transformations that map one space to another. All that is needed are spaces of sufficiently high dimension to capture the full range of relationships found in the original data.

Limitations of deep learning

The set of tasks that can be solved with this simple strategy is almost endless. And yet many tasks are still beyond the reach of current deep learning techniques, even with vast amounts of manually annotated data. Say, for example, that you could collect a dataset of hundreds of thousands, even millions, of English-language descriptions of software features written by product managers, along with the corresponding source code developed by engineering teams to meet those requirements. Even with this data, you could not train a deep learning model to simply read a product description and generate the appropriate codebase. This is just one example among many. In general, anything that requires argumentation and reasoning, such as programming or applying the scientific method, long-term planning, or manipulating data in an algorithmic way, is beyond the capabilities of deep learning models, no matter how much data you throw at them. Even teaching a neural network a sorting algorithm is incredibly difficult.

The reason is that a deep learning model is "only" a chain of simple, continuous geometric transformations mapping one vector space to another. All it can do is map one dataset X to another dataset Y, provided there exists a learnable continuous transformation from X to Y and a dense sampling of the X-to-Y mapping is available as training data. So while a deep learning model can be considered a kind of program, most programs cannot be expressed as deep learning models: for most problems, either there is no deep neural network of practically suitable size that solves the problem, or, if one exists, it may be unlearnable, that is, the corresponding geometric transformation may be too complex, or there may be no suitable data to train it.

Scaling up existing deep learning techniques - adding more layers and using more training data - can only superficially alleviate some of these problems. It will not solve the more fundamental problem that deep learning models are very limited in what they can represent, and that most programs cannot be expressed as continuous geometric morphing of a manifold of data.

The risk of anthropomorphizing machine learning models

One of the very real risks of modern AI is misinterpreting how deep learning models work and exaggerating their capabilities. A fundamental feature of the human mind is our "theory of mind", our tendency to project goals, beliefs and knowledge onto things around us. Drawing a smiling face on a stone suddenly makes it "happy" in our minds. Applied to deep learning, this means, for example, that if we can more or less successfully train a model to generate textual descriptions of pictures, we tend to think that the model "understands" the content of the images as well as the descriptions it generates. We are then very surprised when, due to a small deviation from the kinds of images present in the training data, the model starts generating completely absurd descriptions.

This is most pronounced in "adversarial examples", that is, input samples for a deep learning network specially chosen to be misclassified. You already know that you can do gradient ascent in the input space to generate patterns that maximize the activation of, say, a particular convolutional filter; this is the core of the filter-visualization technique we covered in Chapter 5 (note: of the book Deep Learning with Python), as well as of the Deep Dream algorithm from Chapter 8. In a similar way, through gradient ascent, you can slightly modify an image to maximize the class prediction for a given class. If we take a photo of a panda and add a "gibbon" gradient, we can force the neural network to classify the panda as a gibbon. This demonstrates both the fragility of these models and the profound difference between the input-to-output mapping they perform and our own human perception.
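For intuition, here is a hedged sketch of the "panda to gibbon" trick in the spirit of the fast gradient sign method, assuming TensorFlow/Keras and some pretrained classifier passed in as `model`; the target class index and epsilon are illustrative placeholders, not values from the original text.

```python
# Targeted fast-gradient-sign sketch, assuming TensorFlow/Keras and a pretrained `model`.
import tensorflow as tf

def adversarial_example(model, image, target_class, epsilon=0.01):
    """Nudge `image` by the gradient sign so the model leans toward `target_class`."""
    image = tf.convert_to_tensor(image[None, ...])       # add a batch dimension
    target = tf.one_hot([target_class], model.output_shape[-1])
    with tf.GradientTape() as tape:
        tape.watch(image)
        prediction = model(image)
        # Raising the target-class probability means lowering this loss
        loss = tf.keras.losses.categorical_crossentropy(target, prediction)
    gradient = tape.gradient(loss, image)
    # Stepping against the gradient sign pushes the image toward the target class
    return tf.clip_by_value(image - epsilon * tf.sign(gradient), 0.0, 1.0)[0]
```

To a human eye the perturbed image looks unchanged, yet the classifier's prediction flips, which is exactly the fragility described above.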

In general, deep learning models have no understanding of their inputs, at least not in a human sense. Our own understanding of images, sounds and language is grounded in our sensorimotor experience as humans, as material, earthly beings. Machine learning models have no access to such experience and therefore cannot "understand" their inputs in any human-like way. By annotating large numbers of training examples for our models, we get them to learn a geometric transformation that maps data to human concepts for that specific set of examples, but this mapping is just a simplistic sketch of the original model in our minds, developed from our experience as embodied agents; it is like a faint reflection in a mirror.

As a machine learning practitioner, always keep this in mind, and never fall into the trap of believing that neural networks understand the task they perform. They don't, at least not in any way that would make sense to us. They were trained on a different, far narrower task than the one we want to teach them: simply mapping training inputs to training targets, point by point. Show them anything that deviates from the training data and they will break in the most absurd ways.

Local generalization versus ultimate generalization

There seems to be a fundamental difference between the direct geometric morphing from input to output that deep learning models perform and the way people think and learn. It is not just that people learn from their embodied experience rather than by processing a set of training samples. Beyond the difference in learning processes, there is a fundamental difference in the nature of the underlying representations.

Humans are capable of much more than mapping an immediate stimulus to an immediate response, as a neural network, or perhaps an insect, would do. People maintain complex, abstract models of the current situation, of themselves and of other people, and can use these models to anticipate different possible futures and to carry out long-term planning. They are capable of combining familiar concepts into a coherent whole to represent things they have never encountered before, such as picturing a horse in jeans, or imagining what they would do if they won the lottery. This ability to think hypothetically, to expand our mental model space far beyond what we have directly experienced, that is, the ability to perform abstraction and reasoning, is perhaps the defining characteristic of human cognition. I call this "ultimate generalization": the ability to adapt to new, never-before-experienced situations using very little data or no data at all.

This is in stark contrast to what deep learning networks do, which I would call "local generalization": the mapping from inputs to outputs quickly stops making sense if the new inputs differ even slightly from what the network saw during training. Consider, for example, the problem of learning the appropriate launch parameters to land a rocket on the moon. If you used a neural network for this task, training it with supervised learning or reinforcement learning, you would need to feed it thousands or millions of launch trajectories; that is, you would need a dense sampling of the input space in order to learn a reliable mapping from the input space to the output space. In contrast, humans can use the power of abstraction to build physical models, rocket science, and derive an exact solution that lands the rocket on the moon in just a few tries. Likewise, if you developed a neural network to control a human body and wanted it to learn to walk safely through a city without being hit by cars, the network would have to die many thousands of times in various situations before it concluded that cars are dangerous and developed appropriate behavior to avoid them. Move it to a new city and it would have to relearn most of what it knew. Humans, on the other hand, are able to learn safe behavior without ever dying, again thanks to the power of abstract modeling of hypothetical situations.

So, despite our progress in machine perception, we are still very far from human-level AI: our models can only perform local generalization, adapting to new situations that must stay very close to past data, while the human mind is capable of ultimate generalization, quickly adapting to completely new situations and planning far into the future.

Conclusions

Here is what you should remember: the only real success of deep learning so far is the ability to map space X to space Y using a continuous geometric transformation, given large amounts of human-annotated data. Doing this well is a revolutionary achievement for the entire industry, but human-level AI is still a long way off.

To remove some of these limitations and begin competing with the human brain, we need to move away from direct input-to-output mappings and move on to reasoning and abstraction. A suitable substrate for abstract modeling of various situations and concepts may be computer programs. We have said before (note: in the book Deep Learning with Python) that machine learning models can be defined as "learnable programs"; at the moment we can only learn a narrow and specific subset of all possible programs. But what if we could learn any program, in a modular and reusable way? Let's see how we might get there.

The future of deep learning

Given what we know about deep learning networks, their limitations, and the current state of research, can we predict what will happen in the medium term? Here are some of my personal thoughts on the matter. Keep in mind that I do not have a crystal ball, so much of what I expect may not come true. This is pure speculation. I share these predictions not because I expect them to be fully realized, but because they are interesting and actionable in the present.

At a high level, here are the main areas that I consider promising:

  • Models will approach general purpose computer programs built on top of much richer primitives than our current differentiable layers - so we get reasoning and abstractions, the absence of which is a fundamental weakness of current models.
  • New forms of learning will emerge that will make this possible - and will allow models to move away from simply differentiable transformations.
  • Models will require less developer involvement - it shouldn't be your job to constantly tweak the knobs.
  • Greater, systematic reuse of learned features and architectures will emerge; meta-learning systems based on reusable and modular routines.
In addition, note that these considerations do not apply specifically to supervised learning, which is still the bread and butter of machine learning; they apply to any form of machine learning, including unsupervised learning, self-supervised learning, and reinforcement learning. It does not fundamentally matter where your labels come from or what your training loop looks like; these different branches of machine learning are just different facets of the same construct.

So let's go.

Models as programs

As we noted earlier, a necessary transformational development we can expect in machine learning is a move away from models that perform pure pattern recognition and are capable only of local generalization, toward models capable of abstraction and reasoning that can achieve ultimate generalization. Today, the basic reasoning programs in AI are all hard-coded by human programmers: for example, software that relies on search algorithms, graph manipulation, or formal logic. In DeepMind's AlphaGo, for instance, most of the "intelligence" on display is designed and hard-coded by expert programmers (e.g., Monte Carlo tree search); learning from new data happens only in specialized submodules, the value network and the policy network. But in the future, such AI systems may be learned fully, with no human involvement.

How can this be achieved? Consider a well-known network type: the RNN. Importantly, RNNs have slightly fewer constraints than feedforward networks. This is because RNNs are a bit more than mere geometric transformations: they are geometric transformations applied repeatedly inside a for loop. The temporal for loop itself is hard-coded by the developer: it is a built-in assumption of the network. Naturally, RNNs are still very limited in what they can represent, mainly because each step is still a differentiable geometric transformation and because they carry information from step to step through points in a continuous geometric space (state vectors). Now imagine neural networks "augmented" with programming primitives in the same way as that for loop, but not just a single hard-coded for loop with hard-coded geometric memory: rather a large set of programming primitives that the model could freely access to extend its processing capabilities, such as if branches, while statements, variable creation, disk storage for long-term memory, sorting operators, advanced data structures like lists, graphs and hash tables, and much more. The space of programs such a network could represent would be far wider than what existing deep learning networks can express, and some of these programs could achieve superior generalization power.
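To make the "geometric transformation inside a for loop" point concrete, here is a minimal NumPy sketch of a bare RNN step; the sizes and random weights are illustrative, and no training is shown.

```python
# An RNN as one differentiable transformation applied repeatedly in a for loop.
import numpy as np

rng = np.random.default_rng(0)
input_size, state_size, steps = 4, 8, 10
W_x = rng.normal(scale=0.1, size=(input_size, state_size))
W_h = rng.normal(scale=0.1, size=(state_size, state_size))
b = np.zeros(state_size)

inputs = rng.normal(size=(steps, input_size))     # one sequence of 10 timesteps
state = np.zeros(state_size)                      # the state vector carried between steps

for x_t in inputs:                                # the hard-coded temporal for loop
    state = np.tanh(x_t @ W_x + state @ W_h + b)  # one geometric transformation per step
print("final state vector:", state)
```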

In short, we will move away from having, on the one hand, "hard-coded algorithmic intelligence" (handwritten software) and, on the other, "learned geometric intelligence" (deep learning). Instead, we will end up with a blend of formal algorithmic modules that provide the capabilities of reasoning and abstraction, and geometric modules that provide the capabilities of informal intuition and pattern recognition. The whole system will be learned with little or no human involvement.

A related area of AI that I think may soon make a big difference is program synthesis, in particular neural program synthesis. Program synthesis consists of automatically generating simple programs using a search algorithm (possibly genetic search, as in genetic programming) to explore a large space of possible programs. The search stops when a program is found that meets the required specification, often provided as a set of input/output pairs. As you can see, this is very reminiscent of machine learning: the "training data" is provided as input/output pairs, and we find a "program" that matches the mapping from inputs to outputs and is capable of generalizing to new inputs. The difference is that instead of learning parameter values in a hard-coded program (a neural network), we generate source code via a discrete search process.
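A toy sketch of program synthesis as discrete search, under stated assumptions: here the "programs" are just short compositions of three hand-picked primitives, and the specification is a handful of input/output pairs. Real synthesis systems search far richer program spaces, but the principle is the same.

```python
# Brute-force program synthesis over compositions of primitives, given I/O pairs.
from itertools import product

primitives = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
}
io_pairs = [(1, 4), (2, 6), (3, 8)]   # specification: f(x) = 2 * (x + 1)

def search(max_length=3):
    for length in range(1, max_length + 1):
        for names in product(primitives, repeat=length):
            def program(x, names=names):
                for name in names:            # apply the chosen primitives in order
                    x = primitives[name](x)
                return x
            if all(program(i) == o for i, o in io_pairs):
                return names                  # first program matching every pair
    return None

print(search())   # ('inc', 'double')
```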

I definitely expect a resurgence of interest in this area over the next few years. In particular, I expect a cross-fertilization between the related fields of deep learning and program synthesis, where we will not only generate programs in general-purpose languages, but will also generate neural networks (geometric data-processing pipelines) augmented with a rich set of algorithmic primitives, such as for loops and many others. This should be much more tractable and useful than direct generation of source code, and it will significantly broaden the range of problems that can be solved with machine learning: the space of programs we can generate automatically given appropriate training data. A blend of symbolic AI and geometric AI. Modern RNNs can be seen as a historical ancestor of such hybrid algorithmic-geometric models.


Figure: A learned program simultaneously relies on geometric primitives (pattern recognition, intuition) and algorithmic primitives (reasoning, search, memory).

Beyond backpropagation and differentiable layers

If machine learning models become more like programs, they will mostly no longer be differentiable. Certainly, these programs will still use continuous geometric layers as subroutines, and those will remain differentiable, but the model as a whole will not be. As a result, using backpropagation to adjust the weights of a fixed, hard-coded network may not remain the preferred training method in the future, or at least it will not be the only method. We need to figure out how to train non-differentiable systems efficiently. Current approaches include genetic algorithms, "evolution strategies", certain reinforcement learning methods, and ADMM (the alternating direction method of multipliers). Naturally, gradient descent is not going anywhere: gradient information will always be useful for optimizing differentiable parametric functions. But our models will certainly become more ambitious than mere differentiable parametric functions, and so their automated development ("learning" in "machine learning") will require more than backpropagation.
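For a feel of gradient-free training, here is a minimal sketch of a simple evolution strategy: perturb the parameters randomly, score each perturbation with a black-box fitness function, and step toward the better ones. The fitness function and sizes below are illustrative assumptions; the black box could just as well be non-differentiable.

```python
# A simple evolution strategy: no gradients of the fitness function are needed.
import numpy as np

rng = np.random.default_rng(0)
target = np.array([3.0, -2.0, 0.5])

def fitness(params):
    # Any black-box score works here; this one just rewards closeness to `target`.
    return -np.sum((params - target) ** 2)

params = np.zeros(3)
population, sigma, lr = 50, 0.1, 0.05
for generation in range(200):
    noise = rng.normal(size=(population, 3))
    scores = np.array([fitness(params + sigma * n) for n in noise])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)   # normalize scores
    params += lr / (population * sigma) * noise.T @ scores      # weighted step toward better perturbations
print("estimated parameters:", params)
```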

In addition, backpropagation is end-to-end, which is great for learning good chained transformations but computationally inefficient, because it does not fully exploit the modularity of deep networks. To make anything more efficient, there is one universal recipe: introduce modularity and hierarchy. So we can make backpropagation itself more efficient by introducing decoupled learning modules with some synchronization mechanism between them, organized hierarchically. This strategy is partly reflected in DeepMind's recent work on "synthetic gradients". I expect much, much more work in this direction in the near future.

One can imagine a future in which globally non-differentiable models (with differentiable parts) are trained, grown, using an efficient search process that does not rely on gradients, while the differentiable parts are trained even faster using gradients, taking advantage of some more efficient version of backpropagation.

Automated machine learning

In the future, model architectures will be learned rather than handcrafted by engineers. Learned architectures will naturally go hand in hand with richer primitives and program-like machine learning models.

Today, most of the time, a deep learning engineer endlessly munges data with Python scripts and then spends a long time tuning the architecture and hyperparameters of a deep network to get a working model, or even a state-of-the-art model if the engineer is that ambitious. Needless to say, this is not an ideal state of affairs. But AI can help here too. Unfortunately, the data processing and preparation part is hard to automate, since it often requires domain knowledge as well as a clear high-level understanding of what the engineer wants to achieve. Hyperparameter tuning, however, is a simple search procedure, and in this case we already know what the engineer wants to achieve: it is defined by the loss function of the network being tuned. It has already become common practice to set up basic AutoML systems that handle most of the knob-tweaking of model settings. I set one up myself to win Kaggle competitions.

At the most basic level, such a system would simply tune the number of layers in the stack, their order, and the number of units or filters in each layer. This is usually done with libraries like Hyperopt, which we discussed in Chapter 7 (note: of the book Deep Learning with Python). But one can go much further and try to learn an appropriate architecture from scratch, with a minimal set of constraints. This is possible with reinforcement learning, for example, or with genetic algorithms.
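Here is a hedged sketch of such a basic search with Hyperopt, the library mentioned above. The search space is illustrative, and `train_and_evaluate` is a hypothetical stand-in for real model training; in practice it would build and train a network with the sampled settings and return its validation loss.

```python
# Basic hyperparameter search with Hyperopt; the objective is a toy stand-in.
from hyperopt import fmin, tpe, hp

space = {
    "layers": hp.choice("layers", [1, 2, 3]),
    "units": hp.choice("units", [32, 64, 128]),
    "learning_rate": hp.loguniform("learning_rate", -7, -2),
}

def train_and_evaluate(layers, units, learning_rate):
    # Hypothetical stand-in for real training: pretend certain settings give lower loss.
    return (learning_rate - 1e-3) ** 2 + 1.0 / (layers * units)

def objective(params):
    # Hyperopt minimizes whatever this returns (here, a stand-in validation loss).
    return train_and_evaluate(**params)

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)
print(best)
```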

Another important direction for AutoML is learning the model architecture jointly with the model weights. Training a model from scratch every time we try a slightly different architecture is extremely inefficient, so a truly powerful AutoML system would evolve the architecture while the model's weights are being tuned via backpropagation on the training data, thereby eliminating all the redundant computation. As I write these lines, such approaches have already started to appear.

When all of this starts to happen, machine learning developers won't be out of work - they will move to a higher level in the value chain. They will begin to put much more effort into creating complex loss functions that truly reflect business objectives, and they will have a deep understanding of how their models affect the digital ecosystems in which they operate (for example, customers who use model predictions and generate data for its training) - problems that only the largest companies can now afford to consider.

Lifelong learning and reuse of modular routines

If models become more complex and are built on richer algorithmic primitives, this increased complexity will demand more intensive reuse between tasks, rather than training a model from scratch every time we have a new task or a new dataset. After all, many datasets do not contain enough information to develop a new, complex model from scratch, and it will simply become necessary to draw on information from previously seen datasets. You do not relearn English every time you open a new book; that would be impossible. Besides, training models from scratch on every new task is very inefficient because of the significant overlap between the current tasks and those encountered before.

In addition, a remarkable observation has been made repeatedly in recent years: training the same model to perform several loosely connected tasks improves its results on each of those tasks. For example, training the same neural network to translate from English to German and from French to Italian yields a model that is better at each of these language pairs. Training an image classification model jointly with an image segmentation model, sharing a single convolutional base, yields a model that is better at both tasks, and so on. This is quite intuitive: there is always some information that overlaps between these seemingly different tasks, so the joint model has access to more information about each individual task than a model trained on that task alone.
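A minimal sketch of the shared-base idea, assuming Keras: one convolutional base feeds two task-specific heads (a classifier and a coarse per-pixel mask), so both tasks train and refine the same features. The shapes, filter counts and loss choices are illustrative.

```python
# Multi-task model with a shared convolutional base, assuming Keras.
from tensorflow import keras

inputs = keras.Input(shape=(64, 64, 3))
x = keras.layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
x = keras.layers.Conv2D(64, 3, activation="relu", padding="same")(x)
shared = keras.layers.MaxPooling2D(2)(x)                      # shared convolutional base

class_head = keras.layers.GlobalAveragePooling2D()(shared)
class_out = keras.layers.Dense(10, activation="softmax", name="label")(class_head)

mask_out = keras.layers.Conv2D(1, 1, activation="sigmoid", name="mask")(shared)  # 32x32 coarse mask

model = keras.Model(inputs, [class_out, mask_out])
model.compile(
    optimizer="adam",
    loss={"label": "categorical_crossentropy", "mask": "binary_crossentropy"},
)
model.summary()
```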

What we actually do when we reuse a model across different tasks is use pre-trained weights for the parts of the model that perform common functions, such as extracting visual features. You saw this in practice in Chapter 5. I expect that a more general version of this technique will be widely used in the future: we will reuse not only previously learned features (submodel weights), but also model architectures and training procedures. As models become more program-like, we will begin to reuse program subroutines, like the functions and classes of conventional programming languages.

Think about what the software development process looks like today: once an engineer solves a certain problem (HTTP requests in Python, for example), they package it as an abstract, reusable library. Engineers facing a similar problem in the future simply search for existing libraries, download one and use it in their own project. Likewise, in the future, meta-learning systems will be able to assemble new programs by sifting through a global library of high-level reusable blocks. If the system finds itself developing similar subroutines for several different tasks, it will release an "abstract", reusable version of the subroutine and store it in the global library. Such a process opens up the capability for abstraction, a necessary component for achieving "ultimate generalization": a subroutine that proves useful across many tasks and domains can be said to "abstract" some aspect of problem-solving. This definition of "abstraction" is similar to the notion of abstraction in software engineering. These subroutines can be either geometric (deep learning modules with pre-trained representations) or algorithmic (closer to the libraries that modern programmers work with).

Figure: A meta-learning system capable of rapidly developing task-specific models from reusable primitives (algorithmic and geometric), thereby achieving "ultimate generalization".

In summary: long-term vision

In short, here's my long-term vision for machine learning:
  • Models will become more like programs and will have capabilities that extend far beyond the continuous geometric transformations of the underlying data that we work with today. These programs will arguably be much closer to the abstract mental models that people maintain of their environment and of themselves, and they will be capable of stronger generalization thanks to their algorithmic nature.
  • In particular, models will blend algorithmic modules providing formal reasoning, search and abstraction with geometric modules providing informal intuition and pattern recognition. AlphaGo (a system that required intensive manual programming and architecture design) is an early example of what such a fusion of symbolic and geometric AI might look like.
  • These models will be grown automatically (rather than handwritten by human programmers), using modular pieces from a global library of reusable routines, a library that has evolved by absorbing high-performing models from thousands of previous tasks and datasets. Once the meta-learning system identifies common problem-solving patterns, they are turned into reusable routines, much like functions and classes in modern programming, and added to the global library. This is how the capability for abstraction is achieved.
  • The global library and the associated model-growing system will be able to achieve some form of human-like "ultimate generalization": faced with a new task or a new situation, the system will be able to assemble a new working model for that task using very little data, thanks to 1) rich program-like primitives that generalize well, and 2) extensive experience with similar problems. In the same way, people can quickly learn a complex new video game because they have prior experience with many other games, and because the models built from that prior experience are abstract and program-like rather than simple mappings from stimulus to action.
  • Essentially, this continuously learning model-growing system can be interpreted as strong artificial intelligence. But do not expect some singular robo-apocalypse: that is pure fantasy, born of a long list of profound misunderstandings of both intelligence and technology. Such a critique, however, has no place here.

Artificial intelligence, neural networks, machine learning: what do all these popular terms really mean? To most uninitiated people, myself included, they have always seemed like something fantastic, but in fact their essence lies on the surface. I have long had the idea of writing about artificial neural networks in simple language, to learn for myself and to tell others what this technology is, how it works, and to consider its history and prospects. In this article, I have tried not to wander into the weeds, but to describe this promising direction in the world of high technology simply and popularly.

A bit of history

The concept of artificial neural networks (ANNs) first arose in attempts to simulate the processes of the brain. The first major breakthrough in this area was the creation of the McCulloch-Pitts neural network model in 1943. These scientists were the first to develop a model of an artificial neuron. They also proposed building a network of these elements to perform logical operations. Most importantly, they proved that such a network is capable of learning.

The next important step was Donald Hebb's development, in 1949, of the first learning algorithm for ANNs, which remained fundamental for the next several decades. In 1958, Frank Rosenblatt developed the perceptron, a system that mimics the processes of the brain. At the time, the technology had no analogues, and it is still fundamental to neural networks. In 1986, almost simultaneously and independently of one another, American and Soviet scientists significantly improved the fundamental method for training the multilayer perceptron. In 2007, neural networks underwent a rebirth: British computer scientist Geoffrey Hinton pioneered deep learning algorithms for multilayer neural networks, which are now used, for example, to operate self-driving cars.

Briefly about the main thing

In the general sense of the word, neural networks are mathematical models that work on the principle of the networks of nerve cells in an animal organism. ANNs can be implemented in both software and hardware. For ease of understanding, a neuron can be imagined as a cell with many inputs and one output. How the numerous incoming signals are formed into an outgoing one is determined by the computation algorithm. Real-valued signals arrive at each input of a neuron and then propagate along the connections between neurons (synapses). Each synapse has one parameter, a weight, which changes the input information as it passes from one neuron to another. The easiest way to picture how a neural network works is the example of color mixing: the blue, green and red neurons have different weights, and the information from the neuron with the greater weight will dominate in the next neuron.
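A minimal sketch of that single artificial neuron: several weighted inputs are summed and passed through an activation function to produce one output. The weights, bias and input values below are made-up numbers for illustration.

```python
# One artificial neuron: weighted sum of inputs passed through an activation.
import math

def neuron(inputs, weights, bias):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias   # weighted sum of the inputs
    return 1.0 / (1.0 + math.exp(-total))                        # sigmoid activation

# Three inputs (say, "blue", "green" and "red" signals) with different weights:
# the more heavily weighted input dominates the neuron's output.
print(neuron([0.9, 0.2, 0.1], weights=[2.0, 0.5, 0.3], bias=-0.5))
```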

The neural network itself is a system of many such neurons (processors). Individually, these processors are quite simple (much simpler than a personal computer processor), but when connected into a large system, neurons are capable of performing very complex tasks.

Depending on the area of application, a neural network can be interpreted in different ways. For example, from the point of view of machine learning, an ANN is a pattern recognition method. From a mathematical point of view, it is a multi-parameter problem. From the point of view of cybernetics, it is a model of adaptive control for robotics. For artificial intelligence, an ANN is a fundamental component for modeling natural intelligence using computational algorithms.

The main advantage of neural networks over conventional computation algorithms is their ability to learn. In the general sense of the word, learning consists of finding the correct coupling coefficients (weights) between neurons, as well as generalizing the data and identifying complex dependencies between input and output signals. In effect, successful training of a neural network means that the system will be able to produce correct results from data that was not present in the training set.
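As a tiny illustration of "learning as finding the right coupling coefficients", here is the classic perceptron rule: nudge the weights after every mistake until the network classifies the training set correctly. The toy data (the logical AND function) and the learning rate are illustrative.

```python
# Perceptron learning rule on a toy AND dataset.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights = [0.0, 0.0]
bias = 0.0
lr = 0.1
for epoch in range(20):
    for inputs, target in data:
        output = 1 if sum(x * w for x, w in zip(inputs, weights)) + bias > 0 else 0
        error = target - output
        weights = [w + lr * error * x for w, x in zip(weights, inputs)]  # adjust the coupling coefficients
        bias += lr * error
print("learned weights:", weights, "bias:", bias)
```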

Today's situation

However promising this technology may be, ANNs are still very far from the capabilities of the human brain and of human thought. Nevertheless, neural networks are already used in many areas of human activity. So far they cannot make highly intellectual decisions, but they can replace a person where one was previously required. Among the many areas where ANNs are applied are self-learning systems for production processes, unmanned vehicles, image recognition systems, intelligent security systems, robotics, quality monitoring systems, voice interfaces, analytics systems and much more. Such widespread use of neural networks is due, among other things, to the emergence of various ways to accelerate ANN training.

Today the market for neural networks is huge, worth billions of dollars. As practice shows, most neural network technologies around the world differ little from one another. However, using neural networks is a very costly exercise that in most cases only large companies can afford. Developing, training and testing neural networks requires large computing power, and it is clear that the big players in the IT market have plenty of it. The main companies leading development in this area include Google DeepMind, Microsoft Research, IBM, Facebook and Baidu.

Of course, all this is good: neural networks are developing, the market is growing, but so far the main task has not been solved. Humanity has failed to create a technology that is even close in capabilities to the human brain. Let's take a look at the main differences between the human brain and artificial neural networks.

Why are neural networks still far from the human brain?

The most important difference, one that fundamentally changes the principle and efficiency of the system, is how signals are transmitted in artificial neural networks versus in a biological network of neurons. In an ANN, neurons transmit values that are real numbers. In the human brain, impulses with a fixed amplitude are transmitted, and these impulses are almost instantaneous. This gives the human network of neurons a number of advantages.

First, communication lines in the brain are much more efficient and economical than in ANNs. Second, the impulse scheme makes the technology simple to implement: it is enough to use analog circuits instead of complex computational mechanisms. Finally, impulse networks are resistant to interference, whereas real-valued signals are prone to noise, which increases the likelihood of errors.

Outcome

Of course, the last decade has seen a real boom in the development of neural networks, primarily because training ANNs has become much faster and easier. So-called "pre-trained" neural networks have also begun to be actively developed, which can significantly speed up the adoption of the technology. And while it is too early to say whether neural networks will ever be able to fully reproduce the capabilities of the human brain, the likelihood that in the next decade ANNs will be able to replace humans in a quarter of existing professions looks more and more real.

For those who want to know more

  • The Big Neural War: What Google Is Really Up to
  • How cognitive computers can change our future
