Big Data: what are big data systems? Development of Big Data technologies

Big data is not only the data itself, but also the technologies for processing and using it, and the methods for finding the necessary information in large arrays. The problem of big data remains open and pressing for any system that has been accumulating a wide variety of information for decades.

This term is associated with the expression "Volume, Velocity, Variety", the principles on which work with big data is built: the amount of information, the speed at which it is processed, and the variety of information stored in an array. Recently a fourth principle has been added to the three basic ones: Value, meaning the value of the information. That is, the information must be useful in theoretical or practical terms, so as to justify the cost of storing and processing it.

Social networks are an example of a typical source of big data - each profile or public page is one small drop in an unstructured ocean of information. Moreover, regardless of the amount of information stored in a particular profile, interaction with each of the users should be as fast as possible.

Big data is constantly accumulating in almost every area of human life. This includes any industry related to either human interaction or computing: social media, medicine, the banking sector, and systems of devices that produce numerous results of daily calculations, such as astronomical observations, meteorological information and data from Earth sounding devices.

Information from various real-time tracking systems is also sent to the servers of a particular company. Television and radio broadcasting, call databases of mobile operators: each individual person's interaction with them is minimal, but in aggregate all this information becomes big data.

Big data technologies have become integral to R&D and commerce. Moreover, they are beginning to capture the sphere of public administration - and everywhere the introduction of more and more efficient systems for storing and manipulating information is required.

The term “big data” first appeared in the press in 2008, when Clifford Lynch published an article in the journal Nature on how big data technologies could advance the future of science. Until 2009 the term was considered only from the point of view of scientific analysis, but after several more articles were published, the press began to use the concept of Big Data widely, and continues to do so today.

In 2010, the first attempts to solve the growing problem of big data began to appear. Software products were released that aimed to minimize the risks of working with huge information arrays.

By 2011, large companies such as Microsoft, Oracle, EMC and IBM became interested in big data - they were the first to use Big data in their development strategies, and quite successfully.

Universities began to teach big data as a separate subject as early as 2013. Today problems in this area are addressed not only by data science, but also by engineering and computing disciplines.

The main methods of data analysis and processing include the following:

  1. Data Mining class methods (deep analysis).

These methods are quite numerous, but they have one thing in common: they combine mathematical tools with achievements in the field of information technology.

  2. Crowdsourcing.

This technique makes it possible to obtain data from several sources at once, and the number of sources is practically unlimited.

  3. A/B testing.

A control set of elements is selected from the entire body of data and compared in turn with other similar sets in which one of the elements has been changed. Such tests help to determine which parameter changes have the greatest effect on the control set. Thanks to the volume of Big Data, a huge number of iterations can be carried out, each one bringing the result closer to the most reliable one.

  4. Predictive analytics.

Specialists in this field try to predict and plan in advance how the controlled object will behave, in order to make the most advantageous decision in a given situation.

  5. Machine learning (artificial intelligence).

It is based on an empirical analysis of information and the subsequent construction of self-learning algorithms for systems.

  6. Network analysis.

The most common method for studying social networks: once statistical data has been collected, the nodes of the resulting network are analyzed, that is, the interactions between individual users and their communities.
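The A/B testing method from the list above compares a control set with a variant in which one element has been changed. A minimal Python sketch of one such comparison follows. It assumes a two-proportion z-test, which is a common choice but is not named in this article, and the conversion counts are made up for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: control group A versus variant B with one changed element.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests that the changed element, rather than chance, explains the difference. With large data volumes the same test can be repeated for many variants, although repeated testing then calls for a correction for multiple comparisons.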

In 2017, when big data is no longer something new and unknown, its importance has not only not decreased, but even increased. Now experts are betting that the analysis of large amounts of data will become available not only for giant organizations, but also for small and medium-sized businesses. This approach is planned to be implemented using the following components:

  • Cloud storage.

Data storage and processing are becoming faster and more economical - compared to the costs of maintaining your own data center and the possible expansion of staff, renting a cloud seems to be a much cheaper alternative.

  • Using Dark Data.

So-called "dark data" is all the non-digitized information about a company that plays no key role in its day-to-day operation, but may become a reason to switch to a new information storage format.

  • Artificial Intelligence and Deep Learning.

Machine learning technology that mimics the structure and operation of the human brain is best suited to processing large amounts of constantly changing information. In this case the machine does everything a person would have to do, but the probability of error is greatly reduced.

  • Blockchain.

This technology allows you to speed up and simplify numerous Internet transactions, including international ones. Another advantage of Blockchain is that it reduces transaction costs.

  • Self-service and price reduction.

In 2017, it is planned to introduce "self-service platforms" - these are free platforms where representatives of small and medium-sized businesses will be able to independently evaluate the data they store and systematize it.

All marketing strategies are in one way or another based on handling information and analyzing existing data. That is why big data can be used to predict and adjust the further development of a company.

For example, an RTB auction created on the basis of big data allows you to use advertising more efficiently - a certain product will be shown only to the group of users who are interested in purchasing it.
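The idea of showing an ad only to interested users can be illustrated with a small Python sketch of a single auction for one ad impression. Everything here is an assumption made for illustration: the `Advertiser` class, the interest labels, the bids and the second-price rule (the winner pays the runner-up's bid) are hypothetical, since the article does not describe how the auction is run.

```python
from dataclasses import dataclass

@dataclass
class Advertiser:
    name: str
    target_interests: set   # interests the advertiser wants to reach
    max_bid: float          # highest price it will pay per impression

def run_auction(user_interests, advertisers, floor_price=0.01):
    """Return (winner_name, price), or None if nobody bids."""
    # Only advertisers whose targeting overlaps the user's interests bid.
    bids = [(a.max_bid, a.name) for a in advertisers
            if a.target_interests & user_interests]
    if not bids:
        return None
    bids.sort(reverse=True)
    winner = bids[0][1]
    # Assumed second-price rule: pay the runner-up's bid, or the floor price.
    price = bids[1][0] if len(bids) > 1 else floor_price
    return winner, price

# Hypothetical advertisers and user profile.
advertisers = [
    Advertiser("bike_shop", {"cycling", "fitness"}, 1.20),
    Advertiser("tea_store", {"tea"}, 2.00),
    Advertiser("gym_chain", {"fitness"}, 0.90),
]
print(run_auction({"fitness", "travel"}, advertisers))
```

Here the tea store never bids for a user with no interest in tea, even though its maximum bid is the highest. That is the filtering effect the paragraph above describes.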

What is the benefit of using big data technologies in marketing and business?

  1. With their help, you can create new projects much faster, which are likely to become popular among buyers.
  2. They help to correlate customer requirements with an existing or projected service and thus adjust them.
  3. Big data methods make it possible to assess the degree of current satisfaction of all users and each one individually.
  4. Increasing customer loyalty is ensured through big data processing methods.
  5. Attracting the target audience on the Internet is becoming easier due to the ability to control huge amounts of data.

For example, one of the most popular services for predicting the likely popularity of a particular product is Google Trends. It is widely used by marketers and analysts, who can obtain statistics on past interest in a given product and a forecast for the next season. This allows company executives to distribute the advertising budget more effectively and to decide which area is best to invest in.

Examples of using Big Data

The active introduction of Big Data technologies to the market and into modern life began just after they began to be used by world-famous companies that have customers in almost every corner of the globe.

These include giants such as Facebook, Google and IBM, as well as financial institutions like MasterCard, VISA and Bank of America.

For example, IBM is applying big data techniques to cash transactions. With their help, 15% more fraudulent transactions were detected, which increased the amount of protected funds by 60%. The problems with false positives of the system were also solved - their number was reduced by more than half.

VISA used Big Data in a similar way, tracking fraudulent attempts to perform particular operations. Thanks to this, the company prevents losses of more than 2 billion US dollars every year.

The German Ministry of Labor has managed to cut costs by 10 billion euros by implementing a big data system in the work of issuing unemployment benefits. At the same time, it was revealed that a fifth of citizens receive these benefits without justification.

Big Data has not bypassed the gaming industry either. Thus, the developers of World of Tanks conducted a study of information about all players and compared the available indicators of their activity. This helped to predict possible future churn of players - based on the assumptions made, representatives of the organization were able to interact with users more effectively.

Notable organizations using big data also include HSBC, Nasdaq, Coca-Cola, Starbucks, and AT&T.

The biggest problem with big data is the cost of processing it. This can include both expensive equipment and the cost of wages for qualified specialists capable of servicing huge amounts of information. Obviously, the equipment will have to be regularly updated so that it does not lose its minimum performance as the amount of data increases.

The second problem is again related to the large amount of information that needs to be processed. If, for example, a study gives not 2-3, but a large number of results, it is very difficult to remain objective and select from the general data stream only those that will have a real impact on the state of a phenomenon.

The privacy problem. With most customer services moving to online data usage, it is very easy to become the next target for cybercriminals. Even simply storing personal information, without making any online transactions, can have undesirable consequences for cloud storage customers.

The problem of information loss. As a precaution, a single one-time backup is not enough: at least 2-3 backup copies of the storage should be made. However, as the volume grows, redundancy becomes more complex, and IT specialists are still looking for the best solution to this problem.

The market of big data technologies in Russia and the world

As of 2014, services made up 40% of the big data market. Revenue from the use of Big Data in computer equipment was slightly lower, at 38%. The remaining 22% came from software.

According to statistics, the most useful products in the global segment for solving Big Data problems are in-memory and NoSQL analytical platforms. Log-file analytical software and columnar platforms occupy 15 and 12 percent of the market, respectively. Hadoop/MapReduce, however, does not cope with big data problems very effectively in practice.

The results of the implementation of big data technologies:

  • improving the quality of customer service;
  • optimization of integration in the supply chain;
  • organization planning optimization;
  • acceleration of interaction with clients;
  • improving the efficiency of processing customer requests;
  • reduced service costs;
  • optimization of processing of client requests.

Best Big Data Books



Suitable for the initial study of big data processing technologies - it easily and clearly brings you up to date. It makes it clear how the abundance of information has affected everyday life and all its areas: science, business, medicine, etc. Contains numerous illustrations, so it is perceived without much effort.

"Introduction to Data Mining" by Pang-Ning Tan, Michael Steinbach and Vipin Kumar

Also a useful book for beginners on Big Data, which explains how to work with big data in a “from simple to complex” manner. It covers many important points at the initial stage: preparation for processing, visualization, OLAP, as well as some methods of data analysis and classification.

A practical guide to using and working with big data using the Python programming language. Suitable for both engineering students and professionals who want to deepen their knowledge.

"Hadoop for Dummies", Dirk deRoos, Paul C. Zikopoulos, Roman B. Melnyk

Hadoop is a project designed specifically to work with distributed programs that organize the execution of actions on thousands of nodes at the same time. Acquaintance with it will help to understand in more detail the practical application of big data.

The constant acceleration of data growth is an integral part of today's realities. Social networks, mobile devices, data from measuring devices, business information are just a few of the types of sources that can generate huge amounts of data.

Currently, the term Big Data (Big data) has become quite common. Far from everyone is still aware of how quickly and deeply technologies for processing large amounts of data are changing the most diverse aspects of society. Changes are taking place in various areas, giving rise to new problems and challenges, including in the field of information security, where such important aspects as confidentiality, integrity, availability, etc. should be in the foreground.

Unfortunately, many modern companies resort to Big Data technology without creating the proper infrastructure for this, which could ensure reliable storage of the huge amounts of data that they collect and store. On the other hand, blockchain technology is currently rapidly developing, which is designed to solve this and many other problems.

What is Big Data?

In fact, the definition of the term lies on the surface: "big data" means the management of very large amounts of data, as well as their analysis. If you look more broadly, then this is information that cannot be processed by classical methods due to its large volumes.

The term Big Data itself appeared relatively recently. According to the Google Trends service, the popularity of the term began to grow rapidly at the end of 2011.

In 2010, the first products and solutions directly related to the processing of big data began to appear. By 2011, most of the largest IT companies, including IBM, Oracle, Microsoft and Hewlett-Packard, were actively using the term Big Data in their business strategies. Gradually, information technology market analysts began active research on the concept.

Currently, this term has gained considerable popularity and is actively used in a variety of fields. However, it cannot be said with certainty that Big Data is some kind of fundamentally new phenomenon - on the contrary, large data sources have existed for many years. In marketing, they can be databases of customer purchases, credit histories, lifestyles, and more. Over the years, analysts have used this data to help companies predict future customer needs, assess risk, shape consumer preferences, and more.

Currently, the situation has changed in two aspects:

- more sophisticated tools and methods have emerged to analyze and compare different datasets;
- analysis tools have been complemented by many new sources of data, driven by widespread digitization, as well as new methods of collecting and measuring data.

Researchers predict that Big Data technologies will be most actively used in manufacturing, healthcare, trade, public administration and in other very diverse areas and industries.

Big Data is not a specific array of data, but a set of methods for processing them. The defining characteristic for big data is not only their volume, but also other categories that characterize the labor-intensive processes of data processing and analysis.

The initial data for processing can be, for example:

- Internet user behavior logs;
- the Internet of Things;
- social media;
- meteorological data;
- digitized books from the largest libraries;
- GPS signals from vehicles;
- information about transactions of bank customers;
- data on the location of mobile network subscribers;
- information about purchases in large retail chains, etc.

The amount of data and the number of its sources grow constantly, and against this background new methods of information processing appear and existing ones are improved.

Basic principles of Big Data:

- Horizontal scalability. Data arrays can be huge, which means that a big data processing system must expand dynamically as volumes increase.
- Fault tolerance. Even if some pieces of equipment fail, the entire system must remain operational.
- Data locality. In large distributed systems, data is usually spread over a significant number of machines. Whenever possible, and in order to save resources, data is processed on the same server on which it is stored.
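The three principles above can be illustrated with a toy Python model. This is a sketch under stated assumptions, not a description of any real system: the `TinyCluster` class, the node names and the simple "hash modulo node count" placement are all hypothetical. Production systems typically use consistent hashing or a metadata service so that adding nodes does not reshuffle most keys.

```python
import hashlib

class TinyCluster:
    """Toy model: each key is stored on `replicas` consecutive nodes."""

    def __init__(self, node_names, replicas=2):
        self.nodes = {name: {} for name in node_names}  # node -> local store
        self.order = sorted(node_names)
        self.replicas = replicas
        self.down = set()                                # failed nodes

    def _owners(self, key):
        # A stable hash (not Python's randomized hash()) picks the first owner.
        digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        start = digest % len(self.order)
        return [self.order[(start + i) % len(self.order)]
                for i in range(self.replicas)]

    def put(self, key, value):
        # Spreading keys over nodes is what lets capacity grow with node count.
        for node in self._owners(key):
            self.nodes[node][key] = value

    def get(self, key):
        # Fault tolerance: any surviving replica can answer.
        for node in self._owners(key):
            if node not in self.down:
                return self.nodes[node][key]
        raise RuntimeError("all replicas of this key are down")

    def count_local(self, node):
        # Data locality: the count is computed where the data is stored.
        return len(self.nodes[node])

cluster = TinyCluster(["node-a", "node-b", "node-c"], replicas=2)
for i in range(100):
    cluster.put(f"user:{i}", {"id": i})
cluster.down.add("node-b")           # simulate a hardware failure
print(cluster.get("user:42"))        # still readable from the other replica
```

Because every key lives on two different nodes, losing one node leaves all data readable; losing both owners of a key would not.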

For the stable operation of all three principles and, accordingly, the high efficiency of storing and processing big data, new breakthrough technologies are needed, such as, for example, blockchain.

What is big data for?

The scope of Big Data is constantly expanding:

- Big data can be used in medicine. A diagnosis can be established not only from the analysis of the patient's medical history, but also taking into account the experience of other doctors, information about the environmental situation in the patient's area of residence, and many other factors.
- Big Data technologies can be used to organize the movement of unmanned vehicles.
- By processing large amounts of data, it is possible to recognize faces in photographic and video materials.
- Big Data technologies can be used by retailers: trading companies can use data arrays from social networks to set up advertising campaigns that are focused as closely as possible on a particular consumer segment.
- The technology is actively used in organizing election campaigns, including for the analysis of political preferences in society.
- Big Data technologies are relevant for revenue assurance (RA) class solutions, which include tools for detecting inconsistencies and for in-depth data analysis that allow timely identification of probable losses or distortions of information that could reduce financial results.
- Telecommunication providers can aggregate big data, including geolocation data; this information may in turn be of commercial interest to advertising agencies, which can use it to display targeted and local advertising, as well as to retailers and banks.
- Big data can play an important role in deciding whether to open a retail outlet in a particular location, based on data on the presence of a strong flow of target customers.

Thus, the most obvious practical application of Big Data technology lies in the field of marketing. Thanks to the development of the Internet and the proliferation of all kinds of communication devices, behavioral data (such as the number of calls, shopping habits and purchases) are becoming available in real time.

Big data technologies can also be effectively used in finance, sociological research and many other areas. Experts argue that all these possibilities of using big data are only the visible part of the iceberg, since these technologies are used in intelligence and counterintelligence, in military affairs, as well as in everything that is commonly called information warfare, to a much greater extent.

In general terms, the sequence of working with Big Data consists of collecting data, structuring the information received using reports and dashboards, and then formulating recommendations for action.

Let us briefly consider the possibilities of using Big Data technologies in marketing. As you know, for a marketer, information is the main tool for forecasting and strategizing. Big data analysis has long been successfully used to determine the target audience, interests, demand and activity of consumers. Big data analysis, in particular, makes it possible to display advertising (based on the RTB auction model - Real Time Bidding) only to those consumers who are interested in a product or service.

The use of Big Data in marketing allows businessmen to:

- get to know their consumers better and attract a similar audience on the Internet;
- evaluate the degree of customer satisfaction;
- understand whether the proposed service meets expectations and needs;
- find and implement new ways to increase customer confidence;
- create projects that are in demand, etc.

For example, the Google Trends service can give a marketer a forecast of seasonal demand for a particular product, its fluctuations, and the geography of clicks. By comparing this information with the statistics collected by the corresponding plugin on your own site, you can draw up a plan for distributing the advertising budget by month, region, and other parameters.

According to many researchers, it is in the segmentation and use of Big Data that the success of the Trump campaign lies. The team of the future US president was able to correctly divide the audience, understand its desires and show exactly the message that voters want to see and hear. So, according to Irina Belysheva from the Data-Centric Alliance, Trump's victory was largely due to a non-standard approach to Internet marketing, which was based on Big Data, psycho-behavioral analysis and personalized advertising.

Trump's political strategists and marketers used a specially developed mathematical model that made it possible to analyze the data of all US voters in depth and systematize it, enabling ultra-precise targeting not only by geography, but also by voters' intentions, interests, psychotype, behavioral characteristics, etc. After that, the marketers organized personalized communication with each group of citizens based on their needs, moods, political views, psychological characteristics, and even skin color, using a separate message for almost every individual voter.

As for Hillary Clinton, her campaign used “time-tested” methods based on sociological data and standard marketing, dividing the electorate only into formally homogeneous groups (men, women, African Americans, Hispanics, poor, rich, etc.).

As a result, the winner was the one who appreciated the potential of new technologies and methods of analysis. Notably, Hillary Clinton's campaign spending was twice that of her opponent (data: Pew Research).

The main problems of using Big Data

In addition to the high cost, one of the main factors hindering the introduction of Big Data in various areas is the problem of choosing the data to be processed: that is, determining which data needs to be extracted, stored and analyzed, and which ones should not be taken into account.

Another problem of Big Data is ethical. In other words, a natural question arises: can such data collection (especially without the knowledge of the user) be considered a violation of privacy boundaries?

It's no secret that the information stored by the Google and Yandex search engines allows these IT giants to constantly improve their services, make them user-friendly and create new interactive applications. To do this, search engines collect data about user activity on the Internet, IP addresses, geolocation data, interests and online purchases, personal data, email messages, etc. All this makes it possible to display contextual advertising that matches a user's behavior on the Internet. Users are usually not asked for consent, and they are given no choice about what information to provide. That is, by default everything is collected as Big Data, which is then stored on the sites' data servers.

From this follows the next important issue regarding the security of storage and use of data. For example, is an analytics platform that consumers automatically share their data with secure? In addition, many business representatives note a shortage of highly qualified analysts and marketers who are able to effectively operate large amounts of data and solve specific business problems with their help.

Despite all the difficulties with the implementation of Big Data, the business intends to increase investments in this area. According to a Gartner study, the leaders of industries investing in Big Data are media, retail, telecom, banking and service companies.

Prospects for interaction between blockchain technologies and Big Data

Integration with Big Data has a synergistic effect and opens up a wide range of new opportunities for businesses, including allowing:

- get access to detailed information about consumer preferences, on the basis of which detailed analytical profiles can be built for specific suppliers, products and product components;
- integrate detailed data on transactions and statistics on the consumption of certain groups of goods by various categories of users;
- obtain detailed analytical data on supply and consumption chains, and control product losses during transportation (for example, weight loss due to shrinkage and evaporation of certain types of goods);
- counteract counterfeit products, and increase the effectiveness of the fight against money laundering and fraud, etc.

Access to detailed data on the use and consumption of goods will largely unlock the potential of Big Data technology for optimizing key business processes, reduce regulatory risks, and open up new opportunities for monetization and creating products that will best meet current consumer preferences.

As you know, representatives of the largest financial institutions are already showing significant interest in blockchain technology. According to Oliver Bussmann, IT manager of the Swiss financial holding UBS, blockchain technology can “reduce transaction processing time from several days to several minutes”.

The potential for analyzing blockchain data with Big Data technology is huge. Distributed ledger technology ensures the integrity of information, as well as reliable and transparent storage of the entire transaction history. Big Data, in turn, provides new tools for effective analysis, forecasting and economic modeling and, accordingly, opens up new opportunities for making better-informed management decisions.

The tandem of blockchain and Big Data can be successfully used in healthcare. As you know, imperfect and incomplete data on a patient's health greatly increase the risk of an incorrect diagnosis and incorrectly prescribed treatment. Critical data about the health of clients of medical institutions should be as secure as possible, immutable, verifiable and not subject to any manipulation.
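The integrity property mentioned above can be illustrated with a minimal hash chain in Python. This sketch shows only how linking each record to the hash of the previous one makes tampering detectable. It is not a distributed ledger: there is no network, consensus or signature scheme, and the function names and sample transactions are hypothetical.

```python
import hashlib
import json

def block_hash(index, data, prev_hash):
    # Hash the block contents together with the previous block's hash.
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def append_block(chain, data):
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({"index": index, "data": data, "prev": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})

def is_valid(chain):
    prev_hash = "0" * 64
    for block in chain:
        if block["prev"] != prev_hash:
            return False          # the link to the previous block is broken
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False          # the contents no longer match the stored hash
        prev_hash = block["hash"]
    return True

ledger = []
append_block(ledger, {"from": "alice", "to": "bob", "amount": 10})
append_block(ledger, {"from": "bob", "to": "carol", "amount": 4})
print(is_valid(ledger))               # True: the chain is intact
ledger[0]["data"]["amount"] = 1000    # tamper with an old transaction
print(is_valid(ledger))               # False: the stored hash no longer matches
```

An analytics job that reads such a history can first verify the chain and so know whether the data it is about to analyze has been altered.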

The information in the blockchain meets all of the above requirements and can serve as high-quality and reliable source data for in-depth analysis using new Big Data technologies. In addition, using the blockchain, medical institutions could exchange reliable data with insurance companies, justice authorities, employers, academic institutions and other organizations that need medical information.

Big Data and information security

In a broad sense, information security is the protection of information and supporting infrastructure from accidental or intentional negative impacts of a natural or artificial nature.

In the field of information security, Big Data faces the following challenges:

- problems of data protection and ensuring data integrity;
- the risk of outside interference and leakage of confidential information;
- improper storage of confidential information;
- the risk of information loss, for example, due to someone's malicious actions;
- the risk of misuse of personal data by third parties, etc.

One of the main problems of big data that the blockchain is designed to solve lies in the field of information security. Ensuring compliance with all its basic principles, distributed ledger technology can guarantee the integrity and reliability of data, and due to the absence of a single point of failure, blockchain makes information systems stable. Distributed ledger technology can help solve the problem of trust in data, as well as provide the possibility of a universal exchange of data.

Information is a valuable asset, which means that the main aspects of information security should be at the forefront. In order to survive in the competition, companies must keep up with the times, which means that they cannot ignore the potential opportunities and advantages that blockchain technology and Big Data tools contain.

These days everyone talks about Big data, yet few really understand what it is and how it works. Let's start with the simplest thing: terminology. In plain language, Big data is a variety of tools, approaches and methods for processing both structured and unstructured data in order to use it for specific tasks and purposes.

Unstructured data is information that does not have a predetermined structure or is not organized in a particular order.

The term "big data" was introduced by Clifford Lynch back in 2008 in a special issue of the journal Nature devoted to the explosive growth of the world's information volumes. Big data itself, of course, existed before that. According to experts, most data flows of more than 100 GB per day fall into the Big data category.

Today this simple term covers just two things: data storage and data processing.

Big data - in simple words

In the modern world, Big data is a socio-economic phenomenon, which is associated with the fact that new technological opportunities have appeared for analyzing a huge amount of data.

For ease of understanding, imagine a supermarket in which the goods are not arranged in the order you are used to: bread next to fruit, tomato paste next to a frozen pizza, lighter fluid next to a rack of tampons that also holds avocados, tofu and shiitake mushrooms. Big data puts everything in its place and helps you find nut milk, learn its price and expiration date, and also find out who else buys such milk and how it is better than cow's milk.

Big data technology

Huge amounts of data are processed so that a person can get specific and necessary results for their further effective application.

In fact, Big data is a problem solver and an alternative to traditional data management systems.

Techniques and methods of analysis applicable to Big data according to McKinsey:

  • Crowdsourcing;
  • Blending and data integration;
  • Machine learning;
  • Artificial neural networks;
  • Pattern recognition;
  • Predictive analytics;
  • Simulation modeling;
  • Spatial analysis;
  • Statistical analysis;
  • Visualization of analytical data.

Horizontal scalability is the basic principle of big data processing: data is distributed across computing nodes, and processing takes place without performance degradation. McKinsey also included relational database management systems and Business Intelligence among the applicable technologies.

Technology:

  • NoSQL;
  • MapReduce;
  • Hadoop;
  • Hardware solutions.
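To give a feel for the MapReduce model listed above, here is a minimal single-process Python sketch of a word count. It only imitates the three phases (map, shuffle, reduce); in Hadoop these phases run in parallel on many nodes. The function names and the sample input are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(chunk):
    # Emit a (word, 1) pair for every word in one chunk of input.
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_chunks):
    # Group all emitted values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Combine the values of each key into a single result.
    return {key: sum(values) for key, values in grouped.items()}

# In a real cluster each chunk would be stored and mapped on its own node.
chunks = ["big data is big", "data is everywhere"]
counts = reduce_phase(shuffle(map_phase(c) for c in chunks))
print(counts)
```

Because each chunk is mapped independently, adding more nodes lets more chunks be processed at once, which is how this model supports horizontal scalability.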

For big data, there are traditional defining characteristics, developed by the Meta Group back in 2001 and known as the “Three Vs”:

  1. Volume: the physical amount of data.
  2. Velocity: the rate of growth and the need to process data quickly to obtain results.
  3. Variety: the ability to process different types of data at the same time.

Big data: application and opportunities

Traditional tools cannot process such volumes of heterogeneous, rapidly arriving digital information. Analyzing the data reveals patterns that a person would not notice, which makes it possible to optimize all areas of our lives, from public administration to manufacturing and telecommunications.

For example, several years ago some companies began using it to protect their customers from fraud; after all, taking care of a client's money means taking care of your own.

Solutions based on Big data: Sberbank, Beeline and other companies

Beeline holds a huge amount of data about its subscribers, which it uses not only to serve them but also to create analytical products such as external consulting and IPTV analytics. Beeline segmented the database and protected clients from financial fraud and viruses, using HDFS and Apache Spark for storage and RapidMiner and Python for data processing.

Or recall Sberbank's older case, a system called AS SAFI. It analyzes photos to identify bank customers and prevent fraud. Introduced back in 2014, the system is built on a biometric platform and uses computer vision to compare photos taken by webcams mounted on stands with photos in its database. As a result, cases of fraud fell tenfold.

Big data in the world

According to forecasts, humanity will have generated 40-44 zettabytes of information by 2020. By 2025 that volume will grow tenfold, according to The Data Age 2025 report prepared by IDC analysts. The report notes that most of the data will be generated by businesses rather than ordinary consumers.

The report's authors believe that data will become a vital asset and security a critical foundation of everyday life. They are also confident that technology will change the economic landscape, and that the average user will interact with connected devices about 4,800 times a day.

Big data market in Russia

Typically, big data comes from three sources:

  • Internet (social networks, forums, blogs, media and other sites);
  • Corporate archives of documents;
  • Indications of sensors, instruments and other devices.

Big data in banks

In addition to the system described above, Sberbank's strategy for 2014-2018 stresses the importance of analyzing very large data sets for quality customer service, risk management, and cost optimization. The bank now uses Big Data to manage risk, fight fraud, segment customers and assess their creditworthiness, manage personnel, predict queues at branches, calculate employee bonuses, and handle other tasks.

VTB24 uses big data to segment customers and manage churn, generate financial statements, and analyze reviews on social networks and forums. To do this, it uses Teradata, SAS Visual Analytics, and SAS Marketing Optimizer.

"Big Data" is a topic actively discussed by technology companies. Some have become disillusioned with big data, while others use it for business as much as possible. We hope the information will be interesting and useful.

WHAT IS BIG DATA?

Key Features
Big Data is currently one of the key drivers of information technology development. This field, relatively new for Russian business, is already widespread in Western countries. In the era of information technology, and especially after the boom of social networks, a significant amount of information began to accumulate about every Internet user, which ultimately gave rise to Big Data.

The term "Big Data" causes a lot of controversy. Many believe it refers only to the amount of accumulated information, but the technical side matters too: the field also includes storage technologies, computing, and services.

It covers the processing of volumes of information that are difficult to handle with traditional methods*.

Below is a table comparing traditional databases with Big Data.

Big Data is characterized by the following features:
Volume - the accumulated database is so large that storing and processing it in traditional ways is laborious; it requires a new approach and improved tools.
Velocity - both the increasing speed of data accumulation (90% of information was collected over the past 2 years) and the speed of data processing; real-time processing technologies have recently become more in demand.
Variety - the ability to process structured and unstructured information of different formats simultaneously. Structured information is distinguished by the fact that it can be classified; an example is information about client transactions. Unstructured information includes video, audio files, free text, and information from social networks. Today, 80% of information is unstructured, and it needs complex analysis before it becomes useful for further processing.
Veracity - the reliability of data, to which users have begun to attach importance. Internet companies, for example, struggle to separate actions performed by robots from those performed by people on their websites, which complicates data analysis.
Value - the value of the accumulated information. Big Data should be useful to the company and bring it some benefit, for example by helping to improve business processes, reporting, or cost optimization.

If these 5 conditions are met, the accumulated data can be classified as big.

Applications of Big Data

The scope of Big Data technologies is extensive. With their help, you can learn about customer preferences, measure the effectiveness of marketing campaigns, or conduct risk analysis. Below are the results of an IBM Institute survey on how companies use Big Data.

As the diagram shows, most companies use Big Data in customer service; the second most popular area is operational efficiency, while in risk management Big Data is currently less common.

Big Data is also one of the fastest growing areas of information technology: according to statistics, the total amount of received and stored data doubles every 1.2 years.
Between 2012 and 2014, the amount of data transmitted monthly over mobile networks increased by 81%. Cisco estimates that mobile traffic amounted to 2.5 exabytes per month in 2014 (an exabyte is a unit of information equal to 10^18 bytes) and will reach 24.3 exabytes per month in 2019.
Thus, despite its relatively young age, Big Data is already an established field of technology that has spread to many areas of business and plays an important role in the development of companies.
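
To get a feel for what "doubles every 1.2 years" means, the doubling period can be converted into an annual growth rate. The sketch below assumes steady exponential growth, which the quoted statistic does not guarantee; the function names are illustrative.

```python
def annual_growth_rate(doubling_years: float) -> float:
    """Annual growth rate implied by a fixed doubling period."""
    return 2 ** (1 / doubling_years) - 1


def growth_factor(doubling_years: float, years: float) -> float:
    """How many times the volume multiplies over the given number of years."""
    return 2 ** (years / doubling_years)


rate = annual_growth_rate(1.2)   # roughly 0.78, i.e. about 78% per year
factor = growth_factor(1.2, 6)   # 6 years contain 5 doublings, so about 32x
print(f"annual growth: {rate:.0%}, growth over 6 years: {factor:.0f}x")
```

Under this assumption, a doubling period of 1.2 years corresponds to growth of roughly 78% per year.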

Big Data Technologies
Technologies used to collect and process Big Data can be divided into 3 groups:
  • Software;
  • Equipment;
  • Service.

The most common data processing approaches include:
SQL - a structured query language for working with databases. SQL is used to create and modify data, while the data itself is managed by the corresponding database management system.
NoSQL - the term stands for Not Only SQL. It covers a number of approaches to implementing databases that differ from the models used in traditional relational DBMSs. They are convenient when the data structure changes constantly, for example when collecting and storing information from social networks.
MapReduce - a model for distributing computation. It is used for parallel computing on very large datasets (petabytes* or more). Instead of transferring data to the program for processing, the program is transferred to the data, so each query is a separate program. Data is processed sequentially by two methods, Map and Reduce: Map performs preliminary selection of the data, and Reduce aggregates the results.
Hadoop - used to implement search and contextual mechanisms for heavily loaded sites such as Facebook, eBay, and Amazon. Its distinctive feature is protection against the failure of any cluster node, since each block of data has at least one copy on another node.
SAP HANA - a high-performance NewSQL platform for data storage and processing that provides fast query handling. Another differentiator is that SAP HANA simplifies the system landscape, reducing the cost of supporting analytical systems.
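
The two-step Map and Reduce principle described above can be sketched in a few lines. This is a single-process illustration of the model, not code for Hadoop or any other framework; the word-count task, the function names, and the sample lines are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


# Hypothetical input; in a real cluster each line could live on a different node.
lines = ["big data is big", "data is data"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, both phases can run in parallel on different nodes, which is what makes the model suitable for very large datasets.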

Technological equipment includes:

  • servers;
  • infrastructure equipment.
Servers include data stores.
Infrastructure equipment includes platform acceleration tools, uninterruptible power supplies, server console sets, etc.

Service.
Services include database system architecture, infrastructure development and optimization, and data storage security.

Software, hardware, and services combine to form end-to-end platforms for data storage and analysis. Companies such as Microsoft, HP, EMC offer services for the development, deployment and management of Big Data solutions.

Application in industries
Big Data has become widespread in many business sectors. It is used in healthcare, telecommunications, trade, logistics, and financial companies, as well as in public administration.
Below are examples of Big Data applications in some of these industries.

Retail
The databases of retail stores can accumulate a lot of information about customers, inventory management system, supply of marketable products. This information can be useful in all areas of store activity.

So, with the help of the accumulated information, you can manage the supply of goods, their storage and sale. Based on the accumulated information, it is possible to predict the demand and supply of goods. Also, the data processing and analysis system can solve other problems of the retailer, for example, optimize costs or prepare reports.

Financial services
Big Data makes it possible to analyze a borrower's creditworthiness and is also useful for credit scoring* and underwriting**. The introduction of Big Data technologies will reduce the time for consideration of loan applications. With the help of Big Data, it is possible to analyze the operations of a particular client and offer banking services that are suitable for him.

Telecom
In the telecommunications industry, Big Data is widely used by mobile operators.
Mobile operators, along with financial institutions, have one of the largest databases, which allows them to carry out the most in-depth analysis of the accumulated information.
The main goal of data analysis is to retain existing customers and attract new ones. To do this, companies segment customers, analyze their traffic, and determine the social affiliation of the subscriber.

In addition to using Big Data for marketing purposes, technology is used to prevent fraudulent financial transactions.

Mining and oil industry
Big Data is used both in the extraction of minerals, and in their processing and marketing. Based on the information received, enterprises can draw conclusions about the efficiency of field development, track the overhaul schedule and equipment condition, and forecast demand for products and prices.

According to a survey by Tech Pro Research, Big Data is most widespread in the telecommunications industry, as well as in engineering, IT, financial and government enterprises. According to the results of this survey, Big Data is less popular in education and healthcare. The survey results are presented below:

Examples of using Big Data in companies
Today, Big Data is being actively implemented in foreign companies. Nasdaq, Facebook, Google, IBM, VISA, Mastercard, Bank of America, HSBC, AT&T, Coca-Cola, Starbucks, and Netflix are already using Big Data resources.

The areas of application of the processed information are diverse and vary depending on the industry and the tasks to be performed.
Next, examples of the application of Big Data technologies in practice will be presented.

HSBC uses Big Data technologies to counter fraudulent transactions with plastic cards. With the help of Big Data, the company made its security service three times more efficient and improved the detection of fraudulent incidents tenfold. The economic effect of introducing these technologies exceeded 10 million US dollars.

VISA's antifraud* system automatically detects fraudulent transactions; it currently helps prevent 2 billion US dollars in fraudulent payments annually.

IBM's Watson supercomputer analyzes the flow of data on money transactions in real time. According to IBM, Watson increased the number of identified fraudulent transactions by 15%, reduced false positives by 50%, and increased the amount of funds protected from such transactions by 60%.

Procter & Gamble uses Big Data to design new products and create global marketing campaigns. P&G has created dedicated Business Spheres offices where information can be viewed in real time.
This gives the company's management the opportunity to test hypotheses and run experiments instantly. P&G believes that Big Data helps in predicting the company's performance.

Office supplies retailer OfficeMax uses Big Data technologies to analyze customer behavior. Big Data analysis made it possible to increase B2B revenue by 13% and reduce costs by $400,000 per year.

According to Caterpillar, its distributors miss out on $9 billion to $18 billion in revenue every year simply because they have not implemented Big Data technologies. Big Data would allow customers to manage their fleets more efficiently by analyzing information from sensors installed on the machines.

To date, it is already possible to analyze the state of key components, their degree of wear, manage fuel and maintenance costs.

Luxottica Group is an eyewear manufacturer whose brands include Ray-Ban, Persol, and Oakley. The company uses Big Data technologies to analyze the behavior of potential customers and to run "smart" SMS marketing. With Big Data, Luxottica Group identified more than 100 million of its most valuable customers and increased the effectiveness of its marketing campaign by 10%.

The developers of World of Tanks use Yandex Data Factory to analyze player behavior. Big Data technologies made it possible to analyze the behavior of 100 thousand World of Tanks players across more than 100 parameters (information about purchases, games, experience, etc.). The analysis produced a forecast of user churn, which makes it possible to reduce the outflow of users and work with players in a targeted manner. The resulting model turned out to be 20-30% more effective than the gaming industry's standard analysis tools.

The German Ministry of Labor uses Big Data to analyze incoming unemployment benefit claims. The analysis showed that 20% of benefits were paid out without justification. With the help of Big Data, the Ministry of Labor reduced costs by 10 billion euros.

Toronto Children's Hospital implemented Project Artemis, an information system that collects and analyzes data on infants in real time. The system monitors 1,260 health indicators for each child every second. Project Artemis makes it possible to predict when a child's condition will become unstable and to begin preventive treatment early.

OVERVIEW OF THE GLOBAL BIG DATA MARKET

The current state of the global market
According to Data Collective, in 2014 Big Data became one of the priority areas for venture investment. According to the information portal Computerra, this is because developments in this area have begun to bring significant results for their users. Over the past year, the number of companies with implemented big data management projects increased by 125%, and the market grew by 45% compared to 2013.

Most of the revenue of the Big Data market, according to Wikibon, in 2014 was made up of services, their share was equal to 40% of the total revenue (see the diagram below):

If we consider Big Data for 2014 by subtypes, then the market will look like this:

According to Wikibon, apps and analytics accounted for 36% of Big Data revenue in 2014, computing hardware for 17%, and storage technology for 15%. The least revenue came from NoSQL technologies, infrastructure equipment, and corporate networking.

The most popular Big Data technologies are in-memory platforms such as SAP HANA and Oracle. A T-Systems survey showed that 30% of the surveyed companies chose them. NoSQL platforms came second (18% of users), and 15% of companies used analytical platforms from Splunk and Dell. According to the survey, Hadoop/MapReduce products were considered the least useful for solving Big Data problems.

According to an Accenture survey, in more than 50% of companies using Big Data technologies, Big Data costs range from 21% to 30%.
According to the following Accenture analysis, 76% of companies believe that these costs will increase in 2015, and 24% of companies will not change their budget for Big Data technologies. This suggests that in these companies Big Data has already become an established area of ​​IT, which has become an integral part of the company's development.

The results of the Economist Intelligence Unit survey confirm the positive impact of Big Data implementation. 46% of companies claim that they have improved customer service by more than 10% using Big Data technologies, 33% of companies have optimized inventory and improved the productivity of key assets, 32% of companies have improved planning processes.

Big Data around the world
To date, Big Data technologies are most often implemented in US companies, but now other countries of the world have begun to show interest. In 2014, according to IDC, the countries of Europe, the Middle East, Asia (excluding Japan) and Africa accounted for 45% of the Big Data software, services and equipment market.

Also, according to the CIO survey, companies from the Asia-Pacific region are rapidly mastering new solutions in the field of Big Data analytics, secure storage and cloud technologies. Latin America is in second place in terms of the number of investments in the development of Big Data technologies, ahead of Europe and the USA.
Next, a description and forecasts of the development of the Big Data market in several countries will be presented.

China
China holds 909 exabytes of information, or 10% of the world total. By 2020 this will reach 8,060 exabytes, and China's share of the global total will rise to 18% within 5 years. China's Big Data volume is among the fastest growing in the world.

Brazil
By the end of 2014, Brazil has accumulated 212 exabytes of information, which is 3% of the global volume. By 2020, the volume of information will grow to 1600 exabytes, which will be 4% of the world's information.

India
According to EMC, the amount of accumulated data in India in 2014 is 326 exabytes, which is 5% of the total amount of information. By 2020, the volume of information will grow to 2800 exabytes, which will be 6% of the world's information.

Japan
The amount of accumulated data in Japan at the end of 2014 was 495 exabytes, or 8% of the world total. By 2020 the volume will grow to 2,200 exabytes, but Japan's share will fall to 5% of all information in the world.
Thus, Japan's share of global data will shrink by more than 30%, even though the absolute volume grows.

Germany
According to EMC, the amount of accumulated data in Germany in 2014 is 230 exabytes, which is 4% of the total amount of information in the world. By 2020, the volume of information will grow to 1100 exabytes and will be 2%.
In the German market, a large share of revenue, according to Experton Group forecasts, will be generated by the services segment, whose share in 2015 will be 54%, and in 2019 will increase to 59%, the share of software and hardware, on the contrary, will decrease.

In general, the market size will grow from 1.345 billion euros in 2015 to 3.198 billion euros in 2019, with an average growth rate of 24%.
Thus, based on the CIO and EMC analytics, we can conclude that the developing countries of the world will become markets for the active development of Big Data technologies in the coming years.
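
The average growth rate quoted for the German market can be cross-checked from the two endpoint figures with the standard compound annual growth rate (CAGR) formula. The sketch below uses only the numbers given in the text (1.345 and 3.198 billion euros, 2015 to 2019); the function name `cagr` is illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# German Big Data market, billions of euros, 2015 -> 2019 (4 annual steps).
german_market = cagr(1.345, 3.198, 2019 - 2015)
print(f"{german_market:.1%}")  # about 24%, consistent with the quoted average
```

The same function can be applied to any pair of forecast figures in this overview to see whether the stated growth rate and the endpoint values agree.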

Main Market Trends
According to IDG Enterprise, companies will spend an average of $7.4 million each on Big Data in 2015; large companies intend to spend approximately $13.8 million, and small and medium-sized companies $1.6 million.
Most of the investment will go to areas such as data analysis, visualization, and data collection.
In line with current trends and market demand, investments in 2015 will be used to improve data quality, improve planning and forecasting, and increase data processing speed.
According to Bain & Company's Insights Analysis, financial sector companies will invest heavily: they plan to spend 6.4 billion US dollars on Big Data technologies in 2015, with investments growing by an average of 22% per year until 2020. Internet companies plan to spend $2.8 billion, with Big Data spending growing by an average of 26% per year.
An Economist Intelligence Unit survey identified the priority areas for the development of Big Data in 2014 and over the next 3 years; the answers were distributed as follows:

According to IDC forecasts, market trends are as follows:

  • Over the next 5 years, the cost of cloud-based Big Data solutions will grow 3 times faster than the cost of on-premises solutions. Hybrid storage platforms will become popular.
  • Growth of applications using sophisticated and predictive analytics, including machine learning, will accelerate in 2015, the market for such applications will grow 65% faster than applications that do not use predictive analytics.
  • Media analytics will triple in 2015 and become a key growth driver for the Big Data technology market.
  • The trend to implement solutions for analyzing the constant flow of information that is applicable to the Internet of things will accelerate.
  • By 2018, 50% of users will interact with services based on cognitive computing.

Market Drivers and Limiters
IDC experts identified 3 drivers of the Big Data market in 2015:

According to the Accenture survey, data security issues are now the main barrier to the adoption of Big Data technologies, more than 51% of respondents confirmed that they are concerned about data protection and privacy. 47% of companies reported the impossibility of implementing Big Data due to a limited budget, 41% of companies indicated a lack of qualified personnel as a problem.

Wikibon predicts that the Big Data market will grow to $38.4 billion in 2015, up 36% year-over-year. In the coming years, there will be a decline in growth rates to 10% in 2017. Taking into account these forecasts, the market size in 2020 will be equal to 68.7 billion US dollars.

The distribution of the global Big Data market by business category will look like this:

As the diagram shows, most of the market will be occupied by technologies for improving customer service. Targeted marketing will be the second highest priority for companies until 2019; in 2020, according to Heavy Reading, it will give way to solutions for improving operational efficiency.
The "improving customer service" segment will also have the highest growth rate, increasing by 49% annually.
The market forecast by Big Data subtype looks like this:

As the diagram shows, professional services hold the predominant market share. Applications with analytics will grow fastest: their share will rise from the current 12% to 18% in 2020, when the segment will be worth 12.3 billion US dollars. The share of computing equipment, by contrast, will fall from 20% to 14%, or about 9.3 billion US dollars in 2020. The market for cloud technologies will grow gradually and reach 6.3 billion US dollars in 2020. The share of data storage solutions will decrease from 15% in 2014 to 13% in 2020, or 8.9 billion US dollars in monetary terms.
According to Bain & Company’s Insights Analysis forecast, the distribution of the Big Data market by industry in 2020 will look like this:

  • The financial industry will spend $6.4 billion on Big Data, with an average growth rate of 22% per year;
  • Internet companies will spend $2.8 billion, with costs growing at an average of 26% per year over the next 5 years;
  • Public sector spending will be comparable to that of Internet companies, but the growth rate will be lower, at 22%;
  • The telecommunications sector will grow at an average rate of 40% and reach $1.2 billion in 2020;
  • Energy companies will invest a relatively small amount in these technologies, 800 million US dollars, but their growth rate will be one of the highest, at 54% annually.
Thus, companies in the financial industry will take a large share of the Big Data market in 2020, and energy will be the fastest growing sector.
Following analysts' forecasts, the total market volume will increase in the coming years. The growth of the market will be ensured by the introduction of Big Data technologies in the developing countries of the world, as can be seen from the graph below.

The predicted market size will depend on how developing countries perceive Big Data technologies, whether they will be as popular as in developed countries. In 2014, the developing countries of the world accounted for 40% of the accumulated information. According to EMC's forecast, the current market structure, dominated by developed countries, will change as early as 2017. According to EMC analytics, in 2020 the share of developing countries will be more than 60%.
According to Cisco and EMC, the developing countries of the world will actively work with Big Data, in many respects this will be due to the availability of technologies and the accumulation of sufficient information to the level of Big Data. The world map on the next page will show the growth forecast and growth rate of Big Data by region.

ANALYSIS OF THE RUSSIAN MARKET

Current state of the Russian market

According to the results of a study by CNews Analytics and Oracle, the level of maturity of the Russian Big Data market has increased over the past year. Respondents representing 108 large enterprises from different industries showed a higher degree of awareness of these technologies, as well as an understanding of the potential of such solutions for their business.
As of 2014, according to IDC, Russia has accumulated 155 exabytes of information, which is only 1.8% of the world's data. The volume of information by 2020 will reach 980 exabytes and will occupy 2.2%. Thus, the average growth rate of the volume of information will be 36% per year.
IDC estimates the Russian market at $340 million, of which $100 million is SAP solutions, approximately $240 million is similar solutions from Oracle, IBM, SAS, Microsoft, etc.
The growth rate of the Russian Big Data market is at least 50% per year.
It is predicted that the positive dynamics in this sector of the Russian IT market will continue, even in the context of a general stagnation of the economy. This is due to the fact that businesses continue to demand solutions that improve work efficiency, as well as optimize costs, improve forecasting accuracy and minimize possible company risks.
The main providers of services in the field of Big Data in the Russian market are:
  • Oracle
  • Microsoft
  • Cloudera
  • Hortonworks
  • Teradata.
Overview of the market by industry and the experience of using Big Data in companies
According to CNews, only 10% of companies in Russia have started using Big Data technologies, while the share of such companies in the world is about 30%. Readiness for Big Data projects is growing in many sectors of the Russian economy, according to a report from CNews Analytics and Oracle. More than a third of the surveyed companies (37%) have started working with Big Data technologies, among which 20% are already using such solutions, and 17% are starting to experiment with them. The second third of respondents are currently considering such a possibility.

In Russia, Big Data technologies are more popular in the banking sector and telecom, but they are also in demand in the mining industry, energy, retail, logistics companies and the public sector.
Next, examples of the use of Big Data in Russian realities will be considered.

Telecom
Telecom operators have one of the largest databases, which allows them to carry out the most in-depth analysis of the accumulated information.
One of the areas of application of Big Data technology is subscriber loyalty management.
The main goal of data analysis is to retain existing customers and attract new ones. To do this, companies segment customers, analyze their traffic, and determine the social affiliation of the subscriber. In addition to using information for marketing purposes, telecom uses technology to prevent fraudulent financial transactions.
Vimpelcom is one of the brightest examples of this industry. The company uses Big Data to improve the quality of service at the level of each subscriber, reporting, analyzing data for network development, combating spam and personalizing services.

Banks
Specialists from the financial industry make up a significant share of Big Data users. One successful experiment was carried out at the Ural Bank for Reconstruction and Development, which used its information base to analyze customers and began to offer tailored loans, deposits, and other services. During the year in which these technologies were used, the bank's retail loan portfolio grew by 55%.
Alfa-Bank analyzes information from social networks, processes loan applications, and analyzes the behavior of users of its website.
Sberbank has also begun processing its data to segment customers, prevent fraud, cross-sell, and manage risk. In the future, it plans to improve service and analyze customer actions in real time.
The All-Russian Regional Development Bank analyzes the behavior of plastic card holders. This allows you to identify transactions that are atypical for a particular client, thereby increasing the likelihood of detecting theft of funds from plastic cards.
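
Spotting a transaction that is atypical for a particular client can be illustrated with a very simple statistical rule. The sketch below is an assumption made for illustration, not the method used by any of the banks mentioned: it flags an amount that lies more than a chosen number of standard deviations from the client's own history. The function name, the threshold, and the sample amounts are hypothetical.

```python
from statistics import mean, stdev


def is_atypical(history, amount, threshold=3.0):
    """Flag an amount that deviates strongly from the client's past amounts."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical card history of one client (amounts in rubles).
history = [1200, 950, 1100, 1300, 1050, 990, 1150]
print(is_atypical(history, 1250))    # an ordinary purchase
print(is_atypical(history, 45000))   # far outside the usual range
```

Real antifraud systems combine many more signals (location, merchant, time of day, device), but the principle is the same: each client's behavior is compared with that client's own profile.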

Retail
In Russia, Big Data technologies have been implemented by both online and offline trading companies. Today, according to CNews Analytics, Big Data is used by 20% of retailers. 75% of retail professionals consider Big Data necessary for developing a competitive strategy for promoting a company. According to Hadoop statistics, after the introduction of Big Data technology, profit in trade organizations grows by 7-10%.
M.Video specialists talk about the improvement of logistics planning after the implementation of SAP HANA, also, as a result of its implementation, the preparation of annual reports was reduced from 10 days to 3, the speed of daily data loading was reduced from 3 hours to 30 minutes.
Wikimart uses these technologies to generate recommendations for site visitors.
One of the first offline stores to introduce Big Data analysis in Russia was Lenta. With the help of Big Data, the retailer began to study information about customers from cash register receipts. It collects this information to build behavioral models that support better-informed decisions at the operational and business level.

Oil and gas industry
In this industry, the scope of Big Data is quite wide. Big Data technologies can be applied to the extraction of minerals from the subsoil. With their help, companies can analyze the extraction process itself and find the most effective extraction methods, track drilling, analyze the quality of raw materials, and analyze the processing and marketing of final products. In Russia, these technologies are already being used by Transneft and Rosneft.

State bodies
In countries such as Germany, Australia, Spain, Japan, Brazil and Pakistan, Big Data technologies are used to solve national problems. These technologies help public authorities provide services to the population more effectively and deliver targeted social support.
In Russia, these technologies began to be mastered by such state bodies as the Pension Fund, the Federal Tax Service and the Compulsory Medical Insurance Fund. The potential for implementing projects using Big Data is large; these technologies could help improve the quality of services, and, as a result, the standard of living of the population.

Logistics and transport
Big Data can also be used by transport companies. With the help of Big Data technologies, it is possible to track a vehicle fleet, account for fuel costs and monitor customer requests.
Russian Railways implemented Big Data technologies together with SAP. These technologies cut reporting time by a factor of 43.5 (from 14.5 hours to 20 minutes) and improved the accuracy of cost allocation by a factor of 40. Big Data was also introduced into planning and tariff regulation. In total, the company uses more than 300 systems based on SAP solutions, running in 4 data centers and serving 220,000 users.

Main market drivers and constraints
Drivers for the development of Big Data technologies in the Russian market are:
  • Increased user interest in the possibilities of Big Data as a way to increase the company's competitiveness;
  • Development of methods for processing media files at the global level;
  • Transfer of servers processing personal information to the territory of Russia, in accordance with the adopted law on the storage and processing of personal data;
  • Implementation of the industry plan for software import substitution, which includes state support for domestic software manufacturers and preferences for domestic IT products in publicly funded procurement;
  • The new economic situation, in which the dollar exchange rate has almost doubled, creating a trend towards Russian cloud service providers rather than foreign ones;
  • Creation of technology parks that contribute to the development of the information technology market, including the Big Data market;
  • State program for the introduction of grid systems, which are based on Big Data technologies.

The main barriers to the development of Big Data in the Russian market are:

  • Ensuring the security and confidentiality of data;
  • Lack of qualified personnel;
  • Most Russian companies have not yet accumulated enough information for it to count as Big Data;
  • Difficulties in introducing new technologies into established information systems of companies;
  • The high cost of Big Data technologies, which leads to a limited number of enterprises that have the opportunity to implement these technologies;
  • Political and economic uncertainty, which led to the outflow of capital and the freezing of investment projects in Russia;
  • Rising prices for imported products and a surge in inflation, according to IDC, hinder the development of the entire IT market.
Russian market forecast
Today, the Russian Big Data market is not as developed as in advanced economies. Most Russian companies show interest in it but hesitate to take advantage of its opportunities.
Examples of large companies that have already benefited from Big Data technologies are raising awareness of what these technologies can do.
Analysts also have quite optimistic forecasts for the Russian market. IDC believes that the share of the Russian market will increase over the next 5 years, in contrast to the markets in Germany and Japan.
By 2020, Russia's share of the global data volume will grow from the current 1.8% to 2.2%. According to EMC, the amount of information will grow from the current 155 exabytes to 980 exabytes in 2020.
At the moment, Russia is still accumulating data towards volumes that qualify as Big Data.
According to a CNews Analytics survey, 44% of surveyed companies work with data no larger than 100 terabytes, and only 13% work with volumes above 500 terabytes.

Nevertheless, the Russian market will grow, following global trends. IDC estimates the market size at $340 million as of 2014.
In previous years the market grew by about 50% per year. If that rate holds, the market will reach 1.7 billion US dollars in 2018, and the share of the Russian market in the world market will be about 3%, up from the current 1.2%.
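The 2018 figure follows from simple compound growth, which can be checked with a few lines of Python. This is only an arithmetic sketch of the projection stated above (a $340 million base in 2014 and a constant 50% annual growth rate); the function name is illustrative and not taken from any analyst's model.

```python
def project_market(start_musd, annual_growth, years):
    """Compound a starting market size (in $ millions) at a fixed annual growth rate."""
    return start_musd * (1 + annual_growth) ** years

# 2014 -> 2018 is four years of growth at 50% per year
size_2018 = project_market(340, 0.50, 4)
print(f"Projected 2018 market: ${size_2018:,.0f} million")
# prints: Projected 2018 market: $1,721 million
```

The result, about $1.72 billion, matches the rounded figure of $1.7 billion.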

The most receptive industries to the use of Big Data in Russia include:

  • Retail and banks - above all, analysis of the customer base and evaluation of the effect of marketing campaigns;
  • Telecom - customer base segmentation and traffic monetization;
  • Public sector - reporting, analysis of applications from the public, etc.;
  • Oil companies - monitoring of work and planning of production and marketing;
  • Energy companies - creation of intelligent electric power systems, operational monitoring and forecasting.
In developed countries, Big Data has become widespread in healthcare, insurance, metallurgy, Internet companies and manufacturing. Most likely, Russian companies in these areas will soon appreciate the effect of Big Data as well and adopt these technologies in their industries.
In Russia, as in the rest of the world, the near future will bring a trend towards data visualization, analysis of media files and the development of the Internet of things.
Despite the general stagnation of the economy, analysts predict further growth in the Big Data market in the coming years. The main reason is that Big Data technologies give their users a competitive advantage: higher operational efficiency, an additional flow of customers, lower risks and the ability to apply data-driven forecasting.
Thus, we can conclude that the Big Data segment in Russia is at the formation stage, but the demand for these technologies is increasing every year.

Main results of the market analysis

World market
At the end of 2014, the Big Data market was characterized by the following parameters:
  • the market volume amounted to 28.5 billion US dollars, an increase of 45% compared to the previous year;
  • services accounted for the largest share of Big Data market revenue, 40% of the total;
  • 36% of revenue came from Big Data applications and analytics, 17% from computing hardware and 15% from storage technologies;
  • in-memory platforms from vendors such as SAP (HANA) and Oracle were the most popular for solving Big Data problems;
  • the number of companies with implemented projects in the field of Big Data management increased by 125%.
The market forecast for the next years is as follows:
  • in 2015 the market volume will reach 38.4 billion US dollars, in 2020 - 68.7 billion US dollars;
  • the average growth rate will be 16% annually;
  • average company spending on Big Data technologies will be $13.8 million for large companies and $1.6 million for small and medium-sized businesses;
  • technologies will have the greatest prevalence in the areas of customer service and targeted marketing;
  • in 2017, the global market structure will change towards the predominance of user companies from developing countries.
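The average growth rate in this forecast can be cross-checked with the standard compound annual growth rate (CAGR) formula. The sketch below uses only the market volumes quoted in this section; the function name is illustrative, and which base year the forecast's authors used is an assumption.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values separated by `years` years."""
    return (end_value / start_value) ** (1 / years) - 1

# Market volumes in billions of US dollars, as quoted in this section
from_2014 = cagr(28.5, 68.7, 6)  # 2014 actual -> 2020 forecast
from_2015 = cagr(38.4, 68.7, 5)  # 2015 forecast -> 2020 forecast

print(f"2014-2020: {from_2014:.1%}")
print(f"2015-2020: {from_2015:.1%}")
```

Measured from the 2014 figure, the rate comes out at roughly 16% per year, in line with the forecast; measured from the 2015 forecast, it is closer to 12%, so the quoted average appears to be calculated from the 2014 base.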
Russian market
The Russian Big Data market is at the formation stage. The results for 2014 are as follows:
  • the market volume reached 340 million US dollars;
  • the average market growth rate in previous years was 50% annually;
  • the total amount of accumulated information was 155 exabytes;
  • 10% of Russian companies have started using Big Data technologies;
  • Big Data technologies were more popular in the banking sector, telecom, Internet companies and retail.
The forecast for the Russian market for the coming years is as follows:
  • the volume of the Russian market in 2015 will reach 500 million US dollars, and in 2018 - 1.7 billion US dollars;
  • the share of the Russian market in the world market will be about 3% in 2018;
  • the amount of accumulated data in 2020 will be 980 exabytes;
  • data will grow to 2.2% of global data in 2020;
  • technologies of data visualization, analysis of media files and the Internet of things will gain the greatest popularity.
Based on the results of the analysis, we can conclude that the Big Data market is still in its early stages of development, and in the near future we will observe its growth and the expansion of the capabilities of these technologies.


Do you know this famous joke? Big Data is like sex before 18:

  • everyone thinks about it;
  • everyone talks about it;
  • everyone thinks their friends are doing it;
  • almost nobody does it;
  • whoever does it does it badly;
  • everyone thinks it will be better next time;
  • no one takes security measures;
  • everyone is ashamed to admit that they do not know something;
  • if someone succeeds, there is always a lot of noise from it.

But let's be honest: behind any hype there is always ordinary curiosity. What is all the fuss about, and is there really something important behind it? In short, yes, there is. Details are below. We have selected the most striking and interesting applications of Big Data technologies. This small market study, built on clear examples, makes one simple point: there is no need to "wait another n years for the magic to become reality." The future has already arrived; it is simply not yet visible to the naked eye, which is why the labor market does not yet feel its heat. Let's go.

1 How Big Data technologies are applied where they originated

Large IT companies are where data science was born, so their internal work in this area is the most interesting. Google, the birthplace of the MapReduce paradigm, runs an internal program whose sole purpose is to train its programmers in machine learning techniques. Therein lies its competitive advantage: after gaining new knowledge, employees apply the new methods in the Google projects they work on every day. Imagine how long the list of areas is in which the company could bring about a revolution. One example: the use of neural networks.

Apple is building machine learning into all its products. Its advantage is a large ecosystem that covers all the digital devices used in everyday life. This lets Apple reach a level others cannot: the company has more user data than any other. At the same time, its privacy policy is very strict: the corporation has always boasted that it does not use customer data for advertising purposes. Accordingly, user information is encrypted so that neither Apple's lawyers nor even the FBI with a warrant can read it.

2 Big Data on 4 wheels

A modern car is an information store: it accumulates data about the driver, the environment, connected devices and the car itself. Soon, a single connected vehicle will generate up to 25 GB of data per hour.

Automakers have used vehicle telematics for many years, but a more sophisticated method of data collection that makes full use of Big Data is now being promoted. With it, the technology can alert the driver to bad road conditions and automatically activate the anti-lock braking and traction control systems.

Other automakers, including BMW, combine Big Data technology with insights from test prototypes, built-in "error memory" systems and customer complaints to identify a model's weaknesses early in production. Instead of evaluating the data manually, which takes months, they now apply a state-of-the-art algorithm. Errors and troubleshooting costs are reduced, and data analysis workflows at BMW run faster.

According to expert estimates, the connected-car market will reach a turnover of $130 billion by 2019. This is not surprising, given the pace at which automakers are integrating technologies that have become an integral part of the vehicle.

The use of Big Data helps make cars safer and more functional. Toyota, for example, does this by embedding Data Communication Modules (DCM) in its vehicles. Its Big Data tooling processes and analyzes the data collected by the DCM in order to extract further benefit from it.

3 Application of big data in medicine


The implementation of Big Data technologies in the medical field allows doctors to more thoroughly study the disease and choose an effective course of treatment for a particular case. Thanks to the analysis of information, it becomes easier for health workers to predict relapses and take preventive measures. The result is a more accurate diagnosis and improved treatments.

The new technique made it possible to look at patients' problems from a different angle, which led to the discovery of previously unknown causes. For example, some ethnic groups are genetically more predisposed to heart disease than others. Now, when a patient complains of a certain disease, doctors take into account data about members of the same group who complained of the same problem. Collecting and analyzing data reveals much more about patients: from food preferences and lifestyle to the genetic structure of DNA and the metabolites of cells, tissues and organs. For example, the Center for Pediatric Genomic Medicine in Kansas City analyzes mutations in patients' genetic code that cause cancer. An individual approach to each patient, taking their DNA into account, will raise the effectiveness of treatment to a qualitatively new level.

Understanding how Big Data is used leads to the first, and very important, change in the medical field. While a patient is undergoing treatment, a hospital or other healthcare facility can obtain a lot of valuable information about the person. The collected information is used to predict the recurrence of diseases with a certain degree of accuracy. For example, if a patient has had a stroke, doctors study when the cerebrovascular accident occurred and analyze the intervals between previous episodes (if any), paying special attention to stressful situations and heavy physical exertion in the patient's life. Based on this data, hospitals give the patient a clear plan of action to reduce the risk of another stroke.

Wearable devices also play a role, helping to identify health problems, even if a person does not have obvious symptoms of a particular disease. Instead of assessing the patient's condition through a long course of examinations, the doctor can draw conclusions based on the information collected by a fitness tracker or smart watch.

In one recent case, while a patient was being examined after a new seizure caused by a missed dose of medication, doctors discovered that the man had a much more serious health problem: atrial fibrillation. The diagnosis was possible because the department's staff gained access to the patient's phone, specifically to the application associated with his fitness tracker. The data from the application proved decisive, because no cardiac abnormalities were found in the man at the time of the examination.

This is just one of the cases that show why the use of big data plays such a significant role in medicine today.

4 Data analytics is already at the core of retail

Understanding user queries and targeting is one of the largest and most widely publicized areas of application of Big Data tools. Big Data helps analyze customer habits in order to better understand consumer needs in the future. Companies are looking to expand the traditional data set with social media information and browser search history in order to form the most complete customer picture possible. Sometimes large organizations choose to create their own predictive model as a global goal.

For example, the Target chain of stores, using deep data analysis and its own forecasting system, can determine with high accuracy whether a customer is pregnant. Each customer is assigned an ID, which in turn is tied to a credit card, name or email. The identifier serves as a kind of shopping cart that stores information about everything the person has ever purchased. The chain's specialists found that pregnant women actively purchase unscented products before the second trimester of pregnancy, and during the first 20 weeks stock up on calcium, zinc and magnesium supplements. Based on this data, Target sends such customers coupons for baby products. The discounts on baby goods are "diluted" with coupons for other products so that offers to buy a crib or diapers do not look too intrusive.
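The ID-based purchase history described above can be pictured as a simple mapping from customer ID to a list of purchases. The sketch below is only an illustration of that data structure, not Target's actual system: the marker product names, the function names and the idea of counting distinct marker products are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical marker products, loosely based on the categories named in the text
MARKERS = {
    "unscented lotion",
    "calcium supplement",
    "zinc supplement",
    "magnesium supplement",
}

def build_histories(purchases):
    """Group (customer_id, product) records into one purchase list per customer ID."""
    histories = defaultdict(list)
    for customer_id, product in purchases:
        histories[customer_id].append(product)
    return histories

def marker_score(history):
    """Count how many distinct marker products appear in one customer's history."""
    return len(MARKERS & set(history))

# Hypothetical purchase log
log = [
    ("id-001", "unscented lotion"),
    ("id-001", "zinc supplement"),
    ("id-001", "bread"),
    ("id-002", "coffee"),
]
histories = build_histories(log)
for customer_id, history in histories.items():
    print(customer_id, marker_score(history))
```

A production model would weigh many more signals and the timing of purchases; the point here is only that every purchase is attached to a stable identifier that can later be queried.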

Even in politics, Big Data technologies have found a use in optimizing election campaigns. Some believe that B. Obama owed his victory in the 2012 US presidential election to the excellent work of his team of analysts, who processed huge amounts of data in the right way.

5 Big Data on guard of law and order


Over the past few years, law enforcement agencies have figured out how and when to use Big Data. It is a well-known fact that the National Security Agency uses Big Data technologies to prevent terrorist attacks. Other departments are using progressive methodology to prevent smaller crimes.

The Los Angeles Police Department uses a crime-prediction algorithm. It supports what is commonly referred to as proactive law enforcement. Using crime reports over a period of time, the algorithm determines the areas where crime is most likely to occur. The system marks such areas on the city map with small red squares, and this data is immediately transmitted to patrol cars.
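The idea of marking high-risk squares on a map can be sketched as counting past incidents per grid cell. This is a deliberately simplified illustration and not the algorithm the department actually uses: the cell size, the function name and the sample coordinates are assumptions, and real systems also model time, crime type and other factors.

```python
import math
from collections import Counter

def hotspot_cells(incidents, cell_size=0.01, top_n=3):
    """Return the top_n grid cells ranked by number of past incidents.

    incidents: iterable of (latitude, longitude) pairs from crime reports
    cell_size: side of a square grid cell in degrees (an arbitrary choice)
    """
    counts = Counter(
        (math.floor(lat / cell_size), math.floor(lon / cell_size))
        for lat, lon in incidents
    )
    return counts.most_common(top_n)

# Hypothetical report locations: three fall into the same cell
reports = [
    (34.0523, -118.2437),
    (34.0528, -118.2431),
    (34.0521, -118.2439),
    (34.1015, -118.3265),
]
top = hotspot_cells(reports, top_n=1)
print(top)
```

Each returned entry is a pair of a grid cell index and the number of incidents in that cell; the cells with the highest counts correspond to the squares a dispatcher would highlight.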

Chicago police use big data technologies in a slightly different way. Law enforcement officers in the Windy City also have such an algorithm, but it is aimed at outlining a "circle of risk" made up of people who may become a victim of or a participant in an armed attack. According to The New York Times, this algorithm assigns a person a vulnerability score based on their criminal history (arrests, participation in shootings, membership in criminal gangs). The developer of the system says that while it studies an individual's criminal past, it does not take into account secondary factors such as race, gender, ethnicity or location.

6 How Big Data technologies help cities develop


Veniam CEO João Barros demonstrates a tracking map of Wi-Fi routers on buses in the city of Porto

Data analysis is also used to improve many aspects of life in cities and countries. For example, knowing exactly how and when to use Big Data technologies makes it possible to optimize traffic flows. To do this, real-time vehicle movement is taken into account, and social media and meteorological data are analyzed. Today, a number of cities have taken the lead in using data analytics to combine transport infrastructure with other public services into a single whole. This is the concept of a smart city, where buses wait for a late train and traffic lights can predict congestion to minimize traffic jams.

Using Big Data technologies, the city of Long Beach operates "smart" water meters to curb illegal watering. Previously, they were used to reduce water consumption by private households (the best result was a reduction of 80%). Saving fresh water is always a pressing issue, especially when the state is experiencing the worst drought ever recorded.

The Los Angeles Department of Transportation has joined the list of those who use Big Data. Based on data from traffic camera sensors, the authorities control the operation of traffic lights, which in turn makes it possible to regulate traffic. The computerized system controls about 4,500 traffic lights throughout the city. According to official data, the new algorithm helped reduce congestion by 16%.

7 Engine of progress in marketing and sales


In marketing, Big Data tools make it possible to identify which ideas are the most effective to promote at a particular stage of the sales cycle. Data analysis shows how investments can improve customer relationship management, which strategy should be chosen to increase conversion rates, and how to optimize the customer lifecycle. In the cloud business, Big Data algorithms are used to work out how to minimize the cost of customer acquisition and extend the customer lifecycle.

Differentiating pricing strategies by customer tier is perhaps the main thing Big Data is used for in marketing. McKinsey found that about 75% of the average firm's revenue comes from basic products, 30% of which are incorrectly priced. A 1% price increase results in an 8.7% increase in operating profit, provided that sales volume does not fall.
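The link between a small price change and a much larger change in operating profit is plain arithmetic. If sales volume and costs stay the same, the extra revenue from a price increase goes straight to profit, so the relative profit gain equals the price increase divided by the operating margin. The sketch below works through this under exactly those assumptions; the function names are illustrative, and the implied margin is a back-calculation, not a figure reported by McKinsey.

```python
def profit_uplift(operating_margin, price_increase):
    """Relative growth in operating profit when price rises and volume and costs are fixed.

    Profit = Revenue - Costs. A price increase p adds p * Revenue to profit,
    so the relative gain is p * Revenue / Profit = p / operating_margin.
    """
    return price_increase / operating_margin

def implied_margin(price_increase, uplift):
    """Operating margin at which a given price increase yields the given profit uplift."""
    return price_increase / uplift

margin = implied_margin(0.01, 0.087)
print(f"Implied operating margin: {margin:.1%}")
print(f"Profit uplift at that margin: {profit_uplift(margin, 0.01):.1%}")
```

In other words, an 8.7% profit gain from a 1% price increase corresponds to a business with an operating margin of roughly 11-12%; the thinner the margin, the stronger the effect of correct pricing.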

The Forrester research team found that data analytics lets marketers focus on making customer relationships more successful. By studying how customers develop over time, specialists can assess their level of loyalty and extend their life cycle with a particular company.

Geoanalytics is used to optimize sales strategies and plan entry into new markets in the biopharmaceutical industry. According to McKinsey, drug companies spend an average of 20 to 30% of their profits on administration and sales. If enterprises make more active use of big data to identify the most cost-effective and fastest-growing markets, these costs can be cut quickly.

Data analytics is a means for companies to get a complete picture of key aspects of their business. Increasing revenues, reducing costs and reducing working capital are the three tasks that modern business tries to solve with the help of analytical tools.

Finally, 58% of CMOs say that the impact of Big Data technologies is already visible in search engine optimization (SEO), e-mail and mobile marketing, where data analysis plays the most significant role in shaping marketing programs. Only slightly fewer, 54%, are confident that Big Data will play a significant role in all marketing strategies for many years to come.

8 Global data analysis

No less curious is the application of data analysis to global problems such as climate change. It is possible that machine learning will ultimately be the only force capable of maintaining a delicate balance. The topic of human influence on global warming still causes a lot of controversy, so only reliable predictive models based on the analysis of large amounts of data can give an accurate answer. Ultimately, reducing emissions will help us all: we will spend less on energy.

Big Data is no longer an abstract concept that might find an application in a couple of years. It is a fully working set of technologies that can be useful in almost every area of human activity, from medicine and public order to marketing and sales. The active integration of Big Data into our daily lives has only just begun, and who knows what role Big Data will play in a few years?
