
How data storage systems work. Purpose of data storage systems (DSS) and their types

A data storage system (DSS) is a hardware and software solution for securely storing data and providing fast, reliable access to it.

The hardware of a storage system is similar in architecture to a personal computer. Why, then, use dedicated storage systems in an organization's local network at all? Why not implement storage on the basis of an ordinary PC?

Storage nodes attached to a local network and built on a personal computer, or even a powerful server, have existed for a long time.

The simplest option is to provide access to data over FTP (file transfer protocol) and SMB (a protocol for remote access to network resources), both supported by all modern operating systems.

Why, then, did dedicated storage systems appear at all?

It is simple: the appearance of storage systems is tied to the lag of permanent storage devices (hard magnetic disks) behind the central processor and random access memory in development and speed. The hard disk is still considered the bottleneck of the PC architecture: despite the powerful development of SATA (a serial interface) up to an exchange rate of 600 MB/s (SATA3), the drive is physically a set of platters whose data must be reached with moving read heads, which is very slow. These mechanical drawbacks are solved by SSD drives (non-mechanical storage built on memory chips), but besides their high price, SSDs, in my opinion, currently lack reliability.

Storage engineers proposed moving the drives into a separate device and using the RAM of that device to hold frequently changing data with special caching algorithms, which is what required the software component of the product. As a result, storage systems run faster than hard drives installed in servers, and moving the disk subsystem into a separate element improved the reliability and centralization of the system as a whole.

Reliability is ensured by the very fact that the disk system is implemented as a separate device which, together with its software component, performs a single function: input/output operations and data storage.

Besides the simple "one device, one function" principle that underpins reliability, all the main nodes of a storage system, such as power supplies and controllers, are duplicated, which further increases reliability but also affects the price of the final product.

Moving the disk system into a separate unit also allows storage devices to be centralized. Without dedicated network storage, users' home folders, mail and databases are typically scattered across separate nodes, usually servers on the network, which is inconvenient and unreliable. Backups have to be made and data duplicated to a backup server on the network, which, in addition to the cost of support, hardware and software, consumes part of the network bandwidth.
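The "duplicate data to a backup server" chore described above is usually scripted. Below is a minimal Python sketch of the copy-what-changed idea behind tools like rsync; the function name and the size/mtime heuristic are illustrative simplifications, not a real rsync implementation:

```python
import os
import shutil

def incremental_backup(src: str, dst: str) -> list[str]:
    """Copy only files that are new or changed (judged by size and mtime)."""
    copied = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            s = os.path.join(root, name)
            rel = os.path.relpath(s, src)
            d = os.path.join(dst, rel)
            os.makedirs(os.path.dirname(d), exist_ok=True)
            if (not os.path.exists(d)
                    or os.path.getsize(d) != os.path.getsize(s)
                    or os.path.getmtime(d) < os.path.getmtime(s)):
                shutil.copy2(s, d)  # copy2 preserves the modification time
                copied.append(rel)
    return copied
```

On a second run over unchanged data the function copies nothing, which is exactly the property that keeps such backups off the network most of the time.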

(Figures: the network topology without a dedicated storage system, and with a separate storage system.)

Depending on the method and technology of connecting the storage to the information network, storage systems are subdivided into DAS, NAS and SAN.

DAS (Direct Attached Storage) is a connection method no different from the standard connection of a hard drive or RAID disk array to a server or PC. Typically the connection uses SAS.

SAS is, in essence, the protocol designed to replace SCSI: it uses a serial interface, unlike parallel SCSI, but keeps the same command set. SAS has higher bandwidth thanks to aggregating several links on a single interface.

NAS (Network Attached Storage): the disk system is connected to the common LAN; TCP is used as the transport protocol, and on top of it run SMB and NFS (protocols for remote access to files and printers).

SAN (Storage Area Network) is a dedicated network connecting storage devices with servers. It works over the Fiber Channel or iSCSI protocol.

With Fiber Channel everything is clear: it is optics. iSCSI, in turn, encapsulates SCSI commands in the IP protocol, which allows storage networks to be built on an Ethernet infrastructure at speeds of 1 Gb and 10 Gb. According to its developers, iSCSI speed should be sufficient for almost all business applications. To connect a server to storage over iSCSI, adapters with iSCSI support are required. When using iSCSI, at least two routes are laid to each device using VLANs, and each device and each LUN (which identifies a virtual partition in an array and is used for addressing) is assigned an address (World Wide Name).

The difference between NAS and SAN is that on a SAN, read and write I/O operations are performed in blocks: the storage knows nothing about the structure of file systems.
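A toy model makes this distinction concrete: a SAN-style block device understands only numbered fixed-size blocks, while file names and directories exist solely in the client's file system. A minimal sketch (class and method names are hypothetical):

```python
class BlockDevice:
    """Toy model of what a SAN exports: numbered fixed-size blocks, no files."""

    def __init__(self, num_blocks: int, block_size: int = 512):
        self.block_size = block_size
        self.data = bytearray(num_blocks * block_size)

    def write_block(self, lba: int, payload: bytes) -> None:
        # SCSI-style I/O transfers whole blocks addressed by logical block address (LBA).
        assert len(payload) == self.block_size
        off = lba * self.block_size
        self.data[off:off + self.block_size] = payload

    def read_block(self, lba: int) -> bytes:
        off = lba * self.block_size
        return bytes(self.data[off:off + self.block_size])
```

Which bytes make up a file called, say, report.doc is something only the host's file system knows; the device itself never sees a file name.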

The best-known vendors on the storage market are NetApp, IBM, HP, DELL, HITACHI and EMC.

Our project requires a storage system with the following characteristics:

  • Capacity: 1 TB for files, 1 TB for server operating systems and databases, and 300-500 GB for backup servers, plus reserve. At least 3 TB of disk space in total
  • Support for the SMB and NFS protocols, to serve shared files to users without involving the servers
  • If we want to boot the hypervisor from the storage system, at least the iSCSI protocol is needed
  • In theory, you also need to take into account such an important parameter as the input/output (IO) rate the storage system can deliver. You can estimate this parameter by measuring IO on working hardware, for example with the IOMeter program.
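As a quick sanity check, the sizing above can be written out; the figures are taken from the list, with the 300-500 GB backup item taken at its upper bound of 0.5 TB:

```python
import math

# Requirements from the list above, in TB.
files = 1.0        # shared user files
os_and_db = 1.0    # server operating systems and databases
backups = 0.5      # backup servers: upper bound of the 300-500 GB range

total = files + os_and_db + backups             # 2.5 TB raw requirement
required = math.ceil(total)                     # round up, leaving some reserve
print(f"At least {required} TB of disk space")  # At least 3 TB of disk space
```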

It should be borne in mind that clustering from Microsoft works only over Fiber Channel.

Here is a list of firms and hardware to choose from:

Asustor

Asustor AS 606T, AS 608T, AS 609RD (besides the ability to install up to eight 4 TB disks, support for VMware, Citrix and Hyper-V is declared).

Hardware component

CPU Intel Atom 2.13 GHz

RAM 1GB (3GB) DDR3

Hard 2.5, 3.5, SATA 3 or SSD

Lan Gigabit Ethernet - 2

LCD Screen, HDMI

Net

Network protocols

File system

For built-in hard drives: EXT4, For external hard drives: FAT32, NTFS, EXT3, EXT4, HFS +

Storage

Support for multiple volumes with spare disks

Volume Type: Single disk, JBOD, RAID 0, RAID 1, RAID 5, RAID 6, RAID 10

Support for online migration of RAID levels

Maximum number of targets: 256

Maximum LUNs: 256

Masking targets

LUN mapping

Mount ISO images

MPIO and MCS support

Persistent Reservation (SCSI-3)

Disk management

Search for bad blocks on a schedule

Scheduled S.M.A.R.T Scan

Supported OS

Windows XP, Vista, 7, 8, Server 2003, Server 2008, Server 2012

Mac OS X 10.6 Onwards

UNIX, Linux, and BSD

Backup

Rsync (remote sync) mode support

Cloud backup

FTP backup

Backing up to external media

One touch backup

System administration

Log type: syslog, connection log, file access log

Real-time user activity recorder

Real-time system monitor

Network Recycle Bin

Disk quota of users

Virtual disk (mount ISO images, max. 16)

UPS support

Access control

Maximum number of users: 4096

Maximum number of groups: 512

Maximum number of shared folders: 512

Maximum concurrent connections: 512

Windows Active Directory support

Security

Firewall: Prevent Unauthorized Access

Network defense: prevents network attacks

Threat notifications: E-mail, SMS

Secured connections: HTTPS, FTP over SSL / TLS, SSH, SFTP, Rsync over SSH

Operating system ADM with the ability to connect additional modules via app central

The AS 604RD and AS 609RD models, unlike the AS 606T and AS 608T, have no LCD display, are designed for rack installation and include a redundant power supply; support for virtualization platforms is declared.

Netgear

Ready Nas 2100, Ready Nas 3100, Ready Nas Pro 6

Hardware component

CPU Intel SOC 1GHz

Hard 2.5, 3.5, SATA 2 or SSD

Lan Gigabit Ethernet - 2

Net

Network protocols

CIFS / SMB, AFP, NFS, FTP, WebDAV, Rsync, SSH, SFTP, iSCSI, HTTP, HTTPS

File system

For built-in hard drives: BTRFS, For external hard drives: FAT32, NTFS, EXT3, EXT4, HFS +

Storage

Supports online RAID capacity expansion

Maximum number of targets: 256

Maximum LUNs: 256

Masking targets

LUN mapping

Disk management

Disk capacity, performance, load monitoring

Scanning to find bad blocks on disks

HDD S.M.A.R.T.support

On-line data correction on disks

Disk Scrubbing support

Defragmentation support

Messages (from SMTP service via e-mail, SNMP, syslog, local log)

Auto shutdown (HDD, fans, UPS)

Restoration of performance when power is restored

Supported OS

Microsoft Windows Vista (32/64-bit), 7 (32/64-bit), 8 (32/64-bit), Microsoft Windows Server 2008 R2 / 2012, Apple OS X, Linux / Unix, Solaris, Apple iOS, Google Android

Backup

Unlimited snapshots for continuous protection.

Recover snapshots at any point in time. Via Graphical User Interface (Admin Console), ReadyCLOUD, or Windows Explorer

The ability to create a snapshot manually or through the scheduler

Synchronizing files via R-sync

Cloud-managed Remote Replication (ReadyNAS to ReadyNAS). Does not require licenses for devices running ReadyNAS OS v6.

Hot spare support

ESATA support

Supports backup to external drives (USB / eSATA)

Supports Remote Apple Time Machine backup and restore (via ReadyNAS Remote)

ReadyNAS Vault Cloud Service Support (Optional)

ReadyDROP sync support (Mac / Windows file sync to ReadyNAS)

Support for the DropBox service for file synchronization (requires an account on the DropBox service)

System administration

ReadyCLOUD for device discovery and management

RAIDar - Network Device Discovery Agent (Windows / Mac)

Saving and restoring the configuration file

The event log

Syslog server messaging support

SMB messaging support

Graphical user interface in Russian and English

Genie + marketplace. Built-in app store to enhance device functionality

Unicode character support

Disk manager

Thin provision Shares and LUNs support

Instant resource allocation

Access control

Maximum number of users: 8192

Maximum number of groups: 8192

Maximum number of folders provided for network access: 1024

Maximum number of connections: 1024

Access to folders and files based on ACL

Extended folder and subfolder permissions based on ACL for CIFS / SMB, AFP, FTP, Microsoft Active Directory (AD) Domain Controller Authentication

Own access lists

ACL-based ReadyCLOUD Access Lists

Operating system

ReadyNAS OS 6 is based on Linux 3.x

The ReadyNAS 3100 differs from the ReadyNAS 2100 in having 2 GB of ECC memory

ReadyNAS Pro 6: 6-bay storage, Intel Atom D510 processor, 1 GB DDR2 memory

Qnap

TS-869U-RP, TS-869 PRO

Hardware component

CPU Intel Atom 2.13GHz

Hard 2.5, 3.5, SATA 3 or SSD

Lan Gigabit Ethernet - 2

Net

IPv4, IPv6, Supports 802.3ad and Six Other Modes for Load Balancing and / or Network Failover, Vlan

Network protocols

CIFS / SMB, AFP, NFS, FTP, WebDAV, Rsync, SSH, SFTP, iSCSI, HTTP, HTTPS

File system

For built-in hard drives: EXT3, EXT4, For external hard drives: FAT32, NTFS, EXT3, EXT4, HFS +

Storage

Volume type: RAID 0, RAID 1, RAID 5, RAID 6, RAID 10

Supports online RAID capacity expansion

Maximum number of targets: 256

Maximum LUNs: 256

Masking targets

LUN mapping

iSCSI Initiator (Virtual Disk)

Stack Chaining Master

Up to 8 virtual disks

Disk management

Increase the storage capacity of a RAID array without data loss

Bad block scan

RAID recovery function

Bitmap support

Supported OS

Backup

Real Time Replication (RTRR)

Works both as RTRR server and client

Supports real-time and scheduled backups

File filtering, compression and encryption possible

Button for copying data from / to an external device

Apple Time Machine support with reservation management

Block-level resource replication (Rsync)

Works both as server and client

Secure replication between QNAP servers

Backing up to external media

Backing up to cloud storage systems

NetBak Replicator for Windows

Apple Time Machine support

System administration

AJAX-based web interface

HTTP / HTTPS connection

Instant email and SMS notifications

Cooling system management

DynDNS and dedicated service MyCloudNAS

Supports SNMP UPS (USB)

Network UPS support

Resource Monitor

Network Recycle Bin for CIFS / SMB and AFP

Detailed event and connection logs

List of active users

Syslog client

Firmware update

Saving and restoring system settings

Restoring factory settings

Access control

Up to 4096 user accounts

Up to 512 user groups

Up to 512 network resources

Batch add users

User import / export

Setting quota parameters

Managing access rights to subfolders

Operating system

TS-869 Pro: model without a redundant power supply, 1 GB of memory

Synology

RS 2212, DS1813

Hardware component

CPU Intel Core 2.13GHz

Hard 2.5, 3.5, SATA 2 or SSD

Lan Gigabit Ethernet - 2

Net

IPv4, IPv6, Supports 802.3ad and Six Other Modes for Load Balancing and / or Network Failover

Network protocols

CIFS / SMB, AFP, NFS, FTP, WebDAV, SSH

File system

For built-in hard drives: EXT3, EXT4, For external hard drives: NTFS, EXT3, EXT4

Storage

Volume type: RAID 0, RAID 1, RAID 5, RAID 6, RAID 10

Maximum number of targets: 512

Maximum LUNs: 256

Disk management

Changing the RAID level without shutting down the system

Supported OS

Windows 2000 and later, Mac OS X 10.3 and later, Ubuntu 9.04 and later

Backup

Network backup

Local backup

Synchronizing shared folders

Desktop backup

System administration

System event notification by SMS, E-mail

User quota

Resource monitoring

Access control

Up to 2048 user accounts

Up to 256 user groups

Up to 256 network resources

Operating system

DS1813: 2 GB RAM, 4 Gigabit Ethernet ports, HASP 1C support, 4 TB disk support

Thecus

N8800PRO v2, N7700PRO v2, N8900

Hardware component

CPU Intel Core 2 1.66GHz

Lan Gigabit Ethernet - 2

LAN capability 10GB

Net

IPv4, IPv6, Supports 802.3ad and Six Other Modes for Load Balancing and / or Network Failover

Network protocols

CIFS / SMB, NFS, FTP

File system

For internal hard drives: EXT3, EXT4, For external hard drives: EXT3, EXT4, XFS

Storage

Volume type: RAID 0, RAID 1, RAID 5, RAID 6, RAID 10, RAID 50, RAID 60

Supports online RAID capacity expansion

Masking targets

LUN mapping

Disk management

Disk health monitoring (S.M.A.R.T)

Bad block scan

The ability to mount ISO images

Supported OS

Microsoft Windows 2000, XP, Vista (32/64 bit), Windows 7 (32/64 bit), Server 2003/2008

Backup

Acronis True Image

Thecus Backup Utility

Reading from an optical disc to the NAS

System administration

Server web-based administration interface

Access control

ADS support

Operating system

N7700PRO v2: model without a redundant power supply

N8900- new model with support for SATA 3 and SAS

Based on the data above, at least 3 TB is required at the moment, and as operating systems and programs are updated this figure can double, so disk storage of at least 6 TB is needed, with room for growth. Therefore, with a reserve for the future and a RAID 5 array, the final figure comes to 12 TB. With the system supporting 4 TB hard drives, that calls for at least six drive bays.
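The arithmetic above can be sketched as a small helper; the figures are the article's own, and the six-bay, 4 TB configuration is just one way to exceed the 12 TB target:

```python
def raid5_usable_tb(drives: int, drive_tb: float) -> float:
    """RAID 5 sacrifices one drive's worth of capacity to parity."""
    assert drives >= 3, "RAID 5 needs at least three drives"
    return (drives - 1) * drive_tb

# Six bays of 4 TB drives, as in the sizing above:
print(raid5_usable_tb(6, 4.0))  # 20.0 TB usable, comfortably above the 12 TB target
```

Note that even four 4 TB drives in RAID 5 already give 12 TB usable; the extra bays are the "room for growth".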

The selection narrowed down to the following models: AS 609RD, ReadyNAS 3200, TS-869U-RP, RS-1212RP+ and N8900. All of them include a redundant power supply and declared support for the well-known virtualization platforms. The most interesting was the NetGear ReadyNAS 3200: only this model supported, besides S.M.A.R.T., at least some additional disk-handling technologies, plus ECC memory. But its price exceeded 100,000 rubles, and there were doubts about whether it could work with 4 TB SATA3 disks. The RS-1212RP+ also cost over 100 thousand. As for the AS 609RD, its maker is a very new player in the storage market, so it is unknown how that storage system will behave.

That left only two systems to choose from: the TS-869U-RP and the N8900.

TS-869U-RP: currently costs about 88,000 rubles.

N8900: priced at 95,400 rubles, it has many advantages over the TS-869U-RP: support for both SATA and SAS drives, the option of adding a 10 Gb adapter, a more powerful dual-core processor, and support for 4 TB SATA3 drives. In addition, the firmware is duplicated on a backup chip, which promises better reliability compared with the other systems.


As networked computer systems and global corporate solutions grew more complex by the day, the world began to demand technologies that would give an impetus to the revival of corporate storage systems. A single technology now brings unprecedented performance, tremendous scalability and exceptional total-cost-of-ownership benefits to the world's treasury of storage advances. The circumstances formed by the advent of the FC-AL (Fiber Channel Arbitrated Loop) standard, and of the SAN (Storage Area Network) developing on its basis, promise a revolution in data-oriented computing technologies.

"The most significant development in storage we've seen in 15 years"

Data Communications International, March 21, 1998

Formal definition of SAN as defined by the Storage Network Industry Association (SNIA):

“A network whose main task is to transfer data between computer systems and data storage devices, as well as between the storage systems themselves. The SAN consists of the communications infrastructure that provides physical connectivity and is also responsible for the management layer that integrates communications, storage and computer systems to transfer data securely and reliably. ”

SNIA Technical Dictionary, copyright Storage Network Industry Association, 2000

Options for organizing access to storage systems

There are three main options for organizing access to storage systems:

  • SAS (Server Attached Storage), storage attached to the server;
  • NAS (Network Attached Storage), storage connected to the network;
  • SAN (Storage Area Network), storage area network.

Consider the topologies of the corresponding storage systems and their features.

SAS

A storage system connected to a server. The familiar, traditional way of connecting a storage system to a high-speed interface in a server, usually a parallel SCSI interface.

Figure 1. Server Attached Storage

The use of a separate storage enclosure for the SAS topology is optional.

The main advantage of storage attached to a server, compared with the other options, is its low price and high performance on the basis of one storage for one server. This topology is optimal when a single server is used to organize access to the data array. But it still has a number of problems that prompted designers to look for other ways of organizing access to storage systems.

The features of SAS include:

  • Access to data depends on the OS and the file system (in general);
  • The complexity of organizing systems with high availability;
  • Low cost;
  • High performance within one node;
  • Reduced response speed under load on the server that serves the storage.

NAS

A storage system connected to the network. This option for organizing access appeared relatively recently. Its main advantage is the convenience of integrating additional storage into existing networks, but by itself it does not bring any radical improvements to the storage architecture. In fact, a NAS is a pure file server, and today you can find many new implementations of storage like NAS based on Thin Server technology.


Figure 2. Network Attached Storage.

NAS Features:

  • Dedicated file server;
  • Data access is OS and platform independent;
  • Convenience of administration;
  • Maximum ease of installation;
  • Low scalability;
  • Conflict with LAN / WAN traffic.

NAS-based storage is ideal for low-cost servers with minimal features.

SAN

Storage area networks began to develop intensively and to be deployed only in 1999. A SAN is based on a network, separate from the LAN/WAN, that serves to organize access to data for the servers and workstations directly processing them. Such a network is built on the Fiber Channel standard, which gives storage systems the advantages of LAN/WAN technologies and makes it possible to organize standard platforms for highly available, high-demand systems. Almost the only drawback of SAN today is the relatively high cost of its components, but the total cost of ownership for enterprise systems built on SAN technology is quite low.


Figure 3. Storage Area Network.

The main advantages of a SAN include almost all of its features:

  • SAN topology independence from storage systems and servers;
  • Convenient centralized management;
  • No conflict with LAN / WAN traffic;
  • Convenient data backup without loading the local network and servers;
  • High performance;
  • High scalability;
  • High flexibility;
  • High availability and fault tolerance.

It should also be noted that this technology is still quite young and in the near future it should undergo many improvements in the field of standardization of management and the way SAN subnets interact. But one can hope that this only threatens the pioneers with additional prospects for leadership.

FC as the basis for building a SAN

Like a LAN, a SAN can be built using a variety of topologies and media. When building a SAN, both a parallel SCSI interface and Fiber Channel or, say, SCI (Scalable Coherent Interface) can be used, but Fiber Channel owes its ever-increasing popularity to SAN. Experts with significant experience in the development of both channel and network interfaces took part in the design of this interface, and they managed to combine all the important positive features of both technologies in order to get something truly revolutionary. What exactly?

Main key features of channel interfaces:

  • Low latency
  • High speeds
  • High reliability
  • Point-to-point topology
  • Small distances between nodes
  • Platform Dependency
and network interfaces:
  • Multipoint topologies
  • Long distance
  • High scalability
  • Low speeds
  • Large delays
merged into Fiber Channel:
  • High speeds
  • Protocol independence (0-3 levels)
  • Long distance
  • Low latency
  • High reliability
  • High scalability
  • Multipoint topologies

Traditionally, storage interfaces (what sits between the host and storage devices) have been a barrier to performance and storage growth. At the same time, applied tasks require a significant increase in hardware capacity, which, in turn, leads to the need to increase the bandwidth of interfaces for communication with storage systems. It is the problems of building flexible, high-speed data access that Fiber Channel helps solve.

The Fiber Channel standard took its final shape over the years 1997 to 1999, during which tremendous work was done to harmonize the interaction of manufacturers of the various components, and everything was done to turn Fiber Channel from a purely conceptual technology into a real one, supported by installations in laboratories and computing centers. In 1997 the first commercial samples of the cornerstone components for building FC-based SANs, such as adapters, hubs, switches and bridges, were designed. Since 1998, FC has been used commercially in business, in production, and in large-scale deployments of failure-critical systems.

Fiber Channel is an open industry standard for high-speed serial communication. It connects servers and storage systems at distances of up to 10 km (with standard equipment) at a speed of 100 MB/s. (At CeBIT 2000, product samples were shown using the new Fiber Channel standard at 200 MB/s per ring, and implementations at 400 MB/s, that is, 800 MB/s over a double ring, were already running in laboratory conditions; by the time this article was published, a number of manufacturers had begun shipping 200 MB/s FC network cards and switches.) Fiber Channel concurrently supports a variety of standard protocols (including TCP/IP and SCSI-3) over a single physical medium, which potentially simplifies the network infrastructure and offers opportunities to reduce installation and maintenance costs. Nevertheless, using separate subnets for LAN/WAN and SAN has several advantages and is recommended by default.

One of the most important advantages of Fiber Channel, along with speed parameters (which, by the way, are not always the main ones for SAN users and can be implemented using other technologies) is the ability to work over long distances and topology flexibility, which came to the new standard from network technologies. Thus, the concept of building a SAN topology is based on the same principles as traditional networks, usually based on hubs and switches, which help prevent speed drops with an increase in the number of nodes and create the possibility of convenient organization of systems without a single point of failure.

For a better understanding of the advantages and features of this interface, we present the comparative characteristics of FC and Parallel SCSI in the form of a table.

Table 1. Comparison of Fiber Channel and Parallel SCSI Technologies

The Fiber Channel standard allows a variety of topologies: point-to-point, arbitrated loop or hub (Loop or Hub, FC-AL), and switched fabric (Fabric/Switch).

A point-to-point topology is used to connect a single storage system to a server.

Loop or Hub FC-AL - for connecting multiple storage devices to multiple hosts. When organizing a double ring, the speed and fault tolerance of the system increases.

Switches are used to provide maximum performance and resiliency for complex, large and branched systems.

Due to network flexibility, the SAN has an extremely important feature - the convenient ability to build fault-tolerant systems.

By offering alternative data paths and the ability to aggregate several storages for hardware redundancy, SANs help protect hardware and software systems from component failures. As a demonstration, here is an example of building a two-node system with no single point of failure.


Figure 4. No Single Point of Failure.

Systems of three or more nodes are built by simply adding extra servers to the FC network and connecting them to both hubs/switches.

With FC, building disaster-tolerant systems becomes straightforward. Channels for both the storage network and the local network can be laid over optical fiber (10 km or more with signal amplifiers) as the physical carrier for FC, using standard equipment, which makes it possible to significantly reduce the cost of such systems.

With the ability to access all SAN components from anywhere, we get an extremely flexible data network. It should be noted that the SAN provides transparency (the ability to see) all components, up to disks in storage systems. This feature has pushed component manufacturers to leverage their considerable experience in building management systems for LAN / WAN in order to incorporate extensive monitoring and management capabilities into all SAN components. These capabilities include monitoring and managing individual nodes, component storage, enclosures, network devices, and network sub-structures.

The SAN management and monitoring system uses such open standards as:

  • SCSI command set
  • SCSI Enclosure Services (SES)
  • SCSI Self Monitoring Analysis and Reporting Technology (S.M.A.R.T.)
  • SAF-TE (SCSI Accessed Fault-Tolerant Enclosures)
  • Simple Network Management Protocol (SNMP)
  • Web-Based Enterprise Management (WBEM)

Systems built using SAN technologies not only provide the administrator with the ability to monitor the development and state of storage resources, but also open up opportunities for monitoring and controlling traffic. With these resources, SAN management software implements the most efficient storage scheduling and component load balancing schemes.

SANs integrate well with existing information infrastructures. Their implementation does not require any changes in the existing LAN and WAN networks, but only expands the capabilities of existing systems, relieving them of the tasks focused on transferring large amounts of data. Moreover, when integrating and administering a SAN, it is very important that the key elements of the network are hot-swappable and installable, with dynamic configuration capabilities. So the administrator can add one or another component or replace it without shutting down the system. And this whole integration process can be visually displayed in a graphical SAN management system.

Having considered the advantages above, we can highlight a number of key points that directly affect one of the main benefits of a Storage Area Network: the Total Cost of Ownership.

Incredible scalability allows an enterprise using a SAN to invest in servers and storage as needed. And also keep your investments in already installed equipment when changing technological generations. Each new server will have high-speed access to the storage and every additional gigabyte of storage will be available to all servers on the subnet upon command from the administrator.

Excellent capabilities for building resilient systems can provide direct commercial benefits from minimizing downtime and rescue the system in the event of a natural disaster or other disaster.

The controllability of the components and the transparency of the system provide the ability to carry out centralized administration of all storage resources, and this, in turn, significantly reduces the costs of their support, the cost of which, as a rule, is more than 50% of the cost of equipment.

SAN Impact on Applications

In order for our readers to understand how practically useful the technologies discussed in this article are, we will give several examples of applied problems that would be ineffectively solved without the use of storage networks, would require colossal financial investments, or would not be solved at all by standard methods.

Data Backup and Recovery

Using the traditional SCSI interface, the user when building data backup and recovery systems is faced with a number of complex problems that can be very easily solved using SAN and FC technologies.

Thus, the use of storage networks takes the solution of the problem of backup and recovery to a new level and provides the opportunity to perform backup several times faster than before, without loading the local network and servers with data backup.

Server Clustering

One of the typical tasks for which SAN is effectively used is server clustering. Since one of the key points in the organization of high-speed cluster systems that work with data is access to storage, with the advent of SAN, the construction of multi-node clusters at the hardware level is solved by simply adding a server connected to the SAN (this can be done without even turning off the system, since FC switches support hot-plug). When using a parallel SCSI interface, the connectivity and scalability of which is much worse than that of FC, data-oriented clusters would be difficult to do with more than two nodes. Parallel SCSI switches are complex and expensive, and are standard for FCs. To create a cluster that will not have a single point of failure, it is enough to integrate a mirrored SAN into the system (DUAL Path technology).

Within clustering, one of the RAIS (Redundant Array of Inexpensive Servers) technologies looks especially attractive for building powerful, scalable e-commerce systems and other tasks with increased power requirements. According to Alistair A. Croll, co-founder of Networkshop Inc., using RAIS is quite effective: "For example, for $12,000-15,000 you can buy about six inexpensive one- or two-processor (Pentium III) Linux/Apache servers. The power, scalability and fault tolerance of such a system will be significantly higher than, for example, that of a single four-processor Xeon-based server, while the cost is the same."

Concurrent video streaming, data sharing

Imagine a task where you need to edit video at several (say, more than five) stations, or simply work on huge data sets. Transferring a 100 GB file over a local network takes minutes, and joint work on it would be very difficult. With a SAN, each workstation and server on the network can access the file as if it were a local high-speed disk. If you need another station or server to process the data, you can add it to the SAN without shutting down the network, simply by connecting the station to the SAN switch and granting it access to the storage. If the performance of the data subsystem no longer satisfies you, you can simply add another storage device and use data distribution technology (for example, RAID 0) to double the performance.

Main SAN components

Transmission medium

Fiber Channel uses copper and optical fiber to connect components, and both cable types can be used simultaneously when building a SAN. Interface conversion is done using GBIC (Gigabit Interface Converter) and MIA (Media Interface Adapter). Both cable types today provide the same data transfer rate. Copper cable is used for short distances (up to 30 meters); optical cable both for short distances and for distances of 10 km or more. Both multimode and single-mode optical cables are used. Multimode cable is used for short distances (up to 2 km); its inner fiber diameter is 62.5 or 50 microns. At a transfer rate of 100 MB/s (200 MB/s full duplex) over multimode fiber, the cable length should not exceed 200 meters. Single-mode cable is used for long distances; its length is limited by the power of the laser used in the transmitter. Single-mode cable has an inner diameter of 7 or 9 microns and passes only a single beam.

Connectors, adapters

To connect copper cables, DB-9 or HSSD connectors are used. HSSD is considered more reliable, but DB-9 is used just as often because it is simpler and cheaper. The standard (most common) connector for optical cables is the SC connector, it provides a high-quality, clear connection. For normal connection, multimode SC connectors are used, and for remote connections, single-mode ones. Micro-connectors are used in multiport adapters.

The most common FC adapters are for the 64-bit PCI bus. Many FC adapters are also developed for the S-BUS bus, and adapters for MCA, EISA, GIO, HIO, PMC and Compact PCI are produced for specialized uses. Single-port cards are the most popular, but two- and four-port cards also exist. PCI adapters generally use DB-9, HSSD or SC connectors; GBIC-based adapters, shipped with or without GBIC modules, are also common. Fiber Channel adapters differ in the classes they support and in a variety of features. To illustrate the differences, here is a comparative table of adapters manufactured by QLogic.

Fiber Channel Host Bus Adapter Family Chart

The matrix compares the QLogic SANblade adapters - the 2100 Series (33/66 MHz PCI), 2200 Series (33/66 MHz PCI, 33 MHz PCI and 25 MHz Sbus variants) and 2300 Series (66 MHz PCI / 133 MHz PCI-X) - across features such as 64-bit operation, FC-AL public/private loop, FL Port Class 3, F Port Class 2, point-to-point, IP/SCSI, full duplex, FC Tape, the PCI 1.0 Hot Plug specification, Solaris Dynamic Reconfiguration, VI and 2 Gbit support; the higher the series, the more of these features are supported.

Concentrators

Fiber Channel hubs (HUBs) are used to connect nodes into an FC ring (FC Loop) and are structured similarly to Token Ring hubs. Since a break in the ring can stop the network, modern FC hubs use port bypass circuits (PBC), which automatically open or close the ring (connecting or disconnecting the systems attached to the hub). Typically an FC HUB supports up to 10 connections, and hubs can be stacked up to the 127 ports allowed per ring. All devices connected to a HUB share its total bandwidth.

Switches

Fiber Channel switches have the same functions as the LAN switches familiar to the reader. They provide full-speed, non-blocking connectivity between nodes. Any node connected to an FC switch receives full (scalable) bandwidth; as the number of ports in a switched network grows, its aggregate bandwidth grows with it. Switches can be used together with hubs (for sections that do not require dedicated bandwidth for each node) to achieve the best price/performance ratio. Thanks to cascading, switches can potentially be used to build FC networks with up to 2^24 addresses (over 16 million).

Bridges

FC bridges (also called multiplexers) are used to connect parallel SCSI devices to an FC-based network. They translate SCSI packets between Fiber Channel and parallel SCSI devices such as Solid State Disks (SSD) or tape libraries. It should be noted that manufacturers have recently begun to equip almost all devices usable within a SAN with a built-in FC interface for direct connection to storage networks.

Servers and storage

Despite the fact that servers and storage are far from the least important SAN components, we will not dwell on their description, since we are sure that all our readers are familiar with them.

In closing, I would like to add that this article is only a first step toward storage networks. To fully understand the topic, the reader should pay close attention to the implementation features of SAN vendors' components and to the management software, since without them a Storage Area Network is just a set of switching elements for storage systems that will not bring you the full benefits of a storage network.

Conclusion

Today the Storage Area Network is a fairly new technology that may soon become mainstream among corporate customers. In Europe and the United States, companies with a sufficiently large fleet of installed storage systems are already beginning to migrate to storage area networks, drawn by the better total cost of ownership (TCO).

Analysts predict that in 2005 a significant number of mid-range and high-end servers will ship with a pre-installed Fiber Channel interface (this trend is already visible today), and the parallel SCSI interface will be used only for internal disk drives in servers. Already today, when building storage systems and purchasing mid-range and high-end servers, one should pay attention to this promising technology, especially since it already makes it possible to implement a number of tasks much more cheaply than with specialized solutions. And by investing in SAN technology today, you will not lose your investment tomorrow, since the features of Fiber Channel create excellent opportunities to leverage it in the future.

P.S.

The first version of this article was written in June 2000, but due to the lack of widespread interest in storage area network technology, publication was postponed. That future has now arrived, and I hope this article will convince the reader of the need to move to storage area network technology as an advanced way of building storage systems and organizing data access.

We are starting a new section called "Educational program". Seemingly well-known things will be described here, but, as it often turns out, not to everyone, and not so well. We hope this section will be useful.

So, issue number 1 - "Data storage systems".

Data storage systems.

In English they are called by a single word - storage - which is very convenient, but it translates into Russian rather clumsily. In IT slang, people often say "storaj" in Russian transcription, or use the word "khranilka", but that is rather bad form. Therefore we will use the term "data storage systems", abbreviated DSS, or simply "storage systems".

Data storage devices include any device for recording data: so-called "flash drives", compact discs (CD, DVD, ZIP), tape drives (Tape), hard drives (hard disks, also nicknamed "winchesters" in the old fashion, since an early IBM model shared its "30-30" designation with the cartridge of the famous 19th-century Winchester rifle), and so on. Hard drives are used not only inside computers but also as external USB devices for recording information; for example, one of the first iPod models was essentially a small 1.8-inch hard drive with a headphone output and a built-in screen.

Recently, so-called "solid-state" storage devices, SSDs (Solid State Disk or Solid State Drive), have become widespread. They are similar in principle to the "flash drive" of a camera or smartphone, only with a controller and a much larger capacity. Unlike a hard drive, an SSD has no mechanically moving parts. Prices for such storage are still quite high, but they are falling rapidly.

All of these are consumer devices. Among industrial systems, one should single out, first of all, hardware storage systems: hard disk arrays and the RAID controllers for them, and tape systems for long-term data storage. A separate class comprises controllers for storage systems: for managing data backup, creating "snapshots" in the storage system for later recovery, data replication, and so on. Storage systems also include network devices (HBAs, Fiber Channel switches, FC/SAS cables, etc.). Finally, large-scale solutions have been developed for data storage, archiving, data recovery and disaster recovery.

Where does the data to be stored come from? From us, the users; from application programs and e-mail; and from various equipment - file servers and database servers. A further supplier of large data volumes is so-called M2M (Machine-to-Machine) communication - all kinds of sensors, cameras and the like.

By the frequency of using the stored data, storage systems can be divided into short-term storage systems (online storage), medium-duration storage (near-line storage) and long-term storage systems (offline storage).

The first category includes the hard disk (or SSD) of any personal computer. The second and third are external DAS (Direct Attached Storage) systems, typically a disk array external to the computer (Disk Array). These, in turn, can be subdivided into a simple JBOD (Just a Bunch Of Disks) and an intelligent disk array with a controller (iDAS).

External storage systems come in three types: DAS (Direct Attached Storage), SAN (Storage Area Network) and NAS (Network attached Storage). Unfortunately, even many experienced IT specialists cannot explain the difference between SAN and NAS, saying that once this difference existed, and now it allegedly no longer exists. In fact, there is a difference, and a significant one (see Fig. 1).

Figure 1. The difference between SAN and NAS.

In a SAN, servers access the storage system at block level over a dedicated storage network. In a NAS, servers access a shared file system (on a RAID array) over the local area network (LAN).

Basic storage connection protocols

SCSI (Small Computer System Interface) protocol, pronounced "scuzzy", was developed in the mid-1980s for connecting external devices to minicomputers. Its SCSI-3 version is the basis for all storage communication protocols, which use the common SCSI command set. Its main advantages: independence from the server used, the possibility of parallel operation of several devices, and a high data transfer rate. Disadvantages: a limited number of connected devices and a very limited connection range.

FC (Fiber Channel) protocol: a protocol between a server and shared storage, controllers and disks. It is a widely used serial protocol operating at 4 or 8 gigabits per second (Gbps). As its name implies, it runs over optical fiber, but it can also run over copper. Fiber Channel is the primary protocol of FC SAN storage systems.

iSCSI (Internet Small Computer System Interface) protocol: a standard protocol for transferring blocks of data over the well-known TCP/IP protocol, i.e. SCSI over IP. iSCSI can be viewed as a high-speed, low-cost solution for storage systems connected remotely over an IP network. iSCSI encapsulates SCSI commands in TCP/IP packets for transmission over an IP network.

SAS protocol(Serial Attached SCSI). SAS uses serial data transmission and is compatible with SATA hard drives. Currently, SAS can transfer data at 3 Gbps or 6 Gbps, and supports full duplex mode, i.e. can transfer data in both directions at the same speed.

Types of storage systems.

Three main types of storage systems can be distinguished:

  • DAS (Direct Attached Storage)
  • NAS (Network attached Storage)
  • SAN (Storage Area Network)

Direct-attached storage (DAS) systems were developed back in the late 1970s in response to the explosive growth of user data, which simply no longer fit physically into the internal long-term memory of computers (at that time these were large computers, the so-called mainframes). The data transfer speed in DAS was not very high, from 20 to 80 Mbit/s, but it was quite sufficient for the needs of the time.

Figure 2. DAS

Network-attached storage (NAS) appeared in the early 1990s. The reason was the rapid development of networks and the critical need to share large amounts of data within an enterprise or operator network. NAS uses a network file system such as CIFS (Windows) or NFS (Linux), so different servers of different users can read the same file from the NAS at the same time. The data transfer rate was already higher: 1-10 Gbps.

Figure 3. NAS

In the mid-1990s, networks for connecting storage devices, FC SANs, appeared. Their development was driven by the need to organize data scattered across the network. A single storage device in a SAN can be divided into several small volumes addressed by Logical Unit Number (LUN), each of which belongs to one server. Data transfer rates increased to 2-8 Gbps. Such storage systems could provide technologies for protecting data from loss (snapshots, backup).

Figure 4. FC SAN

Another type of SAN is IP SAN (IP Storage Area Network), developed in the early 2000s. FC SANs were expensive, difficult to manage, and IP networks were at their peak, which is why this standard was born. The storage systems were connected to servers using an iSCSI controller via IP switches and provided a data transfer rate of 1-10 Gb / s.

Fig. 5. IP SAN.

The table below shows some comparative characteristics of all considered storage systems:

Parameter           | DAS                     | NAS                        | FC SAN                   | IP SAN
Transport type      | SCSI, FC, SAS           | IP                         | FC                       | IP
Data type           | Data blocks             | Files                      | Data blocks              | Data blocks
Typical application | Any                     | File server                | Databases                | Video surveillance
Advantages          | Excellent compatibility | Easy to install, low cost  | Good scalability         | Good scalability
Drawbacks           | Hard to manage; inefficient use of resources; poor scalability | Poor performance; limited applicability | High price; complex to scale | Low performance

In short, SANs transfer data to storage systems in large blocks, while NAS provides file-level access to data. A SAN + NAS combination gives high data integration, high performance and file sharing. Such systems are called unified storage - "unified storage systems".

Unified storage systems: a network storage architecture that supports both file-based NAS and block-based SAN. Such systems were developed in the early 2000s to solve the administrative problems and high total cost of ownership of separate systems in a single enterprise. This storage system supports almost all protocols: FC, iSCSI, FCoE, NFS, CIFS.

Hard drives

All hard drives can be divided into two main types: HDD (Hard Disk Drive, which literally translates as "hard disk drive") and SSD (Solid State Drive, the so-called "solid-state drive"). That is, both are "hard" drives. Was there ever a "soft disk", then? Yes, in the past there were "floppy disks" (so called, according to legend, for the characteristic flapping sound in the drive during operation). Drives for them can still be seen in the system units of old computers preserved in some government institutions. However, such magnetic disks can hardly be classed as storage SYSTEMS; they were an analogue of today's "flash drives", albeit of very small capacity.

The difference between HDD and SSD is that the HDD has several coaxial magnetic disks inside and complex mechanics that move the magnetic read-write heads, and the SSD has no mechanically moving parts at all, and is, in fact, a microcircuit molded into plastic. Therefore, strictly speaking, it is incorrect to call only HDDs "hard disks".

Hard drives can be classified according to the following parameters:

  • Design: HDD, SSD;
  • HDD diameter in inches: 3.5, 2.5, 1.8 inches;
  • Interface: ATA / IDE, SATA / NL SAS, SCSI, SAS, FC
  • Usage class: individual (desktop class), corporate (enterprise class).
Parameter              | SATA        | SAS             | NL-SAS      | SSD
Rotational speed (RPM) | 7200        | 15000 / 10000   | 7200        | N/A
Typical capacity (TB)  | 1 / 2 / 3   | 0.3 / 0.6 / 0.9 | 2 / 3 / 4   | 0.1 / 0.2 / 0.4
MTBF (hours)           | 1,200,000   | 1,600,000       | 1,200,000   | 2,000,000

Notes. SATA: the evolution of serial ATA hard drives; SATA 2.0 supports transfer rates up to 300 MB/s, SATA 3.0 up to 600 MB/s; the average annualized failure rate (AFR) for SATA drives is about 2%. NL-SAS: SATA-class hard drives with a SAS interface, well suited for tiered storage; their AFR is also about 2%. SSD: solid-state drives built from electronic memory chips (FLASH/DRAM) with a control device; the interface specification, function and usage are the same as for HDDs, as are the size and shape.

Characteristics of hard drives.

  • Capacity

In modern hard drives, capacity is measured in gigabytes or terabytes. For an HDD, this value is the capacity of a single magnetic platter inside the case multiplied by the number of platters, of which there are usually several.

  • Rotation speed (only for HDD)

The rotational speed of the magnetic platters inside the drive, measured in RPM (Revolutions Per Minute), is usually 5400 or 7200 RPM. HDDs with SCSI/SAS interfaces have rotation speeds of 10,000-15,000 RPM.

  • Average access time = Mean seek time + Mean wait time, i.e. time to retrieve information from disk.
  • Transfer rate

This is the speed of reading and writing data on a hard drive, measured in megabytes per second (MB/s).

  • IOPS (Input/Output Operations Per Second)

Input/Output Operations Per Second, one of the main indicators of disk performance. For applications with frequent read and write operations, such as Online Transaction Processing (OLTP), IOPS is the most important metric, because the performance of the business application depends on it. Another important indicator is throughput: how much data can be transferred per unit of time.
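The link between the mechanical characteristics above (seek time, rotation speed) and IOPS can be illustrated with a few lines of Python. This is a rough sketch; the function name and the sample figures are our own illustration, not data from the article:

```python
# Rough estimate of the maximum random-access IOPS of a single HDD from
# its mechanical characteristics (illustrative only).

def estimate_hdd_iops(avg_seek_ms: float, rpm: int) -> float:
    """IOPS ~ 1 / (average seek time + average rotational latency)."""
    # On average the needed sector is half a revolution away from the head.
    rotational_latency_ms = 60_000.0 / rpm / 2
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)

# A typical 7200 RPM SATA drive (~8.5 ms average seek):
print(round(estimate_hdd_iops(8.5, 7200)))    # 79
# A 15000 RPM SAS drive (~3.5 ms average seek):
print(round(estimate_hdd_iops(3.5, 15000)))   # 182
```

This is why 15,000 RPM SAS drives, despite comparable sequential transfer rates, deliver two to three times the random IOPS of desktop SATA drives.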

RAID

No matter how reliable hard drives are, data on them is sometimes lost, for various reasons. This led to RAID (Redundant Array of Independent Disks) technology: an array of independent disks with redundant data storage. Redundancy means that data bytes written to one disk are duplicated on another and can be used if the first disk fails. In addition, this technology helps to increase IOPS.

The basic concepts of RAID are striping (splitting data across disks) and mirroring (duplicating data). Their combinations define the different kinds of RAID arrays of hard disks.

There are the following levels of RAID arrays:

Combinations of these types give rise to several more new types of RAID:

The following figure explains how RAID 0 (striping) is performed:

Fig. 6. RAID 0.

And this is how RAID 1 (duplication) is performed:

Fig. 7. RAID 1.

And this is how RAID 3 works. XOR is the eXclusive OR logical function. It computes a parity value for data blocks A, B, C, D, ..., which is written to a separate disk.

Fig. 8. RAID 3.

The above diagrams illustrate well how RAID works and need no comment. We are not going to show the diagrams of the rest of the RAID levels, those who wish can find them on the Internet.
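The XOR parity scheme described for RAID 3 can be demonstrated in a few lines of Python. This is a minimal sketch of the principle, not of any real controller:

```python
# XOR parity as used in RAID 3/5: parity = A xor B xor C.
# Any single lost block is rebuilt by XOR-ing the parity with the survivors.

def xor_blocks(*blocks: bytes) -> bytes:
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

a, b, c = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(a, b, c)          # written to the dedicated parity disk

# The disk holding block B fails; reconstruct it from the rest:
recovered = xor_blocks(a, c, parity)
print(recovered == b)                 # True
```

The same XOR trick underlies RAID 5 as well; there the parity blocks are simply rotated across all disks instead of living on a dedicated one.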

The main characteristics of RAID types are shown in the table.

Storage software

Storage software can be categorized as follows:

  1. Management and administration: management and setting of infrastructure parameters: ventilation, cooling, disk operating modes, etc., time-of-day control, etc.
  2. Data protection: Snapshot, LUN content copy, split mirror, Remote Replication, CDP (Continuous Data Protection), etc.
  3. Increased reliability: various software for multiple copying and backup of data transmission routes within the data center and between them.
  4. Improving efficiency: Thin Provisioning, automatic tiered storage, deduplication, QoS management, cache prefetch, partitioning, automatic data migration, reducing disk rotation speed (disk spin-down)

"Thin provisioning" is a particularly interesting technology. As often happens in IT, the term is difficult to translate adequately into Russian: none of "provisioning", "support" or "provision" conveys the meaning completely, especially when it is "thin".

A bank loan can be used to illustrate thin provisioning. When a bank issues ten thousand credit cards with a limit of 500 thousand, it does not need to have 5 billion in its account to service this volume of loans. Credit card users usually do not spend the entire loan at once, and only use a small part of it. Nevertheless, each user individually can use the entire or almost the entire amount of the loan, if the total amount of the bank's funds is not exhausted.

Fig. 9. Thin provisioning.

Thus, the use of thin provisioning allows you to solve the problem of inefficient allocation of space in the SAN, save space, ease the administrative procedures for allocating space to applications on storage, and use the so-called oversubscribing, that is, to allocate more space to applications than we physically have, counting on that applications will not claim all space at the same time. As the need for it arises later, it is possible to increase the physical storage capacity.
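The bank analogy maps directly onto storage. Below is a toy Python model of a thin pool (all names here are our own illustration): LUNs are promised more logical capacity than physically exists, and physical space is consumed only when data is actually written.

```python
class ThinPool:
    """Toy model of a thin-provisioned storage pool (illustrative)."""

    def __init__(self, physical_gb: int):
        self.physical_gb = physical_gb
        self.used_gb = 0            # physically consumed space
        self.provisioned_gb = 0     # promised logical space (may exceed physical)

    def provision_lun(self, logical_gb: int) -> None:
        self.provisioned_gb += logical_gb     # oversubscription is allowed

    def write(self, gb: int) -> None:
        if self.used_gb + gb > self.physical_gb:
            raise RuntimeError("pool exhausted: add physical capacity")
        self.used_gb += gb

pool = ThinPool(physical_gb=1000)
for _ in range(10):
    pool.provision_lun(500)   # promise 5 TB of LUNs against 1 TB of real disk
pool.write(120)               # applications have actually written only 120 GB
print(pool.provisioned_gb, pool.used_gb)   # 5000 120
```

Just like the bank, the pool works until the applications "spend" more than physically exists, at which point the administrator must add physical capacity.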

Tiered storage assumes that different data is stored on devices whose performance matches how frequently that data is accessed. For example, frequently used data can be placed in "online storage" on SSDs with high access speed and high performance. However, the price of such disks is still high, so it is advisable (for now) to use them only for online storage.

FC / SAS drives are also fast and reasonably priced. Therefore, such disks are well suited for "near-line storage", where data is stored, the access to which occurs not so often, but at the same time and not so rarely.

Finally, SATA/NL-SAS drives have relatively slow access speeds, but large capacity and a relatively low price. They are therefore usually used for offline storage of rarely accessed data.

As soon as the control system notices that access to data in offline storage has become more frequent, it transfers them to near-line storage, and with further activation of their use - to online storage on SSD disks.

Data deduplication (DEDUP). As the name suggests, deduplication eliminates duplicate data in the disk space commonly used for backups. Although the system cannot judge which information is redundant, it can detect data blocks that repeat. This makes it possible to significantly reduce the capacity requirements of the backup system.
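The principle can be sketched in Python: identical blocks are recognized by a hash of their content, and each unique block is stored only once. This is an illustration of the idea, not any product's actual algorithm:

```python
import hashlib

def dedup_store(blocks):
    """Store each unique block once; return the store and a rebuild recipe."""
    store = {}    # content hash -> block data
    recipe = []   # sequence of hashes to reassemble the original stream
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe

# Nightly backups repeat mostly unchanged data:
blocks = [b"config", b"log-day1", b"config", b"log-day2", b"config"]
store, recipe = dedup_store(blocks)
print(len(blocks), "blocks written,", len(store), "actually stored")  # 5 ... 3
```

Reassembling `[store[d] for d in recipe]` yields the original stream, so nothing is lost even though repeated blocks occupy space only once.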

Reducing the speed of rotation of the disk (Disk spin-down) - what is usually called "hibernation" (sleep) of the disk. If the data on some disk is not used for a long time, then Disk spin-down puts it into hibernation mode to reduce power consumption by unnecessarily spinning the disk at normal speed. This also increases the life of the disk and increases the reliability of the system as a whole. When a new request for data on this disk arrives, it "wakes up" and its rotation speed increases to normal. The price to pay for the energy savings and increased reliability is some latency when the data is first accessed on disk, but the cost is well worth it.

Disk snapshot (Snapshot). A snapshot is a fully usable copy of a specific set of data on disk at the time that copy was taken (which is why it is called a "snapshot"). Such a copy is used to partially restore the state of the system at the time of copying. At the same time, the continuity of the system is not affected at all, and the performance does not deteriorate.

Remote Replication: Works using Mirroring technology. Can maintain multiple copies of data across two or more sites to prevent data loss in the event of natural disasters. There are two types of replication: synchronous and asynchronous, the difference between them is explained in the figure.

Fig. 10. Remote replication of data (Remote Replication).

Continuous data protection (CDP) Also known as continuous backup or real-time backup, it creates a backup automatically whenever data changes. At the same time, it becomes possible to restore data in case of any disasters at any time, and at the same time an up-to-date copy of the data is available, and not those that were a few minutes or hours ago.

Management software: this includes a variety of software for managing and administering devices: simple configuration programs (configuration wizards), centralized monitoring programs (topology mapping, real-time monitoring, failure-reporting mechanisms), and business-assurance programs (multidimensional performance statistics, performance reports and queries, and so on).

Disaster Recovery (DR). This is a rather important component of serious industrial storage systems, although quite a costly one. But these costs must be borne so as not to lose overnight "what was acquired through back-breaking toil". The data protection systems discussed above (Snapshot, Remote Replication, CDP) are good as long as no natural disaster strikes the locality where the storage system stands: a tsunami, flood, earthquake or (knock on wood) nuclear war. And any war can also greatly spoil the lives of people engaged in useful things, such as storing data, rather than running around with a machine gun to seize other people's territory or punish some "infidels". Remote replication assumes that the replicating storage system is in the same city, or at least nearby - which does not help in the event of a tsunami, for example.

Disaster Recovery technology assumes that the backup center used for data recovery in case of natural disasters is located at a considerable distance from the main data center, and interacts with it via a data transmission network superimposed on a transport network, most often an optical one. With such an arrangement of the main and backup data centers, for example, it will be simply impossible to use CDP technology.

DR technology uses three fundamental concepts:

  • BW (Backup Window): the time the backup system needs to copy the volume of data received from the production system.
  • RPO (Recovery Point Objective): the maximum period of time, and the corresponding amount of data, that the storage user can afford to lose.
  • RTO (Recovery Time Objective): the maximum time the storage system may remain unavailable without critical impact on the core business.

Fig. 11. Three fundamental concepts of DR technology.
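A back-of-the-envelope calculation shows how these metrics constrain each other (the figures below are our own illustration, not from the article):

```python
# Backup window: how long copying a given data volume takes at a given rate.

def backup_window_hours(data_gb: float, throughput_mb_s: float) -> float:
    return data_gb * 1024 / throughput_mb_s / 3600

# Replicating 2 TB of changed data over a 100 MB/s link:
bw = backup_window_hours(2048, 100)
print(round(bw, 1))   # 5.8 (hours)
# If the declared RPO is 4 hours, this link is too slow: either reduce the
# changed-data volume (incremental copies) or widen the channel.
```

The same arithmetic, run in reverse, tells you what link throughput a given RPO demands.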

* * *

This essay does not claim to be complete and only explains the basic principles of storage, although not in full. Various sources on the Internet contain many documents that describe in more detail all the points set forth (and not set forth) here.


The dependence of the business processes of the enterprise on the IT sphere is constantly growing. Today, not only large companies, but also representatives of medium and often small businesses pay attention to the issue of the continuity of IT services.

One of the central elements of fault tolerance is the data storage system (DSS): a device on which all information is stored centrally. A storage system is characterized by high scalability, fault tolerance, and the ability to perform all service operations without interrupting the device's operation (including component replacement). But even a basic model costs tens of thousands of dollars. For instance, a Fujitsu ETERNUS DX100 with 12 Nearline SAS 1 TB SFF disks (6 TB in RAID 10) costs on the order of USD 21,000, which is very expensive for a small company.

In this article we consider options for organizing budget storage that is not inferior to classic systems in performance or reliability. To implement it, we propose using CEPH.

What is CEPH and how does it work?

CEPH is storage based on free software that combines the disk space of multiple servers (in practice, the number of servers is measured in tens and hundreds). CEPH lets you create highly scalable storage with high performance and resource redundancy. CEPH can be used both as object storage (for storing files) and as a block device (for serving virtual hard disks).

Storage fault tolerance is ensured by replicating each data block to several servers. The number of simultaneously stored copies of each block is called the replication factor; by default it is 2. The storage operation scheme is shown in Figure 1: information is divided into blocks, each of which is distributed across two different nodes.

Figure 1 - Distribution of data blocks


If the servers do not use fault-tolerant disk arrays, we recommend a higher replication factor for reliable data storage. If one of the servers fails, CEPH detects that the data blocks located on it are unavailable (Figure 2), waits a certain time (configurable, 300 seconds by default), and then begins recreating the missing blocks elsewhere (Figure 3).

Figure 2 - Failure of one node


Figure 3 - Redundancy restoration


Similarly, if a new server is added to the cluster, the storage is rebalanced in order to uniformly fill the disks on all nodes. The mechanism that controls the distribution of blocks of information in the CEPH cluster is called CRUSH.
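The behavior of such placement can be imitated with a simple hash-based scheme in Python. To be clear, this is not the real CRUSH algorithm - only an illustration of deterministic, evenly spread replica placement that needs no central lookup table:

```python
import hashlib

def place_block(block_id, nodes, replication=2):
    """Pick `replication` distinct nodes for a block, deterministically."""
    # Rank nodes by a per-(block, node) hash and take the top N.
    ranked = sorted(
        nodes,
        key=lambda n: hashlib.md5(f"{block_id}:{n}".encode()).hexdigest(),
    )
    return ranked[:replication]

nodes = ["node1", "node2", "node3", "node4"]
for blk in ["blk-A", "blk-B", "blk-C"]:
    print(blk, "->", place_block(blk, nodes))

# If node2 fails, re-running placement over the survivors tells us where
# the missing replicas must be recreated:
survivors = [n for n in nodes if n != "node2"]
print("blk-A ->", place_block("blk-A", survivors))
```

Because every client computes the same placement from the same inputs, adding or removing a node automatically changes where blocks belong, which is what drives the rebalancing described above.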

To obtain high disk performance in CEPH clusters, it is recommended to use the cache tiering functionality. The idea is to create a separate high-performance pool and use it for caching, while the main information is placed on cheaper disks (Figure 4).

Figure 4 - Logical view of disk pools


Tiered caching works as follows: client write requests are written to the fastest pool and later moved down to the storage tier. Likewise for reads: information is promoted to the caching tier when accessed and processed there. Data remains at the cache tier until it becomes inactive or irrelevant (Figure 5). Note that caching can also be configured read-only, in which case write requests go directly to the storage pool.

Figure 5 - How cache tiering works
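A minimal model of such a write-back cache tier can be written in Python (our own sketch; real CEPH cache tiering is of course far more involved):

```python
from collections import OrderedDict

class CacheTier:
    """LRU cache pool in front of a slow storage pool (illustrative)."""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.cache = OrderedDict()   # the fast (SSD) pool
        self.backing = backing       # the cheap (HDD) pool

    def write(self, key, value):
        self.cache[key] = value      # the fast pool absorbs the write
        self.cache.move_to_end(key)
        self._evict()

    def read(self, key):
        if key not in self.cache:    # promote cold data on access
            self.cache[key] = self.backing[key]
        self.cache.move_to_end(key)
        self._evict()
        return self.cache[key]

    def _evict(self):
        while len(self.cache) > self.capacity:
            k, v = self.cache.popitem(last=False)
            self.backing[k] = v      # flush the least-recent item down

backing = {}
tier = CacheTier(capacity=2, backing=backing)
tier.write("blk1", b"hot")
tier.write("blk2", b"warm")
tier.write("blk3", b"new")           # blk1 is flushed to the slow pool
print(sorted(backing))               # ['blk1']
```

Reading `blk1` afterwards promotes it back into the fast pool, pushing the coldest cached block down - exactly the promotion/flush cycle described above.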


Let's consider realistic scenarios for using CEPH in an organization to create a data storage system. The potential customers are small and medium-sized businesses, where this technology will be most in demand. We worked out three scenarios for using the described solution:

  1. A manufacturing or trading enterprise with a requirement for the availability of an internal ERP system and file storage 99.98% per year, 24/7.
  2. An organization that needs to deploy an on-premises private cloud for its business needs.
  3. A very low-budget solution for fault-tolerant block data storage, fully hardware-independent, with 99.98% availability per year and inexpensive scalability.
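An availability requirement like the 99.98% above translates directly into an annual downtime budget; the arithmetic is worth making explicit:

```python
# Translating an availability percentage into an annual downtime budget.
def downtime_minutes_per_year(availability_pct):
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes in a year
    return (1 - availability_pct / 100) * minutes_per_year

# 99.98% availability allows roughly 105 minutes of downtime per year.
print(round(downtime_minutes_per_year(99.98), 2))  # -> 105.12
```

In other words, the scenarios above allow less than two hours of total outage per year, which is why CEPH's automatic re-replication after a node failure matters.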

Use case 1: CEPH-based data warehouse

Let's look at a real-world example of applying CEPH in an organization. Suppose we need 6 TB of fault-tolerant, high-performance storage, but even a basic storage array with disks costs on the order of $21,000.

Let's assemble a repository based on CEPH. As servers we propose the Supermicro Twin (Figure 6). The product packs 4 server platforms into a single 2U case; all the main components of the device are duplicated, which ensures continuous operation. For our task it is enough to use 3 nodes; the 4th remains in reserve for the future.




Figure 6 - Supermicro Twin


We configure each node as follows: 32 GB of RAM, a 4-core 2.5 GHz processor, 4 SATA disks of 2 TB each for the storage pool combined into 2 RAID1 arrays, and 2 SSDs for the caching pool, also combined into RAID1. The cost of the entire project is shown in Table 1.

Table 1. Components for storage based on CEPH

Components Price, USD Qty Cost, USD
Supermicro Twin 2027PR-HTR chassis (4 nodes, 2U) 4 999.28 1 4 999.28
Memory module Samsung DDR3 16GB Registered ECC 1866MHz 139.28 6 835.68
Processor Ivy Bridge-EP 4-Core 2.5GHz (LGA2011, 10MB, 80W, 22nm) Tray 366.00 3 1 098.00
Hard Drive SATA 2TB 2.5" Enterprise Capacity 6Gb/s 7200rpm 416.00 12 4 992.00
Solid State Drive SSD 2.5" 400GB DC S3710 Series 641.00 6 3 846.00
TOTAL 15 770.96

Conclusion: as a result of building the storage, we get a 6 TB disk array at a cost of about $16,000, roughly 25% less than buying a minimal storage system; moreover, the same capacity can run virtual machines that work with the storage, saving on the purchase of additional servers. In effect, this is a complete solution.
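The Table 1 arithmetic and the quoted savings can be double-checked in a few lines (unit prices and quantities are taken from the table above):

```python
# Re-running the bill-of-materials arithmetic: (unit price, quantity).
parts = [
    (4999.28, 1),   # Supermicro Twin chassis
    (139.28, 6),    # 16 GB memory modules
    (366.00, 3),    # 4-core CPUs
    (416.00, 12),   # 2 TB SATA disks for the storage pool
    (641.00, 6),    # SSDs for the caching pool
]
total = sum(price * qty for price, qty in parts)
saving = (21000 - total) / 21000 * 100  # vs. the entry-level storage array

print(f"total ${total:,.2f}, saving {saving:.0f}%")  # -> total $15,770.96, saving 25%
```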

The servers from which the storage is built can be used not only as storage for hard drives, but also as storage for virtual machines or application servers.

Use case 2: Build a private cloud

The challenge is to deploy the infrastructure to build a private cloud at a minimal cost.

Building even a small cloud of, say, 3 hosts costs about $36,000: $21,000 for the storage system plus $5,000 for each server populated to 50% capacity.

Using CEPH as storage allows you to combine computing and disk resources on the same hardware. That is, you do not need to purchase storage systems separately - disks installed directly into the servers will be used to host virtual machines.

Quick reference:
The classic cloud structure is a cluster of virtual machines, the functioning of which is provided by 2 main hardware components:

  1. Computing part (compute) - servers filled with RAM and processors, the resources of which are used by virtual machines for computing
  2. Storage system (storage) - a device filled with hard drives, which stores all the data.

We take the same Supermicro servers as equipment, but install more powerful processors - 8-core at 2.6 GHz - as well as 96 GB of RAM in each node, since the system will be used not only to store information but also to run virtual machines. The disk set is the same as in the first scenario.

Table 2. CEPH Private Cloud Hardware

Components Price, USD Qty Cost, USD
Supermicro Twin 2027PR-HTR: 4 hot-pluggable nodes in a 2U form factor, dual socket R (LGA 2011), up to 512GB ECC RDIMM, integrated IPMI 2.0 with KVM and dedicated LAN, 6x 2.5" hot-swap SATA HDD bays, 2000W redundant power supplies 4 999.28 1 4 999.28
Memory module Samsung DDR3 16GB Registered ECC 1866MHz 1.5V, dual rank 139.28 18 2 507.04
Processor Intel Xeon E5-2650V2 Ivy Bridge-EP 8-Core 2.6GHz (LGA2011, 20MB, 95W, 22nm) Tray 1 416.18 3 4 248.54
Hard Drive SATA 2TB 2.5" Enterprise Capacity 6Gb/s 7200rpm 128Mb 512E 416.00 12 4 992.00
Solid State Drive SSD 2.5" 400GB DC S3710 Series 641.00 6 3 846.00
TOTAL 20 592,86

The assembled cloud will have the following resources while remaining stable if 1 node fails:

  • RAM: 120 GB
  • Disk space 6000 GB
  • Physical processor cores: 16 pcs.

The assembled cluster will be able to support about 10 medium virtual machines with the following characteristics: 12 GB RAM / 4 processor cores / 400 GB disk space.
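A rough sketch of how that VM count follows from the resource pool. Treating RAM and disk as the hard limits is our assumption: physical cores are typically oversubscribed in virtualization, so CPU is not taken as a constraint here:

```python
# Back-of-the-envelope VM packing for the pool above. Physical cores
# are commonly oversubscribed, so RAM and disk are treated as the
# hard limits (an assumption for illustration, not a CEPH rule).
pool = {"ram_gb": 120, "disk_gb": 6000}
vm = {"ram_gb": 12, "disk_gb": 400}

vms = min(pool[k] // vm[k] for k in vm)
print(vms)  # -> 10 (RAM is the limiting resource: 120 // 12)
```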

It is also worth considering that all 3 servers are only 50% full and, if necessary, they can be replenished, thereby doubling the pool of resources for the cloud.

Conclusion: as you can see, we got both a full-fledged failover cluster of virtual machines and redundant data storage. Failure of any one server is not critical: the system continues to function without interruption, while the cost of the solution is about 1.5 times lower than buying a storage system and separate servers.

Use case 3: Building a super-cheap data warehouse

If the budget is very tight and there is no money for the equipment described above, you can buy used servers, but do not skimp on the disks - it is strongly recommended to buy them new.

We propose the following structure: 4 server nodes are purchased, each with 1 SSD for caching and 3 SATA drives. Used Supermicro servers with 48 GB of RAM and 5600-series processors can now be bought for about $800.

Disks are not assembled into fault-tolerant arrays on each server but are presented as separate devices. To improve storage reliability we therefore use a replication factor of 3, i.e. each block has 3 copies. With this architecture there is no need to mirror the SSD cache disks, since information is automatically duplicated to other nodes.
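The usable-capacity arithmetic behind this scheme is straightforward. The 2 TB disk size is an assumption consistent with the 8 TB result quoted below:

```python
# Usable capacity with replication: each block is stored `replication`
# times, so usable = raw / replication. 2 TB disks are assumed here.
nodes, disks_per_node, disk_tb = 4, 3, 2
replication = 3

raw_tb = nodes * disks_per_node * disk_tb  # 24 TB raw across the cluster
usable_tb = raw_tb / replication
print(usable_tb)  # -> 8.0
```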

Table 3. Accessories for storage

Conclusion: if necessary, this solution can use larger disks, or SAS disks if maximum DBMS performance is required. In this example the result is 8 TB of storage at very low cost and very high availability. A terabyte turned out 3.8 times cheaper than with the industrial storage system priced at $21,000.

Summary table, conclusions

Configuration 1. Storage Fujitsu ETERNUS DX100 + 12 Nearline SAS 1Tb SFF (RAID10): 6,000 GB usable; $21,000; $3.50 per GB; 760 IOPS*; purpose: storage.
Configuration 2. Storage Fujitsu ETERNUS DX100 + 12 Nearline SAS 1Tb SFF (RAID10) + Supermicro Twin: 6,000 GB usable; $36,000; $6.00 per GB; 760 IOPS*; purpose: storage + computing.
Our scenario 1. CEPH-based storage: 6,000 GB usable; $15,770; $2.63 per GB; 700 IOPS*; purpose: storage + computing.
Our scenario 2. Private cloud on CEPH: 6,000 GB usable; $20,592; $3.43 per GB; 700 IOPS*; purpose: storage + computing.
Our scenario 3. Ultra-cheap storage: 8,000 GB usable; $7,324; $0.92 per GB; 675 IOPS*; purpose: storage + computing.

* The calculation of the number of IOPs was performed for the created arrays of NL SAS disks on storage systems and SATA disks on CEPH storage, caching was disabled for the purity of the obtained values. With caching, IOPs will be significantly higher until the cache is full.

As a result, reliable and cheap data warehouses can be built on a CEPH cluster. As the calculations showed, using cluster nodes only for storage is not very effective - the solution is cheaper than buying a storage system, but not by much: in our example CEPH storage cost about 25% less than the Fujitsu DX100. The real savings come from combining the computing part and storage on the same hardware; in that case the cost of the solution is about 1.8 times lower than a classical structure with dedicated storage and separate host machines.

EFSOL implements this solution according to individual requirements. We can use the equipment you have, which will further reduce the capital costs of system implementation. Contact us and we will conduct a survey of your equipment for its use in creating storage systems.

As is well known, recent years have seen intensive growth in the volume of accumulated data. IDC's Digital Universe research predicts that the world's digital content will grow from 4.4 zettabytes to 44 zettabytes by 2020. According to experts, the volume of digital information doubles every two years. Therefore the problem of not only processing information but also storing it is extremely urgent today.

To address this issue, the development of data storage systems (storage networks and systems) is currently proceeding very actively. Let's try to figure out what exactly the modern IT industry means by the concept of a "data storage system".

Storage is an integrated software and hardware solution aimed at organizing reliable, high-quality storage of information resources and providing uninterrupted access to them.

The creation of such a complex should help in solving a variety of problems facing modern business in the course of building an integral information system.

The main components of the storage system:

Storage devices (tape library, internal or external disk array);

Monitoring and control system;

Data backup / archiving subsystem;

Storage management software;

Infrastructure for accessing all storage devices.

Main tasks

Let's consider the most typical tasks:

Decentralization of information. Some organizations have a developed branch structure. Each separate unit of such an organization should have free access to all the information it needs to work. Modern storage systems interact with users who are located at a great distance from the center where data processing is performed, therefore they are able to solve this problem.

Inability to foresee the final required resources. When planning a project it can be extremely difficult to determine how much information the system will have to handle in operation, and the mass of accumulated data grows constantly. Most modern storage systems are scalable (their capacity and performance can be increased by adding resources), so system capacity can be increased in proportion to the load (upgraded).

Security of all stored information. It can be quite difficult to control as well as restrict access to information resources of an enterprise. Unskilled actions of service personnel and users, deliberate attempts to sabotage - all this can cause significant harm to the stored data. Modern storage systems use various fault tolerance schemes to resist both deliberate sabotage and inept actions of unskilled employees, thereby preserving the system's operability.

The complexity of managing distributed information flows. Any change to distributed data in one branch inevitably creates a series of problems, from the difficulty of synchronizing databases and developers' file versions to unnecessary duplication of information. Management software shipped with storage systems helps optimize the handling of stored information.

High costs. Data storage costs account for about twenty-three percent of all IT spending, according to a study conducted by IDC Perspectives. These costs include the cost of software and hardware parts of the complex, payments to service personnel, etc. Using storage systems allows you to save on system administration, and also provides a decrease in personnel costs.


The main types of storage systems

All data storage systems are divided into 2 types: disk and tape storage systems. Each of these types is, in turn, divided into several subtypes.

Disk storage systems

Such data storage systems are used to create backup intermediate copies, as well as operational work with various data.

Disk storage systems are divided into the following subtypes:

Backup devices (various disk libraries);

Work data devices (high performance equipment);

Devices used for long-term storage of archives.


Tape storage

Used to create archives as well as backups.

Tape storage systems are divided into the following subtypes:

Tape libraries (two or more drives, many tape slots);

Autoloaders (1 drive, multiple tape slots);

Separate drives.

Main connection interfaces

Above we examined the main types of systems; now let's take a closer look at the storage systems themselves. Modern storage systems are categorized by the type of host interface they use. Consider the 2 most common external connection interfaces: SCSI and Fibre Channel. The SCSI interface resembles the widely used IDE; it is a parallel interface that allows up to sixteen devices on one bus (versus, as you know, two devices per channel for IDE). The maximum speed of the SCSI protocol today is 320 megabytes per second (a version providing 640 megabytes per second is in development). SCSI's drawbacks are inconvenient, thick cables with poor noise immunity and a maximum length of no more than twenty-five meters. The SCSI protocol itself also imposes restrictions: as a rule, one initiator on the bus plus slave devices (tape drives, disks, etc.).

Fibre Channel is less commonly used than SCSI because the hardware for this interface is more expensive. In addition, Fibre Channel is used to deploy large SAN storage networks, so it is found mainly in large companies. Distances can be practically anything - from the standard three hundred meters with ordinary equipment to two thousand kilometers with powerful switches ("directors"). The main advantage of the Fibre Channel interface is the ability to combine multiple storage devices and hosts (servers) into a common SAN. Lesser advantages are: greater distances than SCSI, link aggregation and redundant access paths, hot-plugging of equipment, and higher noise immunity. Duplex single-mode and multimode optical cables (with SC or LC connectors) are used, as well as SFP optical transceivers based on laser or LED emitters; these components determine the maximum distance between devices and the transmission speed.

Storage topology options

Traditionally, servers connect to storage directly - this is DAS, direct-attached storage. Besides DAS, there are NAS devices - storage attached to the network - as well as SANs, storage area networks. SAN and NAS systems were created as alternatives to the DAS architecture. Each of these solutions was developed in response to ever-increasing requirements for data storage and was based on the technologies available at the time.

The first networked storage architectures were developed in the 1990s to address some of the more tangible shortcomings of DAS systems. Storage networking solutions were designed to address the above objectives: reduce the cost and complexity of data management, reduce LAN traffic, and improve overall performance and data availability. That being said, SAN and NAS architectures address different aspects of one common problem. As a result, 2 network architectures began to exist simultaneously. Each of them has its own functionality and benefits.

DAS


DAS (Direct Attached Storage) is an architectural solution in which the device used to store digital data is connected through an interface, such as SAS, directly to a server or workstation.


The main advantages of DAS systems: low cost compared to other storage solutions, ease of deployment and administration, high-speed data exchange between the server and the storage system.

The above advantages allowed DAS systems to become extremely popular in the segment of small corporate networks, hosting providers and small offices. But at the same time, DAS systems also have their drawbacks, for example, not optimal utilization of resources, explained by the fact that each DAS system requires a dedicated server connection, in addition, each such system allows connecting no more than two servers to a disk shelf in a certain configuration.

Advantages:

Affordable cost. The storage system is essentially a disk basket installed outside the server, equipped with hard drives.

Providing high-speed exchange between the server and the disk array.


Flaws:

Insufficient reliability - in the event of an accident or any problems in the network, the server ceases to be available to a number of users.

High latency due to the fact that all requests are processed by one server.

Poor manageability - The availability of all capacity to a single server reduces the flexibility of data distribution.

Low resource utilization - the amount of data required is difficult to predict: some DAS devices in an organization may experience excess capacity, while others may not have enough capacity, since reallocation of capacity is usually too time-consuming or even impossible.

NAS


NAS (Network Attached Storage) is an integrated stand-alone disk system that includes a NAS server with its own specialized operating system and a set of user-friendly functions providing quick system startup and access to files. The system connects to an ordinary computer network, letting users of that network solve the problem of insufficient disk space.

NAS is a storage that connects to the network like a regular network device, providing file access to digital data. Any NAS device is a combination of the storage system and the server to which the system is connected. The simplest NAS device is a network server that provides file shares.

NAS devices consist of a head unit that performs data processing, and also connects a chain of disks into a single network. NAS enables storage over Ethernet networks. Shared access to files is organized in them using the TCP / IP protocol. Such devices enable file sharing even among clients with systems running different operating systems. Unlike DAS architecture, NAS systems do not have to take servers offline to increase overall capacity; adding disks to the NAS structure can be done simply by connecting the device to the network.

NAS technology is developing today as an alternative to universal servers that carry a large number of different functions (e-mail, fax server, applications, printing, etc.). NAS devices, unlike universal servers, perform only one function - a file server, trying to do this as quickly, simply and efficiently as possible.

Connecting the NAS to a LAN provides access to digital information for an unlimited number of heterogeneous clients (that is, clients with different operating systems) or other servers. Almost all NAS devices today are used on Ethernet networks based on TCP / IP protocols. Access to NAS devices is carried out using special access protocols. The most common file access protocols are DAFS, NFS, CIFS. Specialized operating systems are installed inside such servers.

A NAS device can look like an ordinary box with one Ethernet port and a couple of hard drives, or it can be a huge system with several dedicated servers, a large number of drives, and external Ethernet ports. Sometimes NAS devices are part of a SAN; in that case they have no drives of their own and only provide access to data located on block devices. The NAS then acts as a powerful dedicated server and the SAN as a storage device, so SAN and NAS components together form a single storage topology.

Advantages

Low cost, availability of resources for individual servers, as well as for any computer in the organization.

Versatility (one server is capable of serving Unix, Novell, MS, Mac clients).

Ease of deployment as well as administration.

Ease of sharing resources.


Flaws

Accessing information via network file system protocols is often slower than accessing a local disk.

Most affordable NAS servers fail to provide the flexible, high-speed access that modern SAN systems provide (block-level, not file-level).

SAN


SAN (Storage Area Network) is an architectural solution for connecting external storage devices (tape libraries, disk arrays, optical drives, etc.) to servers in such a way that the operating system recognizes them as local devices. Using a SAN reduces the total cost of maintaining data storage and allows modern organizations to store their information reliably.

The simplest SAN option is storage systems, servers and switches, united by optical communication channels. In addition to disk storage systems, disk libraries, tape drives (tape libraries), devices used to store information on optical disks, etc. can be connected to the SAN.

Advantages

Reliability of access to the data that is located on external systems.

The independence of the SAN topology from the servers and storage systems used.

Security and reliability of centralized data storage.

Convenience of centralized data and switching management.

Ability to move I / O traffic to a separate network to offload LAN.

Low latency and high performance.

SAN logical structure flexibility and scalability.

The actual unlimited geographic size of the SAN.

The ability to quickly distribute resources between servers.

The simplicity of the backup scheme, ensured by the fact that all data is located in one place.

Ability to create failover clustering solutions based on an existing SAN at no additional cost.

Availability of additional services and capabilities, such as remote replication, snapshots, etc.

High SAN security.


The only drawback of such solutions is their high cost. In general, the domestic market for data storage systems lags behind the market of developed Western countries, which is characterized by widespread use of storage systems. The high cost and shortage of high-speed communication channels are the main reasons hindering the development of the Russian storage market.

RAID

Speaking of data storage systems, you should definitely consider one of the main technologies that underlie the operation of such systems and are ubiquitous in the modern IT industry. We mean RAID arrays.

A RAID array consists of several disks that are controlled by a controller and are interconnected through high-speed data transmission channels. The external system perceives such disks (storage devices) as a single whole. The type of array used has a direct impact on the degree of performance and fault tolerance. RAID arrays are used to increase the reliability of data storage as well as to improve the read / write speed.

There are several RAID levels used when building storage systems. The most commonly used are:

1. RAID 0. A striped disk array of increased performance, without fault tolerance.
The information is split into separate data blocks, which are written simultaneously to two or more disks.

Pros:

The amount of memory is summed up.

Significant increase in performance (the number of disks directly affects the rate of increase in performance).


Minuses:

The reliability of RAID 0 is lower than that of even the least reliable disk in it, because if any disk fails, the entire array becomes unusable.
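That compounding of risk is easy to quantify. A 3% annual per-disk failure rate is assumed here purely for illustration:

```python
# RAID 0 fails if *any* disk fails: P(array) = 1 - (1 - p)^n,
# so reliability drops as disks are added. p = 0.03 is an assumed
# annual per-disk failure rate, for illustration only.
def raid0_failure_prob(p, n):
    return 1 - (1 - p) ** n

for n in (1, 2, 4):
    print(n, round(raid0_failure_prob(0.03, n), 4))
# 2 disks: ~5.9% per year; 4 disks: ~11.5% - worse than any single disk.
```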


2. RAID 1 - a mirrored disk array, consisting of a pair of disks that fully duplicate each other.

Pros:

Providing an acceptable write speed when parallelizing queries, as well as gain in read speed.

High reliability: an array of this type keeps working as long as at least 1 of its disks functions. The probability of both disks failing simultaneously equals the product of their individual failure probabilities and is far lower than the probability of a single disk failing. In practice, after one disk fails, steps must be taken immediately to restore redundancy; for this, hot-spare disks are recommended with RAID of any level (except zero).


Minuses:

The only drawback of RAID 1 is that the user gets one disk's worth of capacity for the price of two drives.
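The product-of-probabilities argument above, in numbers. The 3% annual per-disk failure rate is illustrative, and independent failures with no rebuild window are assumed:

```python
# For RAID 1, data is lost only if BOTH disks fail (assuming
# independent failures). p = 0.03 is an illustrative annual rate.
p = 0.03
raid1_loss = p * p

print(round(raid1_loss, 6))          # -> 0.0009
print(round(p / raid1_loss))         # the mirror is ~33x safer than one disk
```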



3. RAID 10. A RAID 0 array built from RAID 1 arrays.

4. RAID 2. Used for arrays employing a Hamming code.

Arrays of this type are based on the Hamming code. The disks are divided into 2 groups: one for data, the other for the codes used for error correction. Data is distributed across the data disks as in RAID 0, split into small blocks according to the number of disks. The remaining disks store the error-correction codes, which allow information to be restored if one of the hard disks fails. The Hamming method, also used in ECC memory, corrects single errors on the fly and detects double errors.

RAID 3, RAID 4. Striped disk arrays with a dedicated parity disk. In RAID 3, data from n disks is split into pieces smaller than a sector (blocks or bytes) and distributed across n-1 disks; parity blocks are stored on one dedicated disk. In RAID 2, n-1 disks were used for this purpose, but most of the information on the check disks served on-the-fly error correction, whereas for most users simple recovery after a disk failure is sufficient - and for that, the information fitting on a single dedicated disk is enough.

A RAID 4 array is similar to RAID 3, but data is divided into blocks rather than bytes. This partly solved the problem of low data transfer rates for small volumes. Writing is slow, however, because parity for each block is generated on write and written to a single disk.
RAID 3 differs from RAID 2 in its inability to correct errors on the fly and in its lower redundancy.

5. RAID 5. In RAID levels 2 through 4, parallel write operations are impossible, because a separate dedicated disk stores the parity information. RAID 5 lacks this drawback: checksum blocks and data blocks are written across all disks, and there is no asymmetric disk configuration. Here a checksum is the result of the XOR operation. XOR allows any operand to be replaced by the result, so applying XOR again recovers the missing operand. Storing the XOR result requires only one disk's worth of capacity (equal to the size of any single disk in the array).
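The XOR recovery property that RAID 5 relies on can be demonstrated directly; this is a sketch of the principle, not a controller implementation:

```python
# RAID 5's XOR property: parity = d1 ^ d2 ^ ..., and any single lost
# block is recoverable by XOR-ing the parity with the surviving blocks.
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

d1, d2, d3 = b"\x01\x02", b"\x10\x20", b"\xaa\xbb"
parity = xor_blocks([d1, d2, d3])

# "Lose" d2, then rebuild it from the parity plus the surviving blocks:
rebuilt = xor_blocks([d1, d3, parity])
assert rebuilt == d2
```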

Pros:

The popularity of RAID 5 is primarily due to its cost efficiency. Writing to a RAID 5 volume consumes extra resources, since additional computations and writes are required, which degrades performance. On reads, however (compared to a single hard disk), there is a certain gain: data streams coming from several disks can be processed in parallel.


Minuses:

RAID 5 has much lower performance on random writes, where performance drops by 10-25 percent relative to RAID 10 or RAID 0, because each write requires extra disk operations (the old data and parity must be read and the new data and parity written). The disadvantages of RAID 5 also appear when one disk fails: the entire volume goes into critical mode, and all reads and writes involve additional manipulations, causing a sharp drop in performance. The reliability level drops to that of RAID 0 with the corresponding number of disks, i.e. n times less than the reliability of a single disk. If, before the array is rebuilt, at least one more disk fails or an unrecoverable error occurs on it, the array is destroyed and the data on it cannot be restored by conventional methods. Note also that the redundancy rebuild process, known as RAID Reconstruction, causes an intense continuous read load on all disks for many hours after a failure, during which one of the remaining disks may fail. Previously undetected read failures in cold data (data not accessed during normal operation - inactive and archived) may also surface, further increasing the risk of failure during recovery.



6. RAID 50. A RAID 0 array built from RAID 5 arrays.

7. RAID 6. A striped disk array using 2 checksums calculated in 2 independent ways.

RAID 6 is in many ways similar to RAID 5, but more reliable: it allocates the capacity of two disks to checksums, and the two sums are calculated by different algorithms. A more powerful RAID controller is required. RAID 6 protects against multiple failures, keeping the array operational after two drives fail simultaneously. At least four drives are required. Using RAID 6 typically degrades disk-group performance by about 10-15 percent, due to the extra processing: the controller must calculate a second checksum and read and write more disk blocks for each block written.
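The capacity cost of that second checksum is easy to express; this is a small illustrative calculation, not tied to any particular controller:

```python
# Usable fraction by RAID level for n disks: RAID 5 keeps (n-1)/n of
# raw capacity, RAID 6 keeps (n-2)/n - the second parity costs a disk.
def usable_fraction(n_disks, parity_disks):
    return (n_disks - parity_disks) / n_disks

for n in (4, 8):
    print(n, usable_fraction(n, 1), usable_fraction(n, 2))
# With 4 disks RAID 6 keeps only half the raw capacity; with 8, 75%.
```

As the disk count grows, the relative overhead of the second parity shrinks, which is why RAID 6 is usually deployed on larger groups.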

8. RAID 60. A RAID 0 array built from RAID 6 arrays.

9. Hybrid RAID. Another RAID level that has become quite popular lately: ordinary RAID levels used together with additional software and SSDs serving as a read cache. Since SSDs are far faster than HDDs, this increases system performance. Several implementations exist, for example Crucial Adrenaline and some budget Adaptec controllers. Today the use of Hybrid RAID is generally not recommended because of the limited write endurance of SSDs.


Hybrid RAID reads from the faster SSD and writes to both the SSDs and the hard drives (for redundancy).
Hybrid RAID is well suited to low-load data applications (a virtual machine, file server, or Internet gateway).

Features of the modern storage market

In the summer of 2013, the analytical company IDC released its latest storage-market forecast, covering the period through 2017. Analysts estimate that over the next four years, enterprises worldwide will purchase storage systems with a total capacity of one hundred and thirty-eight exabytes, with shipped storage capacity growing by about thirty percent annually.

However, compared to previous years, when there was a rapid growth in data storage consumption, the rate of this growth will slow down somewhat, as today most companies use cloud solutions, giving preference to technologies that optimize data storage. Storage space savings are achieved through tools such as virtualization, data compression, data deduplication, and more. All of the above tools provide space savings, allowing companies to avoid spontaneous purchases and resort to purchasing new storage systems only when they are really needed.

Of the 138 exabytes expected to be sold in 2017, 102 exabytes will be external storage and 36 internal. In 2012, twenty exabytes of external and eight exabytes of internal storage were sold. Spending on industrial storage systems will grow by approximately 4.1 percent annually and by 2017 will reach about forty-two and a half billion dollars.

We have already noted that the global storage market, which has recently experienced a real boom, has gradually declined. In 2005, the growth in storage consumption at the industrial level was sixty-five percent, and in 2006 and 2007 - fifty-nine percent each. In subsequent years, the growth in storage consumption decreased even more due to the negative impact of the global economic crisis.

Analysts predict that increased use of cloud storage will lead to less consumption of storage solutions at the enterprise level. Cloud providers are also actively purchasing for their storage needs, for example, Facebook and Google are building their own servers from ready-made components, but these servers are not counted in the IDC report.

IDC also expects that emerging markets will soon outperform developed markets in storage consumption as they experience faster economic growth. For example, the region of Eastern and Central Europe, Africa and the Middle East will surpass Japan in terms of storage costs in 2014. By 2015, the Asia-Pacific region, excluding Japan, will surpass Western Europe in terms of storage consumption.

Prompt sale of data storage systems

The sale of data storage systems carried out by our company Navigator gives everyone the opportunity to get a reliable and durable basis for storing their multimedia data. A wide selection of Raid arrays, network storages and other systems makes it possible to individually select for each customer the system that suits him best.

The broad technical ability, literacy and experience of the company's personnel guarantee a quick and comprehensive implementation of the task. At the same time, we are not limited exclusively to the sale of data storage systems, since we also carry out its configuration, start-up and subsequent service and maintenance.
