With the rise of cloud computing[1-2] and Software as a Service (SaaS)[3-5], cloud storage has become a focus of attention in information storage. Unlike traditional storage, it consists not only of hardware devices but of a complete system comprising network and storage equipment, servers, application software, public access interfaces, access networks, and client programs. Since its introduction, cloud storage has attracted great interest from service providers.
A user’s local data can be stored in online space provided by a Storage Service Provider (SSP). Instead of building their own data centers, users can subscribe to services from the SSP. This avoids the repeated construction of storage platforms and saves expensive investment in hardware and software infrastructure.
1 Cloud Storage
Cloud storage differs from traditional storage in many aspects. In terms of functionality, it is designed to deliver a wide range of online storage services, whereas traditional storage systems are primarily designed for high-performance computing and transaction processing. In terms of performance, cloud storage places great importance on data security, reliability, and efficiency. With more users, a wider service range, and a complex, ever-changing network environment, cloud storage systems face greater technical challenges than traditional systems in delivering high-quality services. In terms of data management, cloud storage systems not only support traditional file access through interfaces such as the Portable Operating System Interface (POSIX), but also support mass data management in order to provide public service functions and maintain data in the background.
A cloud storage platform can be divided into four layers: the data storage layer, data management layer, data service layer, and user access layer. Figure 1 shows the architecture of a cloud storage platform.
(1) Data Storage Layer
A cloud storage system offers diverse storage services, and all data stored in the system forms a massive pool. For efficient storage, this data must be properly organized. Traditional data organization relies on a single server and cannot meet the throughput and storage capacity requirements of many users in a wide area network (WAN). An organization method based on a peer-to-peer (P2P) architecture requires a large number of nodes and complex coding algorithms to ensure data reliability. In contrast, organizing data across multiple storage servers better satisfies the requirements of online storage services. Distributed data centers can provide good quality of service (QoS) for a large number of users in different geographical regions.
By interconnecting different types of storage devices, the data storage layer can manage massive amounts of data in a unified way, and can employ centralized management, status monitoring, and dynamic capacity expansion of storage devices. A cloud storage system is essentially a service-oriented distributed storage system.
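Dynamic capacity expansion works best when adding a storage device does not force most existing data to move. The paper does not specify how data is placed across devices; the sketch below swaps in consistent hashing, a common technique for this purpose, purely as an illustration. The class name `HashRing`, the device names, and the virtual-node count are hypothetical.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a 64-bit position on the hash ring (deterministic across runs)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash placement of data keys onto storage devices (illustrative)."""

    def __init__(self, devices=(), vnodes: int = 64):
        self.vnodes = vnodes                      # virtual points per device, smooths load
        self.ring: list[tuple[int, str]] = []     # sorted (position, device) pairs
        for device in devices:
            self.add_device(device)

    def add_device(self, device: str) -> None:
        # New points can only reassign keys to the new device; keys never
        # move between devices that were already in the pool.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (ring_hash(f"{device}#{i}"), device))

    def locate(self, key: str) -> str:
        # The first ring point clockwise from the key's position owns the key.
        idx = bisect.bisect_right(self.ring, (ring_hash(key), "")) % len(self.ring)
        return self.ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["dev-A", "dev-B", "dev-C"])
    keys = [f"file-{n}" for n in range(1000)]
    before = {k: ring.locate(k) for k in keys}
    ring.add_device("dev-D")                      # dynamic capacity expansion
    moved = [k for k in keys if ring.locate(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved, all to dev-D:",
          all(ring.locate(k) == "dev-D" for k in moved))
```

With modulo placement, adding a fourth device would relocate most keys; here only roughly the new device's share of keys moves.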
(2) Data Management Layer
The data management layer provides the upper layer with a unified public management interface for different services. With functions such as user management, security management, replica management, and strategy management, this layer seamlessly associates upper-layer applications with lower-layer storage services. It also promotes cooperation between storage devices, enabling them to offer diverse and optimized services.
(3) Data Service Layer
The data service layer deals directly with users and can be flexibly expanded. Depending on user demands, different application interfaces can be developed to provide services such as data storage, space leasing, public resources, multi-user data sharing, or data backup.
(4) User Access Layer
In the user access layer, an authorized user can log into the cloud storage platform from any location via a standard public application interface and access cloud storage.
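To make the division of responsibilities concrete, the following sketch models the four layers as Python objects, each calling only the layer directly below it. All class and method names are hypothetical, and in-memory dictionaries stand in for real storage devices and network interfaces; the paper does not prescribe any particular interface.

```python
class DataStorageLayer:
    """Pools several storage devices (dicts here) behind one put/get interface."""

    def __init__(self, device_count: int = 2):
        self.devices = [{} for _ in range(device_count)]

    def _device(self, key: str) -> dict:
        # Simple placement rule for the sketch: sum of key bytes modulo device count.
        return self.devices[sum(key.encode()) % len(self.devices)]

    def put(self, key: str, data: bytes) -> None:
        self._device(key)[key] = data

    def get(self, key: str) -> bytes:
        return self._device(key)[key]


class DataManagementLayer:
    """Adds user management on top of the pooled storage."""

    def __init__(self, storage: DataStorageLayer):
        self.storage = storage
        self.users: set[str] = set()

    def _check(self, user: str) -> None:
        if user not in self.users:
            raise PermissionError(f"unknown user: {user}")

    def store(self, user: str, name: str, data: bytes) -> None:
        self._check(user)
        self.storage.put(f"{user}/{name}", data)   # per-user namespace

    def load(self, user: str, name: str) -> bytes:
        self._check(user)
        return self.storage.get(f"{user}/{name}")


class BackupService:
    """Data service layer: one application interface (data backup)."""

    def __init__(self, management: DataManagementLayer):
        self.management = management

    def backup(self, user: str, name: str, data: bytes) -> None:
        self.management.store(user, name, data)

    def restore(self, user: str, name: str) -> bytes:
        return self.management.load(user, name)


def user_access(service: BackupService, user: str, op: str, name: str, data: bytes = b""):
    """User access layer: the single public entry point an authorized user calls."""
    if op == "backup":
        return service.backup(user, name, data)
    if op == "restore":
        return service.restore(user, name)
    raise ValueError(f"unsupported operation: {op}")
```

Adding another service (for example, data sharing) would mean adding a class beside `BackupService` without touching the storage or management layers.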
As an alternative to purchasing storage devices and deploying storage software, cloud storage has the following advantages:
(1) Low Cost and Quick Return
Building a storage platform that meets information management demands by purchasing storage devices and deploying software requires heavy initial investment. Software development often involves a long process of feasibility analysis, requirement analysis, software design, coding, and testing. By the time the software is developed, the demands may have changed, so that the software must be redeveloped. This reduces quality of service (QoS), increases cost, and delays the progress of information management. Enterprises repeatedly invest in traditional, low-tech storage approaches, and for an individual enterprise this means cyclic, high-cost technical upgrades.
Taking the cloud storage approach, terminal devices need only be configured to receive storage services, so heavy investment in platform building is unnecessary. Storage services can be purchased according to the number of users and the usage time span, which avoids the risk of heavy initial investment and reduces usage cost. Services can be used immediately and conveniently.
(2) Ease of Management
Traditional storage systems must be maintained by dedicated IT staff, which incurs additional cost. A cloud storage system, by contrast, is maintained and upgraded by the service provider, so professional services are delivered to the user at lower cost.
Traditionally, once an investment has been made in purchasing devices or deploying software, the storage system cannot be dynamically adjusted during its lifetime. As devices are renewed, disposing of the outdated hardware platform becomes difficult. Ever-changing business needs may require the software to be constantly updated, upgraded, or even redeveloped. High maintenance costs are therefore unavoidable and, to some extent, beyond control. Cloud storage services, in contrast, are generally charged according to the number of users, usage time, and service items. An enterprise can change its subscribed services at any time according to business needs, personnel changes, or financial status.
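The pay-per-use charging model described above reduces to simple arithmetic. In this sketch the per-user monthly prices and service-item names are invented for illustration, and each month's bill is the number of users multiplied by the sum of the prices of the subscribed items; real providers may price differently.

```python
from decimal import Decimal

# Hypothetical monthly price per user for each service item (illustrative values).
PRICE_PER_USER_MONTH = {
    "backup": Decimal("2.00"),
    "sharing": Decimal("0.50"),
}


def monthly_bill(users: int, items: list[str]) -> Decimal:
    """Charge for one month: users x sum of the subscribed items' prices."""
    per_user = sum((PRICE_PER_USER_MONTH[item] for item in items), Decimal("0"))
    return users * per_user


def total_cost(plan: list[tuple[int, list[str]]]) -> Decimal:
    """Sum monthly bills; each month may use a different headcount and item set."""
    return sum((monthly_bill(users, items) for users, items in plan), Decimal("0"))


if __name__ == "__main__":
    plan = [
        (10, ["backup"]),             # month 1: 10 users, backup only
        (10, ["backup", "sharing"]),  # month 2: sharing added
        (4, ["backup"]),              # month 3: headcount reduced
    ]
    print(total_cost(plan))  # 53.00
```

Because charges follow the plan month by month, scaling down in month 3 lowers the bill immediately, which is the flexibility contrasted with fixed up-front purchases.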
2 Application of Cloud Backup
Cloud storage has a variety of applications, including data backup, data sharing, and resource services. It can also provide standardized interfaces for other network services. The following self-developed B-Cloud system is an example of cloud storage technologies and applications.
The deployment structure of B-Cloud is illustrated in Figure 2. It consists of three cloud levels.
The top level is the wide area (public) cloud. It covers all areas that backup clients can access via the WAN. Its servers include a wide area manager and wide area cloud storage nodes.
The middle level is the regional cloud, which is usually divided by geographic region (province or prefecture). Like the wide area cloud, it has service nodes such as a regional cloud manager and regional cloud storage servers.
The lowest level is the local (private) cloud. It is divided either by a small geographical region or by an entity such as an enterprise, institute, or campus. A local cloud can run on a WAN or a Local Area Network (LAN), and its users are only those within that cloud. It has service nodes such as a local manager and private cloud storage servers.
Like the wide area cloud, regional and private clouds have multiple local storage nodes that serve multiple backup clients.
The topology of the B-Cloud system looks like a tree; the wide area cloud acts as the root node, and regional and local clouds act as branch nodes. Each node has its own manager and storage nodes that perform backup task scheduling and backup data access. All nodes—including wide area cloud, regional clouds, and local clouds—are physically connected. Nodes at any two adjacent levels have a parent-child relationship in which the child node can be viewed as a special user of the parent node. The topological structure is very scalable. Even though only three levels are currently defined in the system, any node can be split into more levels when the number of users grows or the service area expands.
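The parent-child topology, and the option of splitting a node into more levels, can be modeled as a simple tree. This is a hypothetical sketch: `CloudNode` and `split` are illustrative names, managers and storage nodes are omitted, and the cloud names are invented.

```python
from __future__ import annotations


class CloudNode:
    """One cloud (wide area, regional, or local) in the B-Cloud hierarchy."""

    def __init__(self, name: str, parent: CloudNode | None = None):
        self.name = name
        self.parent = parent
        self.children: list[CloudNode] = []
        if parent is not None:
            parent.children.append(self)   # a child acts as a special user of its parent

    def depth(self) -> int:
        return 0 if self.parent is None else 1 + self.parent.depth()

    def path_to_root(self) -> list[str]:
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path

    def split(self, new_name: str, moved: set[str]) -> CloudNode:
        """Insert an intermediate level: re-parent the named children under a new node."""
        middle = CloudNode(new_name, parent=self)
        for child in [c for c in self.children if c.name in moved]:
            self.children.remove(child)
            child.parent = middle
            middle.children.append(child)
        return middle


if __name__ == "__main__":
    root = CloudNode("wide-area")
    province = CloudNode("province-A", root)
    campus = CloudNode("campus-1", province)
    CloudNode("enterprise-2", province)
    province.split("prefecture-X", {"campus-1"})
    print(" -> ".join(campus.path_to_root()))  # campus-1 -> prefecture-X -> province-A -> wide-area
```

Splitting only re-parents existing children, so clouds outside the split subtree are unaffected, which is what makes the structure easy to extend.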
When a new user (or backup client) registers in the system, it first contacts the system's super director server, which is responsible for global user management and maintains user information. The server then assigns the user a backup cloud node according to predefined assignment strategies and user information, such as the segment or region of the user's IP address, the organization indicated by the user's email address, or the user's geographic location. After registration, the backup client can log on to the system and communicate with the backup manager and storage nodes of the assigned cloud to receive service.
According to the principle of proximity access, the nearer the server is to the client, the higher the data transmission efficiency and the lower the cost. A hierarchical topological structure creates an orderly relationship between the multiple scheduling servers and multiple storage servers of the backup system, enabling it to better serve backup clients in different regions.
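The registration step can be sketched as a lookup performed by the super director server. The paper lists the kinds of user information used but not the assignment strategies themselves, so the tables, cloud names, and priority order below (email organization first, then IP segment, then declared location, then the wide area cloud as a fallback) are assumptions for illustration.

```python
from __future__ import annotations

import ipaddress

# Hypothetical assignment tables held by the super director server.
CLOUD_BY_EMAIL_DOMAIN = {"campus.example.edu": "local:campus"}
CLOUD_BY_SUBNET = {ipaddress.ip_network("10.1.0.0/16"): "regional:province-A"}
CLOUD_BY_LOCATION = {"province-B": "regional:province-B"}
FALLBACK_CLOUD = "wide-area"

# Global user information: user id -> assigned cloud.
registry: dict[str, str] = {}


def assign_cloud(ip: str, email: str, location: str | None = None) -> str:
    """Apply the (assumed) strategies in priority order and return a cloud name."""
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in CLOUD_BY_EMAIL_DOMAIN:              # organization of the email address
        return CLOUD_BY_EMAIL_DOMAIN[domain]
    address = ipaddress.ip_address(ip)
    for network, cloud in CLOUD_BY_SUBNET.items():   # segment of the IP address
        if address in network:
            return cloud
    if location in CLOUD_BY_LOCATION:                # geographic location
        return CLOUD_BY_LOCATION[location]
    return FALLBACK_CLOUD


def register(user_id: str, ip: str, email: str, location: str | None = None) -> str:
    """Assign a cloud to a new user and record it in the global registry."""
    cloud = assign_cloud(ip, email, location)
    registry[user_id] = cloud
    return cloud
```

Ordering the checks from most specific (organization) to least specific (wide area) keeps users as close as possible to a private or regional cloud.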
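One way to read the proximity principle quantitatively is to estimate the transfer time to each candidate server from its round-trip time and throughput, then pick the fastest. The paper does not give a selection formula; the `ServerLink` fields and the simple cost model (one round trip plus size divided by bandwidth) are assumptions, and the figures in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerLink:
    name: str
    rtt_ms: float          # measured round-trip time to the client (assumed known)
    bandwidth_mbps: float  # measured throughput to the client (assumed known)


def estimated_seconds(link: ServerLink, size_mb: float) -> float:
    """Rough transfer time: one round trip for setup plus size / bandwidth."""
    return link.rtt_ms / 1000 + size_mb * 8 / link.bandwidth_mbps


def pick_server(links: list[ServerLink], size_mb: float) -> ServerLink:
    """Choose the server with the lowest estimated transfer time."""
    if not links:
        raise LookupError("no candidate storage servers")
    return min(links, key=lambda link: estimated_seconds(link, size_mb))


if __name__ == "__main__":
    candidates = [
        ServerLink("wide-area-storage", rtt_ms=80, bandwidth_mbps=20),
        ServerLink("local-storage", rtt_ms=2, bandwidth_mbps=100),
    ]
    best = pick_server(candidates, size_mb=100)
    print(best.name, round(estimated_seconds(best, 100), 3))  # local-storage 8.002
```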
The characteristics of cloud backup determine application requirements, which in turn drive the development of three key backup technologies: parallel task scheduling, data organization and compression, and backup security. These technologies are research subjects of the B-Cloud system, and involve several aspects of cloud backup service architecture. The relationship among them is illustrated in Figure 3.
Cloud backup differs from traditional backup software in the following ways:
(1) Number of Users
Backup software is traditionally used in a LAN or WAN by specific user groups. Because the number of users is small, only a small number of storage servers are usually configured—for ease of deployment, ease of maintenance, and to reduce cost. User access paths to the servers are fixed, and dynamic assignment or adjustment is not required to meet different scenarios.
In contrast, cloud backup is designed for a large number of users in the WAN. As this number grows, the system must be configured with multiple storage servers to meet scalability requirements. It must also be able to process concurrent access requests from many users and assign suitable target storage servers to them through an efficient parallel scheduling policy. This achieves load balance and high storage utilization across all storage servers. The process is completely transparent to users.
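The paper does not specify B-Cloud's parallel scheduling policy. As a stand-in, the sketch below assigns each request to the server whose utilization would be lowest after accepting it, using a lock so that concurrent requests cannot overbook a server. `Scheduler`, the server names, and the capacities are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Scheduler:
    """Least-utilization assignment of backup requests to storage servers."""

    def __init__(self, capacities: dict[str, int]):
        self.capacity = dict(capacities)
        self.used = {server: 0 for server in capacities}
        self._lock = threading.Lock()   # serializes concurrent assignments

    def assign(self, size: int) -> str:
        with self._lock:
            fits = [s for s in self.capacity if self.used[s] + size <= self.capacity[s]]
            if not fits:
                raise RuntimeError("no storage server has enough free space")
            # Utilization after placement; ties broken by server name for determinism.
            best = min(fits, key=lambda s: ((self.used[s] + size) / self.capacity[s], s))
            self.used[best] += size
            return best


if __name__ == "__main__":
    scheduler = Scheduler({"s1": 100, "s2": 200})
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(scheduler.assign, [10] * 30))   # 30 concurrent requests
    print(scheduler.used)  # {'s1': 100, 's2': 200}
```

Because each server's share tracks its capacity, the larger server ends up holding twice as much data as the smaller one, which is the kind of load balance the paragraph describes.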
(2) Amount of Data
The difference in the number of users between cloud backup and traditional backup software also means a great difference in the amount of data to be processed. Backup data generated by a large user population in the WAN can easily reach terabytes or even petabytes. Proper data organization methods and compression algorithms are therefore essential for improving the transmission and storage efficiency of such massive data. The ultimate objective of these methods and algorithms is to improve system performance, reduce hardware costs, and save energy.
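The paper does not name B-Cloud's data organization or compression algorithms. The sketch below combines two widely used techniques as a stand-in: fixed-size chunk deduplication keyed by SHA-256, so identical chunks from many users are stored once, and `zlib` compression of each stored chunk. `ChunkStore` and the 4 KiB chunk size are assumptions.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes


class ChunkStore:
    """Content-addressed store: each distinct chunk is kept once, compressed."""

    def __init__(self):
        self.chunks: dict[str, bytes] = {}   # SHA-256 hex digest -> compressed chunk

    def put(self, data: bytes) -> list[str]:
        """Store data and return its recipe: the ordered list of chunk digests."""
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:             # duplicate chunks are skipped
                self.chunks[digest] = zlib.compress(chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe: list[str]) -> bytes:
        return b"".join(zlib.decompress(self.chunks[d]) for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = ChunkStore()
    data = b"A" * 8192 + b"B" * 4096
    for _ in range(2):                 # two users back up the same file
        recipe = store.put(data)
    print(f"logical {2 * len(data)} bytes, stored {store.stored_bytes()} bytes")
```

Deduplication saves space across users and backup versions, while compression saves space within each chunk; the two are complementary.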
(3) Service Security
Cloud backup must be compatible with heterogeneous data platforms of different backup clients; must ensure data integrity at the block, file and application levels; must adapt to a complex and changing WAN environment; and must guarantee data security.
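Block-level integrity checking, one of the requirements listed above, is commonly done by comparing per-block digests computed before and after transfer. The sketch below uses SHA-256 over fixed 1 KiB blocks and reports which blocks need retransmission; the block size and function names are assumptions, not B-Cloud's documented mechanism.

```python
import hashlib

BLOCK_SIZE = 1024  # assumed block size in bytes


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 digest of each fixed-size block (the last block may be shorter)."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def corrupted_blocks(received: bytes, expected: list[str],
                     block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks whose digest differs, plus any missing or extra blocks."""
    actual = block_digests(received, block_size)
    bad = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
    bad.extend(range(min(len(actual), len(expected)), max(len(actual), len(expected))))
    return bad


if __name__ == "__main__":
    original = bytes(range(256)) * 16           # 4096 bytes = 4 blocks
    manifest = block_digests(original)          # computed by the client before upload
    damaged = bytearray(original)
    damaged[1500] ^= 0xFF                       # flip one byte inside block 1
    print(corrupted_blocks(bytes(damaged), manifest))  # [1]
```

Only the damaged block has to be resent, which matters over the unreliable WAN links the paragraph mentions.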
Cloud backup systems have higher reliability requirements than common backup software. However, people instinctively feel it is safer to back up critical data on devices they can see, and may doubt the security of backing up private data in faraway data centers.
On the one hand, cloud backup is exposed to various failures and anomalies; on the other, users subjectively impose higher security requirements on cloud backup than on conventional backup software. Security is therefore a pressing area of study for cloud backup.
Given these characteristics, the study of cloud backup focuses on the following three aspects:
(1) Command Flow
The B-Cloud system consists of backup clients, a manager, and storage servers. The manager is the administrative center of the entire system, responsible for task scheduling, operation management, and status monitoring during service.
After a backup client submits a service request, the three parts of the system perform bidirectional (mutual) security authentication, and the manager then completes job scheduling to establish a connection between the backup client and a storage server. At this point, the system begins to deliver the backup or recovery service.
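The paper does not describe the authentication protocol. As an illustration only, the sketch below shows a symmetric challenge-response exchange with HMAC-SHA256 in which each side proves knowledge of a pre-shared key to the other, after which the manager schedules a storage server. Both parties run in one process here, key distribution is assumed, and the same exchange would be repeated between client and storage server; a real deployment would also bind identities into the challenges to resist reflection attacks.

```python
import hashlib
import hmac
import secrets


def respond(key: bytes, challenge: bytes) -> bytes:
    """Prove knowledge of the key by MACing the peer's random challenge."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


def verify(key: bytes, challenge: bytes, response: bytes) -> bool:
    return hmac.compare_digest(respond(key, challenge), response)


def mutual_auth(client_key: bytes, manager_key: bytes) -> bool:
    """Both directions must succeed; each side holds its own copy of the shared key."""
    to_client = secrets.token_bytes(16)           # manager -> client challenge
    client_ok = verify(manager_key, to_client, respond(client_key, to_client))
    to_manager = secrets.token_bytes(16)          # client -> manager challenge
    manager_ok = verify(client_key, to_manager, respond(manager_key, to_manager))
    return client_ok and manager_ok


def handle_request(client_key: bytes, manager_key: bytes, schedule) -> str:
    """Authenticate first; only then let the manager schedule a storage server."""
    if not mutual_auth(client_key, manager_key):
        raise PermissionError("mutual authentication failed")
    return schedule()
```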
(2) Data Flow
Backup and recovery data are transmitted directly between the backup client and the storage server without passing through the manager. Because data goes directly from source to destination without traversing any intermediate node, this method improves efficiency and balances the overall load of the system. For data transmission and storage, the organization and compression of backup data are key.
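The separation of command flow and data flow can be illustrated with in-memory objects: the manager only chooses a storage server and records where each backup lives, while the payload goes straight from the client to that server. The class names, the round-robin choice of server, and the catalog structure are assumptions made for this sketch.

```python
class StorageServer:
    def __init__(self, name: str):
        self.name = name
        self.blobs: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self.blobs[key] = data

    def read(self, key: str) -> bytes:
        return self.blobs[key]


class Manager:
    """Control plane only: schedules servers and keeps metadata, never payloads."""

    def __init__(self, servers: list[StorageServer]):
        self.servers = servers
        self._next = 0
        self.catalog: dict[tuple[str, str], StorageServer] = {}

    def schedule(self, client_id: str, name: str) -> StorageServer:
        server = self.servers[self._next % len(self.servers)]   # round robin (assumed)
        self._next += 1
        self.catalog[(client_id, name)] = server
        return server

    def locate(self, client_id: str, name: str) -> StorageServer:
        return self.catalog[(client_id, name)]


class BackupClient:
    def __init__(self, client_id: str, manager: Manager):
        self.client_id = client_id
        self.manager = manager

    def backup(self, name: str, data: bytes) -> str:
        server = self.manager.schedule(self.client_id, name)   # command flow
        server.write(f"{self.client_id}/{name}", data)          # data flow, direct
        return server.name

    def restore(self, name: str) -> bytes:
        server = self.manager.locate(self.client_id, name)      # command flow
        return server.read(f"{self.client_id}/{name}")          # data flow, direct
```

The manager's work per request is constant regardless of backup size, so it does not become a bandwidth bottleneck as data volumes grow.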
(3) Service Security
The security of cloud backup covers the service platform itself, the individual modules, and the coordination and communication between modules. This issue requires in-depth discussion that is beyond the scope of this paper.
This paper introduces cloud storage technology and uses cloud backup as an example to discuss issues that must be addressed in cloud storage applications.
One trend in technology development is to satisfy requirements through services, and cloud storage conforms to this trend. However, the implementation of cloud storage and its widespread application still require in-depth study.