Storage Technologies

Storage technologies describe the architecture and processes of storage media and thus form a control layer for data. They manage both the storing and the retrieval of data. Storage solutions relevant to SAM have to deal with different types of data: semi-structured, structured, semantic and binary data. The following sections present some options for SAM.

Introduction

Storage.jpg
In recent years storage media have changed fundamentally, and storage technologies with them. Storage capacity has grown and data transfer rates have increased. The number of storage options has increased, while older storage media (e.g. Compact Disc, Digital Versatile Disc) are used less and less. Today the focus lies on central storage technologies with large data volumes and high data transfer rates, so that data is reachable from anywhere at any time. Current common storage technologies are relational databases, NoSQL databases, semantic databases, flat file systems and polyglot persistence.

Relevance to SAM

As the SAM platform will handle, gather and provide information, it has to store information locally. Due to the different types of data (binary, semi-structured, structured and semantic), the SAM platform stores information in different databases, all of which will be accessible through the "Cloud Storage" component. This component will be implemented in task T4.1 Assets Storage and Information Management, located in work package 4 (WP4). The component and its service will be accessible to all components of the SAM platform for storing data.

State of the Art Analysis

Storage methods

Structured Database

Relational databases are designed to store structured data. The data is stored in tables (relations), where each row represents one record (tuple). A row consists of multiple columns, which can contain different kinds of data. Relational databases can be queried using SQL statements.[1]
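The following minimal sketch illustrates tables, rows, columns and an SQL query. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for a relational database; the table name "asset" and its columns are assumptions made for illustration and are not taken from the SAM specification.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a relational database.
conn = sqlite3.connect(":memory:")

# Hypothetical "asset" table: each row is one record, each column one attribute.
conn.execute(
    "CREATE TABLE asset (id INTEGER PRIMARY KEY, title TEXT NOT NULL, duration_s INTEGER)"
)
conn.executemany(
    "INSERT INTO asset (title, duration_s) VALUES (?, ?)",
    [("Trailer A", 90), ("Interview B", 600), ("Teaser C", 30)],
)
conn.commit()

# SQL query: titles of all assets shorter than two minutes, ordered by title.
cursor = conn.execute(
    "SELECT title FROM asset WHERE duration_s < ? ORDER BY title", (120,)
)
short_titles = [row[0] for row in cursor]
print(short_titles)
```

The fixed schema is what distinguishes this approach: every row has to fit the columns declared in the CREATE TABLE statement.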

Semi-Structured Database

NoSQL[2] databases are rather new compared to relational databases. They do not use the tabular relations of an RDBMS[3]. One common category of NoSQL databases, and the one relevant here, follows a document-oriented approach. Document-oriented databases are designed for storing semi-structured data, which does not need a fixed schema as in relational databases. Whereas relational databases store values in tables, document-oriented databases store information as documents, which can be organized by tags, metadata or directory hierarchies. The documents can be encoded using standard formats like XML and JSON as well as binary forms.
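As a rough illustration of semi-structured data, the sketch below keeps two JSON-encodable documents with different fields in the same collection and looks them up by tag. It uses plain Python data structures rather than a real document database; the field names and the helper find_by_tag are hypothetical.

```python
import json

# Two documents in the same hypothetical collection; note that their fields differ.
documents = [
    {"_id": "a1", "title": "Trailer A", "tags": ["trailer", "action"]},
    {"_id": "a2", "title": "Poster B", "resolution": {"width": 1920, "height": 1080}},
]


def find_by_tag(docs, tag):
    """Return all documents whose optional 'tags' field contains the given tag."""
    return [doc for doc in docs if tag in doc.get("tags", [])]


# The documents round-trip through JSON without any schema definition.
encoded = json.dumps(documents)
decoded = json.loads(encoded)

matches = find_by_tag(decoded, "trailer")
print([doc["_id"] for doc in matches])
```

No schema had to be declared up front, and the second document simply lacks the "tags" field instead of carrying an empty column.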

Semantic Database

A semantic database stores triples. A triple is a data entity that follows the subject-predicate-object pattern, like Alice-knows-Bob and Bob-likes-Jenny. Semantic databases can be queried using SPARQL statements.[4]
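The subject-predicate-object pattern can be sketched with a small in-memory set of triples and a pattern-matching function in which None acts as a wildcard. This is only a toy model of what a triple store does and not a SPARQL implementation; the function name match is an assumption.

```python
# The two example triples from the text, stored as (subject, predicate, object).
triples = {
    ("Alice", "knows", "Bob"),
    ("Bob", "likes", "Jenny"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the sorted triples that fit the pattern; None matches anything."""
    return sorted(
        t
        for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Roughly comparable to asking "whom does Alice know?" in a triple store.
alice_knows = match(triples, subject="Alice", predicate="knows")
print(alice_knows)
```

A real semantic database would additionally use URIs for subjects and predicates and evaluate full SPARQL graph patterns.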

Binary Database (Flat File System)

A flat file system is a system where data is stored as files within a single file system. One requirement is that each file name has to be unique, because it acts like the unique ID used in relational databases. This system can be used to save data like multimedia and other binary files.[5] For SAM, a flat file system could be used to save digital assets like trailers, videos or pictures.
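The unique-file-name requirement can be sketched as follows. The class FlatFileStore is hypothetical and stores binary data in a single directory, refusing names that already exist or that contain path separators.

```python
import tempfile
from pathlib import Path


class FlatFileStore:
    """Hypothetical flat store: one directory, file name acts as the unique ID."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, name):
        # Reject empty names, "..", and names with path separators to stay flat.
        if not name or name == ".." or Path(name).name != name:
            raise ValueError(f"invalid file name: {name!r}")
        return self.root / name

    def save(self, name, data):
        # Mode "xb" raises FileExistsError if the name is already taken.
        with open(self._path(name), "xb") as handle:
            handle.write(data)

    def load(self, name):
        return self._path(name).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    store = FlatFileStore(tmp)
    store.save("trailer-001.mp4", b"\x00\x01binary")
    loaded = store.load("trailer-001.mp4")
    try:
        store.save("trailer-001.mp4", b"other")
        duplicate_rejected = False
    except FileExistsError:
        duplicate_rejected = True

print(loaded, duplicate_rejected)
```

Opening the file in exclusive mode is one simple way to enforce uniqueness; an object store would typically enforce this through its key namespace instead.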

Polyglot Persistence

Polyglot persistence means storing each type of data in the database that is best suited to it. Of course, polyglot persistence is only usable in an environment where users can store their data in databases based on different technologies, e.g. a so-called cloud storage offering both relational and binary storage.[6][7]
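The idea can be sketched as a small router that forwards each request to the backend registered for the given data type. Both classes below are hypothetical; InMemoryBackend only stands in for real database clients.

```python
class InMemoryBackend:
    """Hypothetical stand-in for a real database client."""

    def __init__(self, technology):
        self.technology = technology
        self.items = {}

    def put(self, key, value):
        self.items[key] = value

    def get(self, key):
        return self.items[key]


class PolyglotStore:
    """Routes each request to the backend registered for its data type."""

    def __init__(self, backends):
        self._backends = dict(backends)

    def _backend_for(self, data_type):
        try:
            return self._backends[data_type]
        except KeyError:
            raise ValueError(f"no backend for data type {data_type!r}") from None

    def put(self, data_type, key, value):
        self._backend_for(data_type).put(key, value)

    def get(self, data_type, key):
        return self._backend_for(data_type).get(key)


relational = InMemoryBackend("relational")
binary = InMemoryBackend("binary")
store = PolyglotStore({"structured": relational, "binary": binary})

store.put("structured", "user:1", {"name": "Alice"})
store.put("binary", "poster.png", b"\x89PNG")
print(relational.items, binary.items)
```

Callers only name the data type; which technology sits behind it can be changed in one place.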

Tools, Frameworks and Services

Databases

The following sections will contain references to common database technologies for different data types.

Structured

Semi-Structured

Semantic

Binary

Technical Specification Decisions

In the SAM deliverable D3.3.1 Technical Specification, different storage technologies have been compared. The results of this comparison are the following decisions:

  • For storing semi-structured data MongoDB has been chosen
  • For storing structured data MySQL has been chosen
  • For storing semantic data Sesame has been chosen
  • For storing binary data Amazon S3 has been chosen

Related Projects

ADVENTURE

ADaptive Virtual ENTerprise ManufacTURing Environment[8] (ADVENTURE) is a STREP[9] funded by the European Seventh Framework Programme in Virtual Factories and Enterprises. The results of this project will foster the combination of different factories and their manufacturing processes in a pluggable way to create a specific product.

SIMPLI-CITY

SIMPLI-CITY[10] is a STREP[11] funded by the European Seventh Framework Programme. The goal of this project is to provide information through so-called full-fledged road user information systems, which are then used to make drivers' journeys safer, more comfortable and more environmentally friendly.

SAM Approach

The Cloud Storage component is responsible for storing Assets and internal data used by other components of the SAM platform. This component provides multiple general interfaces for accessing the different types of data storage systems and will provide a GUI that enables the Platform administrator to manage the different data storage systems.

Architecture and Dependencies

The Cloud Storage component, as presented in the following image, provides several key features to SAM components for storing and retrieving data. More specifically:

  • The Cloud Storage offers persistent storage for all SAM components and gives them access to different types of data storage systems, so that data can be saved in the most suitable manner. The Cloud Storage is therefore the central storage point for all data in the SAM platform.
  • The Cloud Storage supports different types of databases, namely semi-structured, semantic, binary and relational. These databases can be added to the Cloud Storage as required using the DB Management user interface.
  • The Cloud Storage also provides concrete implementations of API wrappers for different target platforms, so that the provided functionality can be accessed easily.


ArchitectureCloudStorage.png

Implementation and Technologies

After extended analysis and comparison the most appropriate technologies for the frontend and the backend have been selected.

Frontend Technologies

To enable the management of the different data storage systems and the data stored in them, a GUI will be created based on AngularJS and HTML5, which will be part of the so-called AdministrationTool website.

Backend Technologies

Java has been chosen as the base backend technology. To keep the Cloud Storage modular, the implementation follows the OSGi specification.

Subcomponents

A summary of the tasks carried out for each subcomponent of the first version of the prototype is shown in the following table:

Subcomponent | Task
Storage Facade | Hosts RESTful interfaces for CRUD operations (MongoDB, Sesame, Amazon S3), creates/deletes Buckets, manages access rights for Buckets
Storage Nexus | Controls the processes for data management as well as the management of the Cloud Storage itself
Storage Wrapper | Implements the Database Wrapper and Database Type Wrapper to manage the following databases/services: MongoDB, Sesame and Amazon S3
Storage Management | Encapsulates the logic for all configurations of the Cloud Storage and connected external databases
External Databases | Provide the database instances for semantic, semi-structured and binary data

Functionality and UI Elements

The Cloud Storage provides CRUD operations on each data storage system. Additionally, it will provide interfaces, like the GetBucketList interface, to enable the implementation of the management GUI.
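To make the CRUD operations and a bucket listing more concrete, the following sketch models them on an in-memory store. It does not reflect the actual SAM interfaces, which are RESTful and implemented in Java; the class BucketStore and all method names, including get_bucket_list as a counterpart to GetBucketList, are assumptions for illustration.

```python
class BucketStore:
    """Hypothetical in-memory model of buckets with CRUD operations."""

    def __init__(self):
        self._buckets = {}

    def create_bucket(self, bucket):
        if bucket in self._buckets:
            raise ValueError(f"bucket {bucket!r} already exists")
        self._buckets[bucket] = {}

    def delete_bucket(self, bucket):
        del self._buckets[bucket]

    def get_bucket_list(self):
        # Sorted so that a management GUI gets a stable order.
        return sorted(self._buckets)

    def create(self, bucket, key, value):
        items = self._buckets[bucket]
        if key in items:
            raise KeyError(f"{key!r} already exists in {bucket!r}")
        items[key] = value

    def read(self, bucket, key):
        return self._buckets[bucket][key]

    def update(self, bucket, key, value):
        items = self._buckets[bucket]
        if key not in items:
            raise KeyError(f"{key!r} not found in {bucket!r}")
        items[key] = value

    def delete(self, bucket, key):
        del self._buckets[bucket][key]


store = BucketStore()
store.create_bucket("videos")
store.create_bucket("images")
store.create("videos", "trailer", {"length_s": 90})
store.update("videos", "trailer", {"length_s": 95})
print(store.get_bucket_list(), store.read("videos", "trailer"))
```

Separating create from update makes accidental overwrites visible to the caller, which is one possible design choice among several.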

Management GUI

This section will be provided when the implementation is more advanced.

Latest Developments

The Cloud Storage component has been connected to the Identity and Security Services component (ISS), which is responsible for authentication and authorisation. Every call to the Cloud Storage is checked for authorisation based on the username and the targeted bucket.
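A per-call authorisation check based on username and bucket could look roughly like the sketch below. The interface of the ISS is not described here, so the AccessControl class is only an in-process stand-in, and all names are hypothetical.

```python
class AuthorisationError(Exception):
    """Raised when a user is not allowed to access a bucket."""


class AccessControl:
    """Hypothetical stand-in for the authorisation decision of the ISS."""

    def __init__(self, grants):
        # Maps bucket name to the set of usernames allowed to access it.
        self._grants = {bucket: set(users) for bucket, users in grants.items()}

    def check(self, username, bucket):
        if username not in self._grants.get(bucket, set()):
            raise AuthorisationError(f"{username!r} may not access {bucket!r}")


class GuardedStorage:
    """Checks authorisation on every call before touching the data."""

    def __init__(self, access_control):
        self._acl = access_control
        self._data = {}

    def put(self, username, bucket, key, value):
        self._acl.check(username, bucket)
        self._data[(bucket, key)] = value

    def get(self, username, bucket, key):
        self._acl.check(username, bucket)
        return self._data[(bucket, key)]


storage = GuardedStorage(AccessControl({"videos": {"alice"}}))
storage.put("alice", "videos", "trailer", b"data")

try:
    storage.get("bob", "videos", "trailer")
    bob_denied = False
except AuthorisationError:
    bob_denied = True

print(storage.get("alice", "videos", "trailer"), bob_denied)
```

Performing the check inside each storage method, rather than once per session, matches the statement that every call is checked.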

References

  1. Wikipedia - http://en.wikipedia.org/wiki/Relational_database
  2. Wikipedia - http://en.wikipedia.org/wiki/NoSQL
  3. Wikipedia - http://en.wikipedia.org/wiki/Relational_database
  4. Wikipedia - http://en.wikipedia.org/wiki/Triplestore
  5. Wikipedia - http://en.wikipedia.org/wiki/File_system
  6. Martin Fowler - http://martinfowler.com/bliki/PolyglotPersistence.html
  7. Stephan Schmidt - http://codemonkeyism.com/nosql-polyglott-persistence/
  8. ADVENTURE project homepage http://www.fp7-adventure.eu/
  9. Small or Medium-Scale Focused Research Project
  10. SIMPLI-CITY project homepage http://simpli-city.eu/
  11. Small or Medium-Scale Focused Research Project