Learning Objectives
By the end of this section, you will be able to:
- Discuss the various types of legacy database management systems
- Explain the functionality of non-first normal form (NFNF) database management systems
- Identify various characteristics of object databases and object persistence
- Differentiate the various relational database management system extensions
- Describe XML databases
- Summarize unstructured data and the advent of not only SQL (NoSQL) databases
- Differentiate cloud-related data management services and other types of databases
A nonrelational database is a database that does not use a traditional method for storing data such as rows and columns. Instead, nonrelational databases use a different storage model with specific requirements based on the type of data being stored. Relational databases typically store data in tables, with columns representing the attributes of the data and rows representing individual records. This structure is optimized for fast retrieval and manipulation of data using SQL queries. In contrast, nonrelational databases are optimized for storing and querying large amounts of data and are typically stored in a specialized format that allows for efficient indexing and querying. This format may include the use of columnar storage and compression techniques.
Legacy Databases
Flat file database and multifile relational database are the two main legacy DBMSs. A flat file database uses a simple structure to store data in a text file. Each line in the file is holding one record. A multifile relational database is more flexible than flat file structures as it provides more functionality for creating and updating data. It contains multiple tables of data with rows and columns that relate to each other.
Hierarchical Model
A hierarchical model is a model in which data are stored in the form of records and organized into a tree structure. The structure is parent-child and each parent may connect to one or more child nodes as in Figure 8.15. As an example, IBM IMS is a hierarchical database management software system for OLTP.
A hierarchical model has many limitations such as the complexity of navigating the system, its requirement that data be repetitively stored due to a treelike structure with a high level of redundancy, and its need for sequential searching.
Non-First Normal Form DBMSs
The database data model that does not meet any of the conditions of database normalization defined by the relational model is called a non-first normal form (NFNF). However, data in this form may require complex joins, which can make querying more complicated, although it does not lead to data duplication.
Object Databases and Object Persistence
An object-oriented database management system (OODBMS) offers unique object identifiers to access each object. The persistence is established with time and space, and the object exists even after its parent object has been deleted, which means it is persistent with time. An example of an OODBMS is db4o, which is an open-source, embeddable object database for Java and .NET environments. It allows developers to store and retrieve objects directly rather than having to map objects to a relational database schema.
Principles of Object Persistence
When an object is not deleted until a need emerges to remove it from memory, object persistence appears. In other words, the transient object is only needed during program execution and can be discarded when the program terminates. Persistence strategies include persistence by class, persistence by creation, persistence by marking, persistence by inheritance, and persistence by reachability.
In persistence orthogonality, the environment does not require any actions by a program to retrieve or save their state. In persistence independence, an object is independent of how a program manipulates it. There are many types of orthogonality. For example, all objects can be made persistent irrespective of their type or size, and they can achieve transitive persistence (refers to persistence by reachability). Persistent programming languages extend an OO language with a set of class libraries for object persistence. Serialization translates an object’s state into a format that can be stored (for example, in a file) and reconstructed later.
OODBMSs and ODBs
OODBMSs appeared around 1985 and originated as extensions of OO programming languages. Object-oriented programmers store the developed products as objects and modify existing objects to make new objects within the OODBMS. OODBMSs store persistent objects in a transparent way, support persistence orthogonality, and guarantee the ACID properties. OODBMS has many limitations such as lack of a universal data model, lack of experience compared to RDBMS, lack of support, and complexity. The object data management group (ODMG) was created in 1991 and is based on object management group (OMG) standard; its main idea is to create a set of specifications to develop applications for an object database. There have been five revisions of ODMG and the last version is ODMG 3.0. Most mainstream database applications are built using an OO programming language in combination with an RDBMS rather than using an object-relational mapping (ORM).
Industry Spotlight
MongoDB
MongoDB is the most popular NoSQL database; it has delivered substantial values for some businesses that have been struggling to handle their unstructured data with the traditional RDBMS approach. After MetLife spent years trying to build a centralized customer database on a RDBMS that could handle all its insurance products, someone at an internal hackathon built one with MongoDB within hours, which went to production in 90 days.
Extended Relational Databases
Extended relational databases (ERDBMSs) combine characteristics of RDBMS and OODBMS. The products of ERDBMSs provide a relational data model and query language that have been extended to include features of OODBMSs.
Limitations of the Relational Model
The relational model has a flat structure, and expensive joins are needed to defragment the data before it can be successfully used, which increases the complexity of the objects due to normalization. Specialization, categorization, and aggregation cannot be directly supported. A tuple constructor can only be used on atomic values while a set constructor can only be used on tuples; however, both constructors are not orthogonal, cannot model behavior or store functions, and both provide poor support for multimedia.
Active RDBMS Extensions
A trigger is a statement consisting of declarative and/or procedural instructions and is stored in the catalog of the RDBMS. Triggers can also reference attribute types in other tables. Triggers are automatically executed when a triggering event occurs such as any change in the database. Triggers are similar to procedures stored in the database but differ in that they need to be explicitly invoked. Triggers are easy to code, are useful in the validation process, allow calling other procedures from the trigger, and allow recursion.
Object-Relational RDBMS Extensions
Object-relational DBMSs (ORDBMSs) keep the relation as the fundamental building block and SQL as the core DDL/DML, but they use the following OO extensions: user-defined types (UDTs), user-defined functions (UDFs), inheritance, behavior, polymorphism, collection types, large objects (LOBs), and recursive SQL queries.
Technology in Everyday Life
Nonrelational Database Management System
You are told that a new type of data structure has been unveiled and that it characterizes new forms of data being collected from sensors located at the edge of the network.
Would you conclude that a new kind of nonrelational database management system needs to be developed? What would be your approach for selecting the best possible type of nonrelational database management system for this application?
XML Databases
Extensible Markup Language (XML) is a markup language similar to HTML, but users define their tags; predefined tags such as HTML are not used. The oldest schema language for XML is the document type definition (DTD). An XML Schema is the metadata that describes the structure of an XML document. Extensible Stylesheet Language (XSL) defines the features and syntax of XML and consists of a language for transforming XML documents (XSLT) and an XML vocabulary for specifying formatting (XSL-FO). An XML namespace is a collection of names that can be used as element names in an XML document. XML Path Language (XPath) uses path expressions to select nodes in an XML document. An XML database is a data persistence system whereby the data are specified and stored in XML format. It is easy to code but has no universally accepted rules for the XML database.
Differences between XML and Relational Data
While XML data are hierarchical, relational data are represented in a model of relationships. XML data are self-describing, but relational data are not. XML data have inherent ordering, but relational data do not. RDBMS only supports atomic data types such as integer, string, and date. XML DTDs do not support atomic data types. XML Schema supports both atomic and aggregated types (aggregated types modeled in object-relational databases using user-defined types). XML data are semistructured (can include certain anomalies, and a change to DTD or XSD necessitates regeneration of tables). For example, suppose a banking application needs to transfer money from one account to another. The transaction may involve multiple steps, such as deducting the amount from the sender's account and adding it to the recipient's account. If any of these steps fail, the entire transaction should be rolled back to ensure that the database remains consistent. Atomicity ensures that the transaction either completes successfully and all the changes are committed, or it fails and none of the changes are committed, leaving the database in a consistent state.
Mappings between XML and Object-Relational Data
An ontology is semantic data used to describe entities of the real world and the relationship between the entities using the Web Ontology Language (OWL) (Figure 8.16). To convert and map XML data into relational databases, we can use table-based mapping, which copies data from an external source such as XML into the selected table. A database schema is the structure of a database that defines how data are organized within a database. Schema-oblivious mapping stores both OWL class and instance data in a single table, which allows files to be easily parsed and loaded into the database. Schema-aware mapping maps the OWL schema into corresponding tables and then loads the instance data into the selected tables.
Global Issues in Technology
IT Solutions and Security Issues
IT solutions can help to address a range of security issues that organizations may face. Some common IT solutions for security issues include encryption and antivirus software.
Given the diversity of nonrelational database management systems and the fact that multiple systems may sometimes be used as part of the same solution, what are possible security issues that need to be considered?
Unstructured Data and NoSQL Databases
Unstructured data are data that are not arranged according to the data model and cannot be stored in a traditional relational database. NoSQL databases are schema-agnostic and provide the flexibility needed to store and manipulate large volumes of unstructured and semistructured data. Users do not need to know what types of data are stored during setup, and the system can accommodate changes in data types and schema. Designed to distribute data across different nodes, NoSQL databases are generally more horizontally scalable and fault-tolerant. MongoDB Atlas is an example of a NoSQL database system that is fully managed and operates on the cloud.
NoSQL DBMS is not ACID compliant and data consistency is not guaranteed. NoSQL DBMSs provide eventual consistency instead: when old data are getting overwritten, results that are a little wrong are temporarily returned. For example, Google’s search engine index cannot overwrite its data while people are simultaneously searching a given term so it does not give the most up-to-date results during a search, but it gives the best answer it can. While this setup does not work in situations where data consistency is necessary, such as financial transactions, it is just fine for tasks that require speed rather than pinpoint accuracy.
Key-Value Stores
A key-value store is a simple database that uses an associative array such as Redis, DynamoDB, and Cosmos DB. It stores only key-value pairs, provides basic functionality for retrieving the value associated with a known key, and works best with a simple database schema.
Tuple and Document Stores
A tuple and document store database stores data in XML or JSON format with the document name as key and the contents of the document as value.
Documents can contain many different value types and can be nested, making them particularly well-suited to manage semistructured data across distributed systems. For example, MongoDB and Couchbase are document-oriented databases that store data in flexible, schemaless documents, which allows for a high degree of flexibility in how data are stored and queried. However, MongoDB uses a flexible document model that allows for complex nested structures and hierarchical data, while Couchbase uses a key-value data model with JSON documents. This makes MongoDB more suited for complex data structures and Couchbase more suited for simpler data structures.
Column-Oriented Databases
A column-oriented database stores data in column families or tables and are built to manage petabytes of data across a massive distributed system (e.g., Cassandra and Hbase). Both Cassandra and HBase use a column-family data model to store and manage data and are both NoSQL databases that are designed to handle large volumes of data. However, Cassandra uses a partitioned row-store data model, while HBase uses a column-oriented data model. This means that Cassandra is optimized for high-speed reads and writes of individual rows, while HBase is optimized for scanning large tables of data.
Graph-Based Databases
A graph-based database represents data as a network of related nodes or objects to facilitate data visualizations and graph analytics. The graph may represent any relationship such as 1:1, 1:N, and N:M. It is useful for analyzing the relationships between heterogeneous data points, such as in fraud prevention or Facebook’s friend graph.
Other NoSQL Databases
There are many NoSQL databases such as XML databases, OO databases, database systems to deal with time series and streaming events, and database systems to store and query geospatial data (Spatial data refer to data that are associated with a specific location or geographic area. Spatial data can be managed and analyzed using a geographical information system.). There are also database systems such as BayesDB. BayesDB is a probabilistic programming platform that is based on a Bayesian approach to modeling data relationships. The structure of BayesDB can be broken down into several key components, including data tables and models that lets users query the probable implication of their data.
Transaction Management and Concurrency in NoSQL Databases
With the increasing amount of data, the number of parallel transactions has increased. In this case, the capacity can be increased by extending the system capabilities (vertical scaling) or arranging multiple servers in a cluster (horizontal scaling). NoSQL databases distribute data over a cluster of database nodes for the sake of performance and availability.
In many NoSQL implementations (e.g., Cassandra, Google’s BigTable, Amazon’s DynamoDB). all nodes implement the same functionality and are able to perform the role of request coordinator using a membership protocol. The membership protocol checks the availability of each node, which makes it easier to apply index between them but does not guarantee that every node is aware of every other node at all times. Hashing is the process of transforming a key to another value to implement a hash table. Consistent hashing avoids mapping each key to a new node in case nodes are added or deleted. Consistent hashing uses a modulo operator (%) to distribute keys over servers. The syntax is (H(key) = key % N) where N is the number of servers. Eventual consistency means the data and their replicas become consistent at some point in time after each transaction. NoSQL databases are categorized as eventual consistency because it guarantees up-to-date information; therefore, many NoSQL databases guarantee so-called eventual consistency based on the CAP theorem. The CAP theorem states that a distributed computer system cannot attain the following three properties concurrently: consistency (all nodes see the same data at the same time), availability (guarantees that every request receives a response, indicating a success or failure result), and partition tolerance (the system continues to work even if nodes go down or are added).
Think It Through
C++ and Databases
You are asked to develop a program in C++ that needs to store and retrieve data for later use.
Which type of nonrelational database management system would you recommend for this application?
Query Languages and APIs for NoSQL Databases
Processing an XML within a database application requires parsing the document and then processing the code using an API such as Document Object Model (DOM API). DOM API is a tree-based API (i.e., the memory representation is a tree). Figure 8.18 shows an example of XML code and the corresponding DOM tree.
NoSQL databases use filters to apply the simple query; that is, they use variables in the query that are populated on the server. Additionally, most NoSQL databases support complex queries using aggregation with MapReduce. MapReduce is an open-source software framework used to apply complex queries. It implements a map function, which is the conversion of a dataset into a tuple, and reduces function, which reduces the map into a smaller set.
Cloud and Other Data Management Services
Data as a Service (DaaS) is a data management technique that uses the cloud to store, process, and manage data. DaaS uses web services to handle data located on the Internet. Many vendors provide DaaS such as Azure, AWS, GCP, and IBM Cloud. Amazon Web Service (AWS) offers many solutions for DaaS, such as Aurora, RDS, and Redshift for relational database and DynamoDB for the key-value database.
Cloud Data as a Service (DaaS)
DaaS is a data management strategy that includes many technologies such as information life cycle solutions, data modeling, replication, and content management. DaaS is on-demand and subscription-based, which means the customer only pays for the services they need. The functionalities of DaaS are data quality, storing, managing, securing, and analyzing data. One of the key advantages of cloud-based database systems is their ability to be "elastic" and provide compute and storage resources on-demand. This means that as the size of a dataset grows or decreases as more or fewer users access the database, the system can quickly scale up or down its resources to handle the demand.
Blockchain DBMSs
The technology that records transitions securely is called blockchain. It has the world’s attention because of its use of cryptocurrencies such as Bitcoin. In health care, blockchain technology can be used to create a secure and transparent system for managing health-care data. This can help to improve patient privacy, reduce fraud, and improve the efficiency of health-care systems. A blockchain DBMS is a database that stores data as a block data structure, and each block is connected to other blocks by providing cryptographic security and immutability.
Link to Learning
One of the most popular NoSQL databases is the DB-Engines Ranking which ranks database management systems according to their popularity. The ranking is updated monthly. You can read about the method of calculating the scores.