Learning Objectives
By the end of this section, you will be able to:
- Define important terms and characteristics of data management systems
- Explain the various aspects that characterize database management systems
Remember that data are any raw facts you can collect such as the number 48502. When we process the data, we have information that has meaning. For example, the number 48502 is a zip code. When we have a collection of related data, we call it a database. To store, retrieve, edit, and maintain the related data in the database we need a system called a database management system (DBMS).
Definition and Characteristics
Data, data model, database, and DBMS concepts are connected to each other. Data relate to known facts that can be recorded and have an implicit meaning. A data model is an abstract model that contains a set of concepts to describe the structure of a database, the operations for manipulating these structures, and certain constraints that the database should obey. A database is a collection of related data items within a specific business process or problem setting; an example of a large commercial database is the one maintained by amazon.com. A DBMS is the software package used to define, create, use, and maintain a database while considering appropriate security measures. A miniworld, or universe of discourse (UoD), represents some aspect of the real-world data that are stored in a database (e.g., student grades at a university). There are many characteristics of the database approach such as being superior to the file approach in terms of efficiency, consistency, and maintenance; providing loose coupling between applications and data as well as facilities for data querying and retrieval; removing redundancy and cleansing the data; supporting multiuser transactions that allow multiple users to access the same data and multiple views of the data; and allowing sharing data between users and applications.
Think It Through
University Miniworld
Let us think about the university as a miniworld. In this miniworld, we need to define a data dictionary to describe the following: students, courses, sections, departments, and instructors.
Suggest a miniworld that would be part of a university environment. Define the main entries to be included such as STUDENTs = ID + FirstName + LastName.
Applications of Database Technology
A database application is a program or piece of software designed to collect, store, access, retrieve, and manage information efficiently and securely. The following are some examples of database applications: multimedia applications (e.g., YouTube, Spotify), biometric applications (e.g., fingerprints, retina scans), wearable applications (e.g., Fitbit, Apple Watch), Geographic Information Systems (GIS) applications (e.g., Google Maps), sensor applications (e.g., nuclear power reactor), big data applications (e.g., Walmart), and IoT applications (e.g., Telematics).
Concepts In Practice
Specialized Database Systems
Database systems are fascinating because they must be given the ability to model various types of data at various levels of abstractions and various levels of details; therefore, it is possible to create specialized systems that can help people in various research domains and industries represent and use their data in the best possible way. For example, SciDB is an open-source data management system intended primarily for use in application areas that involve very large-scale array data. SciDB can be used to support a large variety of scientific applications used in astronomy, remote sensing and climate modeling, bioscience information management, risk management systems for financial applications, and web log data analysis.
Elements of Database Systems
The major elements of a database management system include hardware, software, data, procedures, language, and users. Hardware includes the physical devices such as computers and hard disks. Software refers to the set of programs that are used to manage and control a database. Data includes raw data and information organized and processed within the database. Procedures are instructions used to manage the database. A database user is a person who has the privileges to access, analyze, update, and maintain the data (e.g., information architect, database designer, database administrator, database application developer, and business user). A database language is used to write instructions to access and update data in the database. Table 8.1 shows different examples of database languages along with definitions, and examples.
Language | Definition | Relational DBMS (SQL) |
---|---|---|
database description language (DDL) | Used to create, update, and delete storage structures in a database management system | CREATE TABLE
table_name (
column1 datatype,
column2 datatype,
column3 datatype,
…
); |
data manipulation language (DML) | Used to create, update, and delete data in a database management system | UPDATE table_name
SET column1 = value1, column2 = value2, column3 = value3, …
WHERE condition; |
data query language (DQL) | Used to query the database | SELECT expressions
FROM table_name
WHERE conditions; |
data control language (DCL) | Used to control the use of features that are available in the database management system. | GRANT [privilege]
ON [object]
TO [user] |
Database Systems and Database Management
There are many advantages to using a DBMS such as controlling redundancy in data storage. When the same piece of data is held in two separate places in the database, data redundancy can occur. DBMS improves development and maintenance efforts, sharing of data among multiple users, and restricting unauthorized access to data. For example, only the DBA staff uses privileged commands and facilities providing persistent storage for program objects (e.g., object-oriented DBMSs make program objects persistent), providing storage structures (e.g., indexes) for efficient query processing. Additionally, a DBMS provides data independence, which helps the user to easily make changes to the database.
There are two types of data independence. The first, physical data independence, separates the conceptual level from the physical level (e.g., using a new storage device such as a hard drive); the second, logical data independence, separates any changes in the data from the data format. DBMS provides integrity rules by adding a primary key to guarantee that every record is unique. DBMS allows you to manage structured, semistructured, and unstructured data.
Data that have been organized into a formatted database and have relational keys (e.g., relational data) are called structured data. Data that are not organized in a formatted database but have some organized properties (e.g., XML data) are called semistructured data. Data that are not organized in a formatted database and do not have organized properties (e.g., PDF) are called unstructured data. The DBMS provides a backup copy of the entire database, which is backed up as decided, perhaps once a day. The backup copy should be stored in a secured location and is used to restore the database in the event of failure, loss, or damage to the original data.
Technology in Everyday Life
Databases Impact All Industries
Databases, database technology, and “big data” (complex datasets) have a tremendous impact in industries such as banking, insurance, retail, health care, real estate, e-commerce, law, education, and more recently, social networks. Data collection is also impacting advancements in the fields of medicine, environmental studies, science, and mobile technology.
Can you think of any other industries in which the use of big data is surprising? Does big data benefit the users or the collectors? Will big data enable more scientific research and/or discoveries? Why or why not?
DBMS Facets
DBMS includes various components as in Figure 8.6. DBMS interface is the main line of communication between the database and the user. There are many types of interfaces such as web-based, stand-alone query language, command line, forms-based, graphical user interface (GUI), natural language, admin, and network.
A connection manager manages reports, books, objects, and batches. For each database, the connection manager provides the connection name, data source type, and value. A security manager is a collection of processes used to secure the database from threats. The data description language (DDL) compiler translates statements in a high-level language into low-level instructions that the query evaluation engine understands. A query processor acts as an intermediary between users and the DBMS data engine to communicate query requests including DML compiler, query parser, query rewriter, query optimizer, and query executor. A storage manager is a program that is responsible for editing, storing, updating, deleting, and retrieving data in the database such as transaction manager, buffer manager, lock manager, and recovery manager. DBMS utilities are a set of utilities for managing and controlling database activities such as loading utility, reorganization utility, performance-monitoring utilities, user management utilities, and backup and recovery utility. Figure 8.7 illustrates the navigation window, results window, log window, and query window.
Data Model Categories
Data models can be conceptual (represented using EER or UML notations), logical, physical, or self-describing. The various logical data models can be categorized as follows:
- A hierarchical DBMS is a data model in which the data are organized into a treelike model, DML is procedural and record-oriented, the query processor is logical, and internal data models are intertwined (e.g., IMS from IBM).
- A network DBMS is a data model in which the data are organized into a network model, DML is procedural and record-oriented, the query processor is logical, and internal data models are intertwined (e.g., CA-IDMS from Computer Associates).
- A relational DBMS is a data model in which the data are organized into a relational data model, use SQL as a declarative and set-oriented database, the query processor has a strict separation between the logical and internal data model. Relational DBMSs are the most popular in the industry (e.g., MySQL open-source database from Oracle, Oracle DBMS, DB2 from IBM, and Microsoft SQL).
- An object-oriented DBMS is a data model in which the data are organized into an OO data model, avoiding impedance mismatch when used with an OO host language. It is also called OODBMS or ODB (e.g., db4o open-source database from Versant, Caché from Intersystems, GemStone/S from GemTalk Systems, which are only successful in niche markets, due to their complexity).
- An XML DBMS is a data model in which the data are using the XML data model to store data. XML could be native XML DBMS (e.g., BaseX and eXist), which map the tree structure of an XML document to a physical storage structure, or XML-enabled DBMS (e.g., Oracle and IBM Db2) are existing DBMS that are extended with facilities to store XML data.
- A not-only SQL DBMS, or NoSQL DBMS, comes in a variety of big unstructured data classified as a document, a graph, key-value stores, and column-oriented databases. NoSQL DBMS focuses on scalability and the ability to cope with irregular or highly volatile data structures (e.g., Apache Hadoop, MongoDB, Neo4j). The irregular data structures appear when the data don’t follow a specific order or nature. The volatile data are usualy stored in cache memory and is easy to lose.
Single vs. Multiuser DBMSs
With a single-user DBMS, only one user at a time can use the database. A multiuser DBMS allows many users to use the database concurrently. A multiuser DBMS is illustrated in Figure 8.8.
DBMS Users
DBMS users may be divided into actors on the scene who use and control the database content; design, develop, and maintain database applications such as database administrators and database designers; and workers behind the scenes who design and develop the DBMS software and related tools as well as the computer systems operators, including system designers and developers.
DBMS Architectures
There are various types of database architectures as follows: centralized DBMS architecture, n-tier DBMS architecture, cloud DBMS architecture, federated DBMS, and in-memory DBMS. In centralized DBMS architecture, the data are maintained on a centralized server at a single location while client-server DBMS architecture has one or more client computers connected to a central server over a network. Active clients request services from passive servers. An n-tier DBMS architecture usually divides an application into three tiers/layers: the presentation tier, the logic tier, and the data tier (e.g., a client with GUI functionality, an application server with applications, a database server with a DBMS and a database, and a web server for web-based access). In cloud DBMS architecture, the DBMS and database are hosted by a third-party cloud provider (e.g., Apache Cassandra project and Google’s BigTable). A federated DBMS provides a uniform interface to multiple underlying data sources which hide the underlying storage details to facilitate data access. An in-memory DBMS stores all data in internal memory instead of slower external storage (e.g., disk; often used for real-time purposes such as HANA from SAP).
DBMS Usage Areas
DBMS applications include the following: online transaction processing, online analytical processing, big data and analytics, multimedia DBMS, and open-source DBMS. In online transaction processing (OLTP), the focus is on managing operational or transactional data. The database server must be able to process lots of simple transactions per unit of time. A DBMS must have good support for processing a high volume of short and simple queries (e.g., RDBMS). In online analytical processing (OLAP), the focus is on using operational data for tactical or strategical decision-making. A limited number of users formulate complex queries to analyze the data. Therefore, the DBMS should support efficient processing of complex queries, which often come in smaller volumes (e.g., data warehouses). For example, complex SQL queries involve using queries beyond just the SELECT and WHERE commands. OLAP uses cube as a data structure that represents multiple dimensions to help in data analysis. In big data and analytics, the focus is on more flexible or even schemaless database structure, storing unstructured information such as emails, text documents, X (formerly Twitter) posts, Facebook posts (e.g., NoSQL databases). A multimedia DBMS provides storage of multimedia data such as text, images, audio, video, and 3-D games; it should also provide content-based query facilities. An open-source DBMS is publicly available and can be extended by anyone (e.g., MySQL from Oracle).