Key Terms
- analytics process model
- set of processes that applies statistical analysis to solve system problems and identify new market opportunities
- archival backup
- storing backup copies of the data on different servers or at different sites
- asynchronous call
- client sends a request without waiting for the response
- atomicity, consistency, isolation, durability (ACID)
- properties that impose a number of constraints to ensure that stored data are reliable and accurate
- attribute
- column header
- blockchain DBMS
- database that stores data as a block data structure, with each block cryptographically linked to other blocks to provide security and immutability
- business intelligence (BI)
- set of activities, techniques, and tools aimed at understanding patterns in past data to predict the future
- CAP theorem
- states that a distributed computer system cannot guarantee consistency, availability, and partition tolerance at the same time
- centralized DBMS architecture
- data are maintained on a centralized server at a single location
- change data capture (CDC)
- technology that detects any data update event and keeps track of versions
- cloud DBMS architecture
- DBMS and database are hosted by a third-party cloud provider
- cluster
- group of interconnected computers that acts as a single system and can dramatically improve performance, availability, and scalability over a single, more powerful machine, at lower cost, by using commodity hardware
- cognitive computing
- technology that attempts to simulate the human approach to solving problems
- column-oriented database
- database that stores data in column families or tables and is built to manage petabytes of data across a massive, distributed system
- computer scientist
- person who has theoretical and practical knowledge of computer science
- concurrency control
- coordination of transactions that execute simultaneously on the same data so that they do not cause inconsistencies because of mutual interference
- connection manager
- manages the connections between applications and the database
- data
- information and facts that are stored digitally by a computer
- data accuracy
- degree to which the data values stored for an object are the correct values; often correlated with other DQ dimensions
- Data as a Service (DaaS)
- data management technique that uses the cloud to store, process, and manage data
- data completeness
- degree to which all data in a specific dataset are available with a minimum percentage of missing data
- data compliance
- process that ensures that data practices align with external legal requirements and industry standards
- data consistency
- keeping data consistent as it moves between various parts of the system
- data consolidation
- use of ETL to capture data from multiple sources and integrate it into a single store such as a data warehouse
- data control language (DCL)
- language used to control access to data stored in a database
- data description language (DDL)
- language used to create and modify the object structure in a database
- data description language (DDL) compiler
- compiles DDL statements and stores the resulting schema definitions (metadata) in the catalog
- data dictionary
- set of information describing the contents, format, and structure of a database
- data federation
- use of enterprise information integration (EII) to provide a unified view over data sources
- data governance
- set of clear roles, policies, and responsibilities that enables the enterprise to manage and safeguard data quality
- data integration
- providing a consistent view of all of an organization's data
- data lake
- large data repository that stores raw data and can be set up without having to first define the data structure and schema
- data management
- study of managing data effectively
- data manipulation language (DML)
- language used to manipulate and edit data in a database
- data mart
- scaled-down version of a data warehouse aimed at meeting the information needs of a homogeneous small group of end users
- data model
- abstract model that contains a set of concepts to describe the structure of a database, the operations for manipulating these structures, and certain constraints that the database should obey
- data owner
- person with the authority to ultimately decide on the access to, and usage of, the data
- data propagation
- use of enterprise application integration (EAI) corresponding to the synchronous or asynchronous propagation of updates in a source system to a target system
- data quality (DQ)
- measure of how well data fits its intended purpose; its fitness for use
- data quality dimension
- aspect along which data quality is measured, including accuracy, completeness, consistency, and accessibility
- data query language (DQL)
- language used to make various queries in a database
- data redundancy
- happens when the same piece of data is held in two separate places in the database
- data replication
- storing data in more than one site to improve the data availability and retrieval performance
- data scientist
- person who has theoretical and practical knowledge of managing data
- data security
- pertains to guaranteeing data integrity, availability, and confidentiality, as well as authentication, access control, auditing, and mitigating vulnerabilities
- data steward
- person who ensures that the enterprise's actual business data and the metadata are accurate, accessible, secure, and safe
- data swamp
- data stored without the organization needed to make retrieval easy
- data virtualization
- technique that hides the physical location of data and uses data integration patterns to produce a unified data view
- data warehouse
- centralizes an enterprise’s data from its databases; it supports the flow of data from operational systems to analytics/decision systems by creating a single repository of data from various sources both internal and external
- database administrator (DBA)
- person responsible for the implementation and monitoring of a database and ensuring databases run efficiently
- database application
- program or piece of software designed to collect, store, access, retrieve, and manage information efficiently and securely
- database architecture
- representation of the database design that helps in designing, developing, implementing, and maintaining the DBMS
- database designer
- person responsible for creating, implementing, and maintaining the database management system
- database language
- used to write instructions to access and update data in the database
- database management system (DBMS)
- software used to define, create, use, and maintain a database; its metadata are stored in a catalog
- database normalization
- process of structuring a relational database to reduce data redundancy and improve data integrity
- database recovery
- activity of setting the database in a consistent state without any data loss in the event of a failure or when any problem occurs
- database security
- using a set of controls to secure data, guaranteeing a high level of confidentiality
- database transaction
- sequence of read/write operations considered to be one atomic unit (see the transaction sketch following this list)
- database user
- person with the privileges to access, analyze, update, and maintain the data
- DBMS interface
- main line of communication between the database and the user
- DBMS utility
- utility for managing and controlling database activities, such as loading, reorganization, performance-monitoring, user management, and backup and recovery utilities
- deep learning network
- machine learning method based on artificial neural networks
- denormalizing
- process of merging several normalized data tables into an aggregated, denormalized data table
- descriptive analytics
- analytics that describes what has happened in past data, such as patterns of customer behavior
- disk storage
- memory device that stores the data such as hard disks, flash memory, magnetic disks, optical disks, and tapes
- distributed transaction
- set of operations that are performed across multiple database systems
- domain constraint
- defines the domain of values for an attribute
- enterprise search
- process of making content stemming from databases searchable by offering tools that can be used within the enterprise
- entity integrity constraint
- specifies that no primary key contains a null
- equi-join
- join that combines tables based on matching values in specified columns
- exploratory analysis
- process of summarizing and visualizing data for initial insight
- extraction, transformation, and loading (ETL)
- data integration that combines data from multiple sources, fixes the data format, and loads the data into a data warehouse
- fact constellation
- schema in which more than one fact table is connected to shared, smaller dimension tables
- fat client variant
- where presentation logic and application logic are handled by the client; common in cases where it makes sense to couple an application's workflow with its presentation logic
- federated DBMS
- provides a uniform interface to multiple underlying data sources
- flat file database
- database that uses a simple structure to store data in a text file; each line in the file holds one record
- full-text search
- selection of individual text documents from a collection of documents according to the presence of a single or a combination of search terms in the document
- functional dependency (FD)
- constraint that specifies the relationship between two sets of attributes and provides a formal tool for the analysis of relational schemas
- garbage in, garbage out (GIGO)
- quality of output is determined by the quality of the input
- graph-based database
- database that represents data as a network of related nodes or objects to facilitate data visualizations and graph analytics
- Hadoop
- open-source, distributed data infrastructure that leverages clusters to store and process massive amounts of data
- heuristics optimization
- optimization technique that uses rules of thumb to process a query quickly
- hierarchical DBMS
- DBMS in which the data are organized into a treelike model, the DML is procedural and record-oriented, no query processor is included, and the logical and internal data models are intertwined
- hierarchical model
- model in which data are stored in the form of records and organized into a tree structure
- horizontal fragmentation (sharding)
- fragmentation in which each fragment contains the rows that satisfy a query predicate; the global view is reconstructed with a UNION query; common in NoSQL databases
- immediate backup
- storing the backup copies on disks
- in-memory DBMS
- stores all data in internal memory instead of slower external storage
- indexed organization
- uses a key, similar to relative organization, but the key is unique and fixed
- informatics
- study, design, and development of information technology for the good of people, organizations, and society
- information architect
- (also, data architect or information analyst) a person responsible for designing the conceptual data model (blueprints) to bridge the gap between the business processes and the IT environment
- information retrieval
- searching for information in documents using retrieval models that specify matching functions and query representation
- inner join
- join that returns only the rows with matching values in both tables (the intersection of the two tables)
- key constraint
- specifies that all the values of the primary key must be unique
- key-value store
- simple database that stores data as an associative array of key-value pairs; examples include Redis, DynamoDB, and Cosmos DB
- logical data independence
- separates applications from changes in the logical structure of the data
- logical design
- designing a database based on a specific data model but independent of physical details
- macro life cycle
- includes feasibility analysis, requirements collection and analysis, design, implementation, and validation and acceptance testing
- MapReduce
- programming model and open-source software framework used to process massive amounts of data in parallel across a cluster
- master data management (MDM)
- series of processes, policies, standards, and tools to help organizations define and provide a single point of reference for all data that are mastered
- merging process
- selection of information from different tables about a specific entity and copying it to an aggregated table
- metadata modeling
- business presentation of metadata
- micro life cycle
- focuses on system definition, database design, database implementation, loading or data conversion, application conversion, testing and validation, operation, monitoring, and maintenance
- miniworld
- (also, universe of discourse [UoD]) represents some aspect of the real-world data that is stored in a database
- missing value
- empty field in a dataset, typically handled by filling in the field or deleting it
- mixed fragmentation
- combines horizontal and vertical fragmentation
- multifile relational database
- database that is more flexible than flat file structures by providing more functionality for creating and updating data
- multimedia DBMS
- provides storage of multimedia data such as text, images, audio, and video
- multiuser DBMS
- allows many users to use the database concurrently
- multivalued dependency (MVD)
- occurs when two attributes in a table are independent of each other but both depend on a third attribute
- n-tier DBMS
- multitier architecture that usually divides an application into three tiers
- natural join
- creates an implicit join based on the common columns in two tables
- network DBMS
- DBMS in which the data are organized into a network model, the DML is procedural and record-oriented, no query processor is included, and the logical and internal data models are intertwined
- non-first normal form (NFNF)
- database data model that does not meet any of the conditions of database normalization defined by the relational model
- nonrelational database
- database that does not use a traditional method for storing data such as rows and columns
- NoSQL DBMS
- DBMS designed for big, unstructured data; classified into document, graph, key-value, and column-oriented stores
- object persistence
- refers to when an object is not deleted until a need emerges to remove it from memory
- object-oriented DBMS
- DBMS in which the data are organized into an OO data model, so there is no impedance mismatch in combination with an OO host language
- online analytical processing (OLAP)
- focuses on using operational data for tactical or strategical decision-making
- online transaction processing (OLTP)
- focuses on managing operational or transactional data; the database server must be able to process lots of simple transactions per unit of time
- ontology
- semantic data used to describe entities of the real world and the relationships between them using the Web Ontology Language (OWL)
- open-source DBMS
- publicly available DBMS that can be extended by anyone
- operational data store (ODS)
- staging area that provides query facilities
- optimizer
- component of the query processor that selects the best plan to execute a query
- outer join
- join that returns the matching rows along with the non-matching rows from one or both tables (the union of the two tables)
- outlier
- value that lies outside the rest of the population and should be detected so that it can be handled appropriately
- parallel processing
- technique in which multiple processors work simultaneously on different tasks or different parts of a task to enable concurrent processing of large amounts of data
- persistence independence
- when an object is independent from how a program manipulates it
- persistence orthogonality
- concept that the environment does not require any actions by a program to retrieve or save an object's state
- physical data independence
- separates the conceptual level from the physical level
- physical database design
- maps logical concepts to physical constructs
- predictive analytics
- predicts the target measure of interest using regression and classification
- primary key
- special unique identifier for each table record
- query processor
- acts as an intermediary between users and the DBMS data engine to communicate query requests including DML compiler, query parser, query rewriter, query optimizer, and query executor
- query tree
- data structure that represents a relational algebra expression as a tree
- query-by-example (QBE)
- database query language for relational databases based on domain relational calculus
- redundant array of inexpensive disks (RAID)
- stores information across an array of low-cost hard disks
- reinforcement
- machine learning method based on rewarding desired behaviors and discouraging undesired behaviors
- relation
- mathematical concept based on the ideas of sets
- relational algebra
- query language that uses unary or binary operators to perform queries
- relational database design (RDD)
- models data into a set of tables with rows and columns
- relational DBMS
- DBMS in which the data are organized into a relational data model, SQL is used as a declarative and set-oriented database language, and the query processor maintains a strict separation between the logical and internal data models
- relative organization
- when each record is assigned a numeric key to rearrange the order of the records at any time
- return on investment (ROI)
- ratio of net profits divided by the investment of resources
- sampling
- selecting a subset of historical data to build an analytical model
- security manager
- collection of processes used to secure the database from threats
- semistructured data
- data that are not organized in a formatted database but have some organized properties
- sequential file organization
- records are organized in the order stored and any new record is added at the end
- single-user DBMS
- only one user at a time can use the database
- snowflake schema
- data model that normalizes the dimension tables
- spanned record
- when all records are classified into blocks and the length of the record can exceed the size of a block
- star schema
- data model with one large fact table connected to smaller dimension tables
- storage manager
- program that is responsible for editing, storing, updating, deleting, and retrieving data in the database such as transaction manager, buffer manager, lock manager, and recovery manager
- structured data
- data that have been organized into a formatted database and have relational keys
- Structured Query Language (SQL)
- programming language used to query and manage structured data located in an RDBMS (see the SQL sketch following this list)
- synchronous call
- when the client sends a request and waits for a response from the service
- tablespace
- physical location where the data in tables is actually stored
- theta join
- allows merging two tables based on a theta condition
- thin client variant
- where only the presentation logic is handled by the client and applications and database commands are executed on the server; it is common when application logic and database logic are tightly coupled or similar
- total cost of ownership (TCO)
- cost of owning and operating the analytical model over time
- transaction
- set of database operations induced by a single user or application that should be considered as one undividable unit of work
- transaction management
- delineating transactions within the transaction life cycle
- transfer learning
- machine learning method based on reusing the result of a specific task to start a new task
- translation
- process of translating from high-level language to machine language
- trigger
- statement consisting of declarative and/or procedural instructions and stored in the catalog of the RDBMS
- tuple
- one row with a collection of values separated by commas and enclosed in parentheses
- tuple and document store
- database that stores data in XML or JSON format with the document name as key and the contents of the document as value
- uniqueness constraint
- specifies that all the tuples must be unique
- unstructured data
- data that are not organized in a formatted database and do not have organized properties
- value
- actual value derived using the total cost of ownership (TCO) and return on investment (ROI) of the data
- variety
- range of data types and sources that are used; data in its many forms
- velocity
- speed at which data comes in and goes out; data in motion
- veracity
- uncertainty of the data; data in doubt
- vertical fragmentation
- fragmentation in which each fragment contains a subset of the columns; the global view is reconstructed with a JOIN query; useful if only some of a tuple's attributes are relevant to a node
- virtual data mart
- usually defined as a single SQL view
- virtual data warehouse
- can be built as a set of SQL views directly on the underlying operational data sources as an extra layer on top of a collection of physical independent data marts
- volume
- amount of data; data at rest
- weak entity
- type of entity that cannot be uniquely identified based on its attributes alone and must rely on a strong entity to provide the context necessary for identification
- XML DBMS
- DBMS that uses the XML data model to store data
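
The SQL-related terms above (DDL, DML, DQL, primary key, equi-join, inner join) can be illustrated with a short, runnable sketch. It is not from the text; it uses Python's built-in sqlite3 module with hypothetical customer and purchase tables chosen purely for illustration. SQLite provides no DCL statements such as GRANT or REVOKE, so DCL is omitted.

```python
# A minimal sketch (not from the text) of the SQL sublanguages, using Python's
# built-in sqlite3 module and hypothetical customer/purchase tables.
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

# DDL: create and modify the object structure (tables, keys).
cur.execute("""CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,  -- primary key: unique identifier per record
    name        TEXT NOT NULL)""")
cur.execute("""CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    amount      REAL)""")

# DML: manipulate and edit the data.
cur.execute("INSERT INTO customer VALUES (1, 'Ada')")
cur.execute("INSERT INTO purchase VALUES (10, 1, 25.0)")

# DQL: query the data. The equality condition makes this an equi-join, and
# only rows with matching values in both tables are returned (an inner join).
rows = cur.execute("""SELECT c.name, p.amount
                      FROM customer c
                      JOIN purchase p ON c.customer_id = p.customer_id""").fetchall()
print(rows)   # [('Ada', 25.0)]
conn.close()
```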
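A minimal sketch of the transaction-related terms (database transaction, ACID) in the same setting, again assuming SQLite and a hypothetical account table: the two UPDATE statements form one undividable unit of work that is either committed in full or rolled back.

```python
# A minimal sketch (not from the text) of transaction atomicity and rollback,
# assuming SQLite and a hypothetical account table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 100.0), (2, 0.0)")
conn.commit()

try:
    # One undividable unit of work: transfer 40 from account 1 to account 2.
    conn.execute("UPDATE account SET balance = balance - 40 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance + 40 WHERE id = 2")
    conn.commit()      # durability: once committed, the changes persist
except sqlite3.Error:
    conn.rollback()    # atomicity: on failure, undo both updates, not just one

print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# -> [(1, 60.0), (2, 40.0)]
conn.close()
```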