Key Terms

analytics process model
provides a statistical analysis using a set of processes to solve system problems and find a new market opportunity
archival backup
storing the data in different servers/sites
asynchronous call
client sends a request without waiting for the response
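A minimal sketch of an asynchronous call using Python's `asyncio`. The `service` coroutine and the request name `req-1` are hypothetical stand-ins, not part of any real API; the point is only that the client keeps working before the response arrives.

```python
import asyncio

events = []  # records the order in which things happen


async def service(request: str) -> str:
    """Hypothetical service that takes a moment to answer."""
    await asyncio.sleep(0.01)
    events.append(f"service answered {request}")
    return f"response to {request}"


async def client() -> str:
    # Send the request but do not wait for the response yet.
    pending = asyncio.create_task(service("req-1"))
    events.append("client continues with other work")
    # Collect the response later, when it is actually needed.
    return await pending


result = asyncio.run(client())
print(events)
```

The client's own work is logged before the service's answer, which is the opposite of the ordering a synchronous call would produce.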
atomicity, consistency, isolation, durability (ACID)
properties that impose a number of constraints to ensure that stored data are reliable and accurate
attribute
column header
blockchain DBMS
database that stores data in blocks, with each block cryptographically linked to other blocks to provide security and immutability
business intelligence (BI)
set of activities, techniques, and tools aimed at understanding patterns in past data to predict the future
CAP theorem
states that a distributed computer system cannot guarantee consistency, availability, and partition tolerance at the same time
centralized DBMS architecture
data are maintained on a centralized server at a single location
change data capture (CDC)
technology that detects any data update event and keeps track of versions
cloud DBMS architecture
DBMS and database are hosted by a third-party cloud provider
cluster
group of interconnected computers that work together as a single system and can dramatically improve performance, availability, and scalability over a single, more powerful machine, at a lower cost, by using commodity hardware
cognitive computing
technology that tries to simulate the way humans solve problems
column-oriented database
database that stores data in column families or tables and is built to manage petabytes of data across a massive, distributed system
computer scientist
person who has theoretical and practical knowledge of computer science
concurrency control
coordination of transactions that execute simultaneously on the same data so that they do not cause inconsistencies because of mutual interference
connection manager
manages reports, books, objects, and batches
data
information and facts that are stored digitally by a computer
data accuracy
whether the data values stored for an object are the correct values; often correlated with other data quality (DQ) dimensions
Data as a Service (DaaS)
data management technique that uses the cloud to store, process, and manage data
data completeness
degree to which all data in a specific dataset are available with a minimum percentage of missing data
data compliance
process that ensures that data practices align with external legal requirements and industry standards
data consistency
keeping data consistent as it moves between various parts of the system
data consolidation
use of ETL to capture data from multiple sources and integrate it into a single store such as a data warehouse
data control language (DCL)
language used to control access to data stored in a database
data description language (DDL)
language used to create and modify the object structure in a database
data description language (DDL) compiler
translates statements in a high-level language into low-level instructions that the query evaluation engine understands
data dictionary
set of information describing the contents, format, and structure of a database
data federation
use of enterprise information integration (EII) to provide a unified view over data sources
data governance
set of clear roles, policies, and responsibilities that enables the enterprise to manage and safeguard data quality
data integration
providing a consistent view of all organization data
data lake
large data repository that stores raw data and can be set up without having to first define the data structure and schema
data management
study of managing data effectively
data manipulation language (DML)
language used to manipulate and edit data in a database
data mart
scaled-down version of a data warehouse aimed at meeting the information needs of a homogeneous small group of end users
data model
abstract model that contains a set of concepts to describe the structure of a database, the operations for manipulating these structures, and certain constraints that the database should obey
data owner
person with the authority to ultimately decide on the access to, and usage of, the data
data propagation
use of enterprise application integration (EAI) corresponding to the synchronous or asynchronous propagation of updates in a source system to a target system
data quality (DQ)
measure of how well the data represents its purpose or fitness for use
data quality dimension
includes accuracy, completeness, consistency, and accessibility
data query language (DQL)
language used to make various queries in a database
data redundancy
happens when the same piece of data is held in two separate places in the database
data replication
storing data in more than one site to improve the data availability and retrieval performance
data scientist
person who has theoretical and practical knowledge of managing data
data security
pertains to guaranteeing data integrity, availability, and confidentiality, along with authentication, access control, auditing, and mitigating vulnerabilities
data steward
person who ensures that the enterprise's actual business data and the metadata are accurate, accessible, secure, and safe
data swamp
data stored without the organization needed to make retrieval easy
data virtualization
technique that hides the physical location of data and uses data integration patterns to produce a unified data view
data warehouse
centralizes an enterprise’s data from its databases; it supports the flow of data from operational systems to analytics/decision systems by creating a single repository of data from various sources both internal and external
database administrator (DBA)
person responsible for the implementation and monitoring of a database and ensuring databases run efficiently
database application
program or piece of software designed to collect, store, access, retrieve, and manage information efficiently and securely
database architecture
representation of the design that helps design, develop, implement, and maintain the DBMS
database designer
person responsible for creating, implementing, and maintaining the database management system
database language
used to write instructions to access and update data in the database
database management system (DBMS)
software system used to define, create, use, and maintain a database; its metadata are stored in a catalog
database normalization
process of structuring a relational database to reduce data redundancy and improve data integrity
database recovery
activity of setting the database in a consistent state without any data loss in the event of a failure or when any problem occurs
database security
using a set of controls to secure data, guaranteeing a high level of confidentiality
database transaction
sequence of read/write operations considered to be an atomic unit
database user
person with the privileges to access, analyze, update, and maintain the data
DBMS interface
main line of communication between the database and the user
DBMS utility
utility for managing and controlling database activities, such as loading, reorganization, performance-monitoring, user management, and backup and recovery utilities
deep learning network
machine learning method based on artificial neural networks
denormalizing
process of merging several normalized data tables into an aggregated, denormalized data table
descriptive analytics
analytics that aims to describe patterns of customer behavior
disk storage
memory device that stores the data such as hard disks, flash memory, magnetic disks, optical disks, and tapes
distributed transaction
set of operations that are performed across multiple database systems
domain constraint
defines the domain of values for an attribute
enterprise search
process of making content stemming from databases searchable by offering tools that can be used within the enterprise
entity integrity constraint
specifies that no primary key contains a null
equi-join
join that combines tables based on matching values in specified columns
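As an illustration, an equi-join can be sketched with Python's built-in `sqlite3` module. The `customer` and `orders` tables and their contents are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'pen'), (12, 2, 'lamp');
""")

# Equi-join: rows are paired only where customer.id equals orders.customer_id.
equi_rows = conn.execute("""
    SELECT customer.name, orders.item
    FROM customer
    JOIN orders ON customer.id = orders.customer_id
    ORDER BY orders.order_id
""").fetchall()

print(equi_rows)
```

The customer with no matching order (`Edsger`) does not appear, because the equality condition never holds for that row.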
exploratory analysis
process of summarizing and visualizing data for initial insight
extraction, transformation, and loading (ETL)
data integration that combines data from multiple sources, fixes the data format, and loads the data into a data warehouse
fact constellation
more than one fact table connected to other smaller dimension tables
fat client variant
where presentation logic and application logic are handled by the client; common in cases where it makes sense to couple an application’s workflow
federated DBMS
provides a uniform interface to multiple underlying data sources
flat file database
database that uses a simple structure to store data in a text file; each line in the file holds one record
full-text search
selection of individual text documents from a collection of documents according to the presence of a single or a combination of search terms in the document
functional dependency (FD)
constraint that specifies the relationship between two sets of attributes and provides a formal tool for the analysis of relational schemas
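A small sketch of how a functional dependency can be checked against sample rows. The helper `fd_holds` and the `employees` data are hypothetical. A check like this can only show that a dependency is violated or not violated in the given rows; it cannot prove that the dependency holds for the schema in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, value) != value:
            return False
    return True


employees = [
    {"emp_id": 1, "dept": "Sales", "dept_city": "Austin"},
    {"emp_id": 2, "dept": "Sales", "dept_city": "Austin"},
    {"emp_id": 3, "dept": "IT", "dept_city": "Boston"},
]

dept_determines_city = fd_holds(employees, ["dept"], ["dept_city"])
city_determines_emp = fd_holds(employees, ["dept_city"], ["emp_id"])
print(dept_determines_city, city_determines_emp)
```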
garbage in, garbage out (GIGO)
quality of output is determined by the quality of the input
graph-based database
database that represents data as a network of related nodes or objects to facilitate data visualizations and graph analytics
Hadoop
distributed data infrastructures that leverage clusters to store and process massive amounts of data
heuristics optimization
mathematical technique for processing a query quickly
hierarchical DBMS
DBMS in which the data are organized into a treelike model, the DML is procedural and record-oriented, and the logical and internal data models are intertwined
hierarchical model
model in which data are stored in the form of records and organized into a tree structure
horizontal fragmentation (sharding)
fragmentation in which each fragment holds the rows that satisfy a query predicate; the global view is reconstructed with a UNION query; common in NoSQL databases
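A rough sketch of horizontal fragmentation in plain Python, with lists of dictionaries standing in for tables. The `fragment` helper, the `region` predicate, and the data are assumptions made for illustration.

```python
rows = [
    {"id": 1, "region": "EU", "total": 40},
    {"id": 2, "region": "US", "total": 15},
    {"id": 3, "region": "EU", "total": 22},
]


def fragment(table, predicate):
    """Keep only the rows that satisfy the predicate."""
    return [row for row in table if predicate(row)]


# Each shard holds complete rows, selected by a predicate on the region.
eu_shard = fragment(rows, lambda r: r["region"] == "EU")
other_shard = fragment(rows, lambda r: r["region"] != "EU")

# The global view is the union of the shards.
global_view = sorted(eu_shard + other_shard, key=lambda r: r["id"])
print(global_view == rows)
```

Because the two predicates are complementary, every row lands in exactly one shard and the union restores the original table.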
immediate backup
storing backup copies on disks
in-memory DBMS
stores all data in internal memory instead of slower external storage
indexed organization
uses a key, similar to relative organization, but the key is unique and fixed
informatics
study, design, and development of information technology for the good of people, organizations, and society
information architect
(also, data architect or information analyst) a person responsible for designing the conceptual data model (blueprints) to bridge the gap between the business processes and the IT environment
information retrieval
searching for information in documents using retrieval models that specify matching functions and query representation
inner join
join that returns only the rows that have matching values in both tables, representing the intersection of the two tables
key constraint
specifies that all the values of the primary key must be unique
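A minimal sketch of a key constraint being enforced, using Python's built-in `sqlite3` module. The `student` table is invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Ada')")

try:
    # A second row with the same primary key value violates the key constraint.
    conn.execute("INSERT INTO student VALUES (1, 'Grace')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(duplicate_rejected, row_count)
```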
key-value store
simple database that stores data as an associative array of keys and values; examples include Redis, DynamoDB, and Cosmos DB
logical data independence
separates the conceptual level from the external level, so that changes to the logical structure of the data do not affect how applications see the data
logical design
designing a database based on a specific data model but independent of physical details
macro life cycle
includes feasibility analysis, requirements collection and analysis, design, implementation, and validation and acceptance testing
MapReduce
programming model and open-source software framework that applies complex queries to large datasets by splitting the work into map and reduce steps
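The map and reduce idea can be sketched in plain Python with a word count. This is a single-process toy, not the Hadoop framework's API; the function names `map_phase`, `shuffle`, and `reduce_phase` are chosen here for illustration.

```python
from collections import defaultdict
from functools import reduce

documents = ["big data big clusters", "data at rest"]


def map_phase(doc):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]


def shuffle(pairs):
    """Group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

In a real cluster the map calls and the per-key reductions would run on different machines; here they run one after another.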
master data management (MDM)
series of processes, policies, standards, and tools to help organizations define and provide a single point of reference for all data that are mastered
merging process
selection of information from different tables about a specific entity and copying it to an aggregated table
metadata modeling
business presentation of metadata
micro life cycle
focuses on system definition, database design, database implementation, loading or data conversion, application conversion, testing and validation, operation, monitoring, and maintenance
miniworld
(also, universe of discourse [UoD]) represents some aspect of the real-world data that is stored in a database
missing value
empty field in a dataset, handled by filling in the field or deleting it
mixed fragmentation
combines horizontal and vertical fragmentation
multifile relational database
database that is more flexible than flat file structures by providing more functionality for creating and updating data
multimedia DBMS
provides storage of multimedia data such as text, images, audio, and video
multiuser DBMS
allows many users to use the database concurrently
multivalued dependency (MVD)
occurs when two attributes in a table are independent of each other but both depend on a third attribute
n-tier DBMS
multitier architecture that usually divides an application into three tiers
natural join
creates an implicit join based on the common columns in two tables
network DBMS
DBMS in which the data are organized into a network model, the DML is procedural and record-oriented, and the logical and internal data models are intertwined
non-first normal form (NFNF)
database data model that does not meet any of the conditions of database normalization defined by the relational model
nonrelational database
database that does not use a traditional method for storing data such as rows and columns
NoSQL DBMS
DBMS designed for big, unstructured data; classified into document stores, graph-based databases, key-value stores, and column-oriented databases
object persistence
refers to when an object is not deleted until a need emerges to remove it from memory
object-oriented DBMS
DBMS in which the data are organized into an object-oriented (OO) data model, so there is no impedance mismatch when it is combined with an OO host language
online analytical processing (OLAP)
focuses on using operational data for tactical or strategic decision-making
online transaction processing (OLTP)
focuses on managing operational or transactional data; the database server must be able to process lots of simple transactions per unit of time
ontology
semantic data used to describe entities of the real world and the relationship between the entities using the Web Ontology Language (OWL)
open-source DBMS
publicly available DBMS that can be extended by anyone
operational data store (ODS)
staging area that provides query facilities
optimizer
component that selects the best plan for executing a query
outer join
join that returns the matching rows plus the unmatched rows from one or both tables; loosely, the union of two tables
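One common form, the left outer join, can be sketched with Python's built-in `sqlite3` module. The `customer` and `orders` tables are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES (10, 1, 'book'), (12, 2, 'lamp');
""")

# LEFT OUTER JOIN keeps every customer, even one without a matching order.
left_rows = conn.execute("""
    SELECT customer.name, orders.item
    FROM customer
    LEFT OUTER JOIN orders ON customer.id = orders.customer_id
    ORDER BY customer.id, orders.order_id
""").fetchall()

print(left_rows)
```

The customer without an order is kept, and the missing `item` is returned as `None` (SQL NULL).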
outlier
value that lies outside the rest of the population and should be detected so that it can be handled appropriately
parallel processing
technique in which multiple processors work simultaneously on different tasks or different parts of a task to enable concurrent processing of large amounts of data
persistence independence
when an object is independent from how a program manipulates it
persistence orthogonality
concept that the environment does not require any action by a program to retrieve or save an object's state
physical data independence
separates the conceptual level from the physical level
physical database design
maps logical concepts to physical constructs
predictive analytics
predicts the target measure of interest using regression and classification
primary key
special unique identifier for each table record
query processor
acts as an intermediary between users and the DBMS data engine to communicate query requests; includes the DML compiler, query parser, query rewriter, query optimizer, and query executor
query tree
example of data structure representation for the relational algebra expression
query-by-example (QBE)
database query language for relational databases based on domain relational calculus
redundant array of inexpensive disks (RAID)
stores information across an array of low-cost hard disks
reinforcement
machine learning method based on encouraging desired behaviors and removing undesired behaviors
relation
mathematical concept based on the ideas of sets
relational algebra
query language that uses unary or binary operators to perform queries
relational database design (RDD)
models data into a set of tables with rows and columns
relational DBMS
DBMS in which the data are organized into a relational data model, SQL is used as a declarative and set-oriented database language, and there is a strict separation between the logical and internal data models
relative organization
when each record is assigned a numeric key to rearrange the order of the records at any time
return on investment (ROI)
ratio of net profits to the investment of resources
sampling
selecting a subset of historical data to build an analytical model
security manager
collection of processes used to secure the database from threats
semistructured data
data that are not organized in a formatted database but have some organized properties
sequential file organization
records are organized in the order stored and any new record is added at the end
single-user DBMS
only one user at a time can use the database
snowflake schema
data model that normalizes the dimension table
spanned record
when all records are classified into blocks and the length of the record can exceed the size of a block
star schema
data model with one large fact table connected to smaller tables
storage manager
program that is responsible for editing, storing, updating, deleting, and retrieving data in the database; includes components such as the transaction manager, buffer manager, lock manager, and recovery manager
structured data
data that have been organized into a formatted database and have relational keys
Structured Query Language (SQL)
programming language used to query and manage structured data located in a relational DBMS (RDBMS)
synchronous call
when the client sends a request and waits for a response from the service
tablespace
where tables are physically stored
theta join
join that merges two tables based on a theta condition, that is, a comparison such as =, <, or > between columns
thin client variant
where only the presentation logic is handled by the client and applications and database commands are executed on the server; it is common when application logic and database logic are tightly coupled or similar
total cost of ownership (TCO)
cost of owning and operating the analytical model over time
transaction
set of database operations induced by a single user or application that should be considered as one undividable unit of work
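A minimal sketch of a transaction treated as one indivisible unit of work, using Python's built-in `sqlite3` module. The `account` table, the `transfer` helper, and the non-negative balance rule are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would make alice's balance negative
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to `bob` has already been applied when the debit is rejected, and the rollback undoes it, so no partial result is left behind.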
transaction management
delineating transactions within the transaction life cycle
transfer learning
machine learning method based on reusing the result of a specific task to start a new task
translation
process of translating from high-level language to machine language
trigger
statement consisting of declarative and/or procedural instructions and stored in the catalog of the RDBMS
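A small sketch of a trigger, run through Python's built-in `sqlite3` module. The `product` and `price_log` tables and the trigger name `log_price_change` are made up for this example; SQLite's `sqlite_master` table plays the role of the catalog here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE price_log (product_id INTEGER, old_price REAL, new_price REAL);

    -- Fires automatically after any update of the price column.
    CREATE TRIGGER log_price_change
    AFTER UPDATE OF price ON product
    BEGIN
        INSERT INTO price_log VALUES (OLD.id, OLD.price, NEW.price);
    END;

    INSERT INTO product VALUES (1, 9.5);
    UPDATE product SET price = 12.0 WHERE id = 1;
""")

log = conn.execute("SELECT * FROM price_log").fetchall()
# The trigger definition itself is stored in SQLite's catalog table.
stored = conn.execute("SELECT name FROM sqlite_master WHERE type = 'trigger'").fetchall()
print(log, stored)
```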
tuple
one row with a collection of values separated by commas and enclosed in parentheses
tuple and document store
database that stores data in XML or JSON format with the document name as key and the contents of the document as value
uniqueness constraint
specifies that all the tuples must be unique
unstructured data
data that are not organized in a formatted database and do not have organized properties
value
actual value derived using the total cost of ownership (TCO) and return on investment (ROI) of the data
variety
range of data types and sources that are used; data in its many forms
velocity
speed at which data comes in and goes out; data in motion
veracity
uncertainty of the data; data in doubt
vertical fragmentation
fragmentation in which each fragment holds a subset of the columns of the data; the global view is reconstructed with a JOIN query; useful if only some of a tuple’s attributes are relevant to a node
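A rough sketch of vertical fragmentation in plain Python, with lists of dictionaries standing in for tables. The `project` helper, the column split, and the data are assumptions made for illustration; each fragment keeps the key column `id` so the fragments can be joined back together.

```python
rows = [
    {"id": 1, "name": "Ada", "salary": 120},
    {"id": 2, "name": "Grace", "salary": 130},
]


def project(table, columns):
    """Keep only the listed columns of every row."""
    return [{c: row[c] for c in columns} for row in table]


# Each fragment holds a subset of the columns plus the shared key.
public_fragment = project(rows, ["id", "name"])
payroll_fragment = project(rows, ["id", "salary"])

# The global view is rebuilt by joining the fragments on the key.
payroll_by_id = {row["id"]: row for row in payroll_fragment}
global_view = [{**p, **payroll_by_id[p["id"]]} for p in public_fragment]
print(global_view == rows)
```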
virtual data mart
usually defined as a single SQL view
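A minimal sketch of a data mart defined as a single SQL view, run through Python's built-in `sqlite3` module. The `sales` table and the view name `eu_sales_mart` are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('EU', 'book', 10), ('EU', 'pen', 5), ('US', 'book', 7);

    -- The mart stores no data of its own; it is only a saved query.
    CREATE VIEW eu_sales_mart AS
        SELECT product, SUM(amount) AS total
        FROM sales
        WHERE region = 'EU'
        GROUP BY product;
""")

query = "SELECT product, total FROM eu_sales_mart ORDER BY product"
before = conn.execute(query).fetchall()

# New source data shows up in the view without any reload step.
conn.execute("INSERT INTO sales VALUES ('EU', 'book', 3)")
after = conn.execute(query).fetchall()
print(before, after)
```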
virtual data warehouse
can be built as a set of SQL views directly on the underlying operational data sources or as an extra layer on top of a collection of physical independent data marts
volume
amount of data; data at rest
weak entity
type of entity that cannot be uniquely identified based on its attributes alone and must rely on a strong entity to provide the context necessary for identification
XML DBMS
DBMS that uses the XML data model to store data
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/introduction-computer-science/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/introduction-computer-science/pages/1-introduction

© Oct 29, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.