
anonymization
act of removing personal identifying information from datasets and other forms of data to make sensitive information usable for analysis without the risk of exposing personal information
anonymous data
data that has been stripped of personally identifiable information (or never contained such information in the first place)
autonomy
in data science, the ideal that individuals maintain control over the decisions regarding the collection and use of their data
confidentiality
safeguarding of privacy and security of data by controlling access to it
cookies
small data files that websites store on a user's device to keep track of browsing and search history and to collect information about potential interests in order to tailor advertisements and product placement on websites
copyright
protection under the law for original creative work
cross-validation
evaluation of a model by repeatedly splitting the data into training and testing sets and comparing the model's performance across the different subsets of the data
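The splitting described in this definition can be sketched in a few lines. The following is a minimal illustration, not a definitive implementation: the function names `k_fold_indices` and `cross_validate` are hypothetical, the folds are contiguous and unshuffled, and the "model" is assumed to be a simple mean predictor scored by mean absolute error.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def cross_validate(ys, k):
    """Return one mean-absolute-error score per fold for a mean predictor."""
    scores = []
    for train, test in k_fold_indices(len(ys), k):
        # "Training" here is just averaging the training targets (an assumption).
        prediction = sum(ys[i] for i in train) / len(train)
        mae = sum(abs(ys[i] - prediction) for i in test) / len(test)
        scores.append(mae)
    return scores


if __name__ == "__main__":
    print(cross_validate([1, 2, 3, 4, 5, 6], k=3))
```

In practice the data would usually be shuffled before splitting, and the per-fold scores would be averaged to summarize performance.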
data breach
theft of data by a malicious third party
data governance protocols
set of rules, policies, and procedures that enable precise control over data access while ensuring that it is safeguarded
data privacy
the assurance that individual data is collected, processed, and stored securely with respect for individuals' rights and preferences
data retention
how long personal data may be stored
data security
steps taken to keep data secure from unauthorized access or manipulation
data sharing
processes of allowing access to or transferring data from one entity (individual, organization, or system) to another
data source attribution
the practice of clearly identifying and acknowledging the sources employed in the visualizations and reporting of data
data sovereignty
laws that require data collected from a country’s citizens to be stored and processed within its borders
digital divide
gap between those who have access to digital technologies, such as the internet and computers, and those who do not
encryption
the process of converting sensitive or confidential data into a code in order to protect it from unauthorized access or interception
ethics in data science
responsible collection, analysis, use, and dissemination of data
explainable AI (XAI)
set of processes, methodologies, and techniques designed to make artificial intelligence (AI) models, particularly complex ones like deep learning models, more understandable and interpretable to humans
fairness
absence of bias in the models and algorithms used to process data
Family Educational Rights and Privacy Act (FERPA)
U.S. legislation providing protections for student educational records and defining certain rights for parents regarding their children’s records
hashing
process of transforming data into a fixed-length value or string (called a hash), typically using an algorithm called a hash function
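As a small illustration of this definition, the sketch below uses SHA-256 from Python's standard `hashlib` module as one possible hash function; the helper name `sha256_hex` and the sample inputs are assumptions made for the example.

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 hash of a string as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    short_hash = sha256_hex("a")
    long_hash = sha256_hex("a much longer piece of input data")
    # Both hashes have the same fixed length regardless of input size.
    print(len(short_hash), len(long_hash))
    # The same input always produces the same hash.
    print(sha256_hex("a") == short_hash)
```

Hashing an identifier does not by itself make it anonymous, since common values can be guessed and hashed for comparison.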
Health Insurance Portability and Accountability Act (HIPAA)
U.S. legislation requiring the safeguarding of sensitive information related to patient health
informed consent
the process of obtaining permission from a research subject who has indicated that they understand the scope of the data collection
intellectual property
original artistic works, trademarks and trade secrets, patents, and other creative output
k-anonymization
principle of ensuring that each record within a dataset is indistinguishable from at least k – 1 other records with respect to a specified set of identifying attributes or features
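One way to check this property is to group records by their identifying attributes and confirm that every group has at least k members. The sketch below assumes records are stored as dictionaries; the function name `is_k_anonymous`, the field names, and the sample data are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


if __name__ == "__main__":
    # Hypothetical, already-generalized records (age ranges, truncated ZIP codes).
    records = [
        {"age": "20-29", "zip": "021**", "diagnosis": "flu"},
        {"age": "20-29", "zip": "021**", "diagnosis": "asthma"},
        {"age": "30-39", "zip": "100**", "diagnosis": "flu"},
        {"age": "30-39", "zip": "100**", "diagnosis": "diabetes"},
    ]
    print(is_k_anonymous(records, ["age", "zip"], k=2))
    print(is_k_anonymous(records, ["age", "zip"], k=3))
```

With these sample records, each (age, zip) combination appears exactly twice, so the check passes for k = 2 and fails for k = 3.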
outlier detection
identification of observations that are significantly different from the rest of the data
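A common way to make "significantly different" concrete is the z-score rule, which flags values that lie more than a chosen number of standard deviations from the mean. The sketch below is one such approach among several; the function name `z_score_outliers`, the threshold of 2.0, and the sample values are assumptions for illustration.

```python
import statistics


def z_score_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / sd > threshold]


if __name__ == "__main__":
    values = [10, 11, 9, 10, 12, 10, 11, 50]
    print(z_score_outliers(values))
```

For these sample values, only 50 is flagged. Because extreme values inflate both the mean and the standard deviation, rules based on the median or interquartile range are often preferred for small or heavily skewed samples.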
personally identifiable information (PII)
information that directly and unambiguously identifies an individual
pseudonymization
act of replacing sensitive information in a dataset with artificial identifiers or codes while still maintaining its usefulness for analysis
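The replacement step in this definition can be sketched as a lookup table that assigns each distinct sensitive value a stable artificial code. This is a minimal illustration under stated assumptions: the function name `pseudonymize`, the `ID-0001` code format, and the sample records are hypothetical.

```python
def pseudonymize(records, field, prefix="ID"):
    """Replace the values of `field` with artificial codes.

    Returns the transformed records and the mapping from original
    values to codes. The same original value always gets the same code,
    so records can still be linked for analysis.
    """
    mapping = {}
    result = []
    for record in records:
        value = record[field]
        if value not in mapping:
            mapping[value] = f"{prefix}-{len(mapping) + 1:04d}"
        # Copy the record so the input data is left unchanged.
        result.append({**record, field: mapping[value]})
    return result, mapping


if __name__ == "__main__":
    records = [
        {"name": "Ana", "score": 90},
        {"name": "Ben", "score": 85},
        {"name": "Ana", "score": 70},
    ]
    pseudonymized, mapping = pseudonymize(records, "name")
    print(pseudonymized)
```

The mapping table allows the codes to be reversed, so it would need to be stored separately and protected; this reversibility is what distinguishes pseudonymization from anonymization.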
regulatory compliance officer (RCO)
a trained individual responsible for confirming that a company or organization follows the laws, regulations, and policies that govern its operations in order to avoid legal and financial risks
transparency
being open and honest about how data is collected, stored, and used
universal design principles
set of guidelines aimed at creating products, environments, and systems that are accessible and usable by all people regardless of age, ability, or disability
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution-NonCommercial-ShareAlike License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction

© Dec 19, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.