- anonymization
- act of removing personally identifying information from datasets and other forms of data so that sensitive information can be used for analysis with minimal risk of exposing personal information
- anonymous data
- data that has been stripped of personally identifiable information (or never contained such information in the first place)
- autonomy
- in data science, the ideal that individuals maintain control over the decisions regarding the collection and use of their data
- confidentiality
- safeguarding of privacy and security of data by controlling access to it
- cookies
- small data files that websites store on users’ devices to keep track of browsing and search history and to collect information about potential interests in order to tailor advertisements and product placement
- copyright
- protection under the law for original creative work
- cross-validation
- technique for evaluating a model by repeatedly splitting the data into training and testing sets and comparing the model's performance across the different subsets
- data breach
- incident in which data is stolen or accessed without authorization, typically by a malicious third party
- data governance protocols
- set of rules, policies, and procedures that enable precise control over data access while ensuring that it is safeguarded
- data privacy
- the assurance that individual data is collected, processed, and stored securely with respect for individuals' rights and preferences
- data retention
- length of time that personal data may be stored
- data security
- steps taken to keep data secure from unauthorized access or manipulation
- data sharing
- processes of allowing access to or transferring data from one entity (individual, organization, or system) to another
- data source attribution
- the practice of clearly identifying and acknowledging the sources employed in the visualizations and reporting of data
- data sovereignty
- laws that require data collected from a country’s citizens to be stored and processed within its borders
- digital divide
- gap between those who have access to digital technologies, such as the internet and computers, and those who do not
- encryption
- the process of converting sensitive or confidential data into a code in order to protect it from unauthorized access or interception
- ethics in data science
- responsible collection, analysis, use, and dissemination of data
- explainable AI (XAI)
- set of processes, methodologies, and techniques designed to make artificial intelligence (AI) models, particularly complex ones like deep learning models, more understandable and interpretable to humans
- fairness
- absence of bias in the models and algorithms used to process data
- Family Educational Rights and Privacy Act (FERPA)
- U.S. legislation providing protections for student educational records and defining certain rights for parents regarding their children’s records
- hashing
- process of transforming data into a fixed-length value or string (called a hash), typically using an algorithm called a hash function
- Health Insurance Portability and Accountability Act (HIPAA)
- U.S. legislation requiring the safeguarding of sensitive information related to patient health
- informed consent
- the process of obtaining permission from a research subject in a way that confirms they understand the scope of the data collection
- intellectual property
- original artistic works, trademarks and trade secrets, patents, and other creative output
- k-anonymization
- principle of ensuring that each record within a dataset is indistinguishable from at least k – 1 other records with respect to a specified set of identifying attributes or features
- outlier detection
- identification of observations that are significantly different from the rest of the data
- personally identifiable information (PII)
- information that directly and unambiguously identifies an individual
- pseudonymization
- act of replacing sensitive information in a dataset with artificial identifiers or codes while still maintaining its usefulness for analysis
- regulatory compliance officer (RCO)
- a trained individual responsible for confirming that a company or organization follows the laws, regulations, and policies that govern its operations in order to avoid legal and financial risks
- transparency
- being open and honest about how data is collected, stored, and used
- universal design principles
- set of guidelines aimed at creating products, environments, and systems that are accessible and usable by all people regardless of age, ability, or disability
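
To make the hashing and pseudonymization entries above more concrete, the following is a minimal Python sketch, not a production-ready method. It assumes a hypothetical list of records with an `email` field and a hypothetical demo key; the helper names `sha256_hex` and `pseudonymize` are introduced here for illustration only. It uses the standard library's `hashlib` and `hmac` modules: a plain hash maps any input to a fixed-length value, while a keyed hash gives a stable artificial identifier that is harder to reverse by guessing inputs without the key.

```python
import hashlib
import hmac


def sha256_hex(value: str) -> str:
    """Hash a string to a fixed-length (64 hex character) SHA-256 digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), shortened to 16 hex chars.

    The same input and key always give the same pseudonym, so records
    can still be linked for analysis without storing the raw identifier.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


# Hypothetical example data; the field names are assumptions for illustration.
records = [
    {"email": "ana@example.com", "visits": 3},
    {"email": "ben@example.com", "visits": 7},
    {"email": "ana@example.com", "visits": 1},
]

# Hypothetical key; a real key would be generated randomly and stored securely.
demo_key = b"demo-key-not-for-production"

pseudonymized = [
    {"user_id": pseudonymize(r["email"], demo_key), "visits": r["visits"]}
    for r in records
]

for row in pseudonymized:
    print(row)
```

Because the first and third records share an email address, they receive the same `user_id`, which is what keeps the pseudonymized data useful for analysis. Note that pseudonymized data can still be re-identified by anyone who holds the key, so it is generally not treated as anonymous data.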
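
The k-anonymization entry above can be checked mechanically: group the records by the chosen identifying attributes and confirm that every group has at least k members. The sketch below is one simple way to do that in Python, assuming records are stored as dictionaries; the function names and the example columns (`age_band`, `zip3`, `diagnosis`) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for all of the given identifying attributes."""
    counts = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    # An empty dataset has no groups; report 0 in that case.
    return min(counts.values()) if counts else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every record is indistinguishable from at least k - 1 others
    with respect to the given identifying attributes."""
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical example data with generalized age and ZIP code values.
rows = [
    {"age_band": "30-39", "zip3": "021", "diagnosis": "A"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "B"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "A"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "C"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "B"},
]

identifying = ["age_band", "zip3"]
print(smallest_group_size(rows, identifying))
print(is_k_anonymous(rows, identifying, k=2))
print(is_k_anonymous(rows, identifying, k=3))
```

Here the smallest group (age band 30-39) has two records, so the data satisfies the condition for k = 2 but not for k = 3. The check only tests the property; reaching a target k usually requires generalizing or suppressing values first.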
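
As an illustration of the cross-validation entry above, the following sketch splits a small dataset into k folds, holds out each fold in turn as the testing set, and scores a deliberately simple baseline model on it. Everything here is an assumption for demonstration: the helper names are hypothetical, the "model" just predicts the mean of the training values, and the score is the mean absolute error. Libraries such as scikit-learn provide fuller implementations.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split the indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate_mean_model(y, k):
    """Return the mean absolute error on each held-out fold for a baseline
    model that always predicts the mean of the training values."""
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        prediction = mean(train)  # "training" the baseline model
        scores.append(mean(abs(y[i] - prediction) for i in test_idx))
    return scores


# Hypothetical target values.
values = [2, 4, 6, 8]
fold_scores = cross_validate_mean_model(values, k=2)
print(fold_scores)        # one error score per fold
print(mean(fold_scores))  # overall estimate of model performance
```

Comparing the per-fold scores shows how much the model's performance depends on which subset of the data it was trained on, which is the point of the technique. This sketch does not shuffle the data before splitting; in practice, shuffling is common unless the order of the records matters.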