actionable advice
recommendations or guidance in a report, particularly in an executive summary, providing practical ways to apply the report's findings or insights
alt text
brief descriptions of images that accompany the image or may be embedded within the image data, meant to aid readers who are not able to view the images and graphics due to disability or other limitations
assumption
statement that is thought to be true without being verified or proven; foundational hypotheses or beliefs about the structure, relationships, or distribution of data that guide the analytical approach and model selection for a project
audience
the person or group that will be reading the report
bootstrap samples
multiple samples drawn from a dataset with replacement, so that the same observation may appear more than once in a sample
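As an illustration of this entry, the following is a minimal sketch that uses only Python's standard library. The function name `bootstrap_samples`, the toy data, and the fixed seed are assumptions made for the example and do not come from the text.

```python
import random

def bootstrap_samples(data, n_samples, seed=0):
    """Return n_samples resamples of data, each the same size as data.

    Sampling is done with replacement, so duplicates are permitted.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.choices(data, k=len(data)) for _ in range(n_samples)]

data = [2.1, 3.4, 1.9, 5.0, 4.2]  # made-up values
samples = bootstrap_samples(data, n_samples=3)
for sample in samples:
    print(sample)
```

Each printed list has the same length as the original data, and a value can appear several times in one list while another value is left out.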
codebook
data dictionary to detail the variables, their units, values, and labels within the dataset used in the project
constraint
limitation or restriction that is imposed on a project or its solution
cross-validation
validation method that divides the training set into multiple subsets and iteratively trains and evaluates the model on different combinations of these subsets
executive (or data) dashboard
visual representation tool designed to give senior management a quick overview of an organization's key performance indicators (KPIs) and metrics, updated in real time or at regular intervals
executive summary
concise, standalone document that encapsulates the essence of a data science report
executives
decision-makers such as managers, CEOs, or others who may not have detailed technical knowledge but need an understanding of the implications of the information for strategic decision-making
experts
individuals with an advanced understanding of the subject matter, often with specialized knowledge or education in the topic at hand
fold
one of a set of equally sized subsets used in k-fold cross-validation
hyperparameter tuning
fine-tuning of certain constants that affect the performance of a model
jargon
specialized and/or technical terms in a given field
k-fold cross-validation
cross-validation strategy that works by dividing the dataset into $k$ equally sized subsets or folds, where the model is trained on $k-1$ of the folds and tested on the remaining fold, a process that is repeated $k$ times with each fold serving as the test set once
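The splitting step in this entry can be sketched in plain Python as follows. The helper `k_fold_indices` is a hypothetical name chosen for the example, and the sketch assumes the data have already been shuffled if the original ordering matters.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k folds over n items.

    Hypothetical helper for illustration. When n is not divisible by k,
    the first n % k folds receive one extra item.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Train on every index that is not in the current test fold.
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

splits = list(k_fold_indices(n=6, k=3))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Every index appears in exactly one test fold. Calling the same function with `k` equal to `n` places a single item in each fold, which corresponds to leave-one-out cross-validation.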
key performance indicators (KPIs)
quantifiable metrics used to evaluate the success of an organization, employee, or process in achieving specific objectives and goals
layered approach
writing strategy in which a report is organized into sections or appendices that provide different levels of detail and complexity for different audiences
leave-one-out cross-validation (LOOCV)
special case of k-fold cross-validation where $k$ is set to the number of data points in the dataset so that the model is trained on all data points except one, which is used as the validation set, and this process is repeated for each data point in the dataset
mean percentage error (MPE)
average of the percentage errors between predicted ($\hat{y}_i$) and actual ($y_i$) values: $\mathrm{MPE} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i - \hat{y}_i}{y_i}$
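A short sketch of this calculation follows, assuming that each percentage error is the actual value minus the predicted value, divided by the actual value, and that the result is left as a fraction rather than multiplied by 100. The function name and the sample numbers are made up for illustration.

```python
def mean_percentage_error(actual, predicted):
    """Average of (actual - predicted) / actual over all observations.

    Hypothetical helper; assumes no actual value is zero.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [(y - y_hat) / y for y, y_hat in zip(actual, predicted)]
    return sum(errors) / len(errors)

actual = [100.0, 200.0, 50.0]      # made-up actual values
predicted = [90.0, 220.0, 50.0]    # made-up predictions
mpe = mean_percentage_error(actual, predicted)
print(mpe)
```

Here the individual errors are +10%, -10%, and 0%, so they cancel in the average. Because the errors keep their signs, MPE is better read as a measure of bias (systematic over- or under-prediction) than of overall accuracy.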
Monte Carlo simulation
random sampling and statistical modeling to estimate the probability of different outcomes under uncertainty
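As a toy illustration of this entry, the sketch below estimates a probability by repeated random sampling. The scenario (the chance that two fair dice sum to 10 or more), the function name, and the seed are assumptions chosen for the example.

```python
import random

def estimate_probability(n_trials, seed=0):
    """Estimate P(sum of two fair dice >= 10) by simulation.

    Hypothetical example; the exact answer is 6/36, about 0.1667.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(n_trials):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total >= 10:
            hits += 1
    return hits / n_trials

estimate = estimate_probability(100_000)
print(estimate)
```

The estimate should land close to the exact value of 1/6, and it generally moves closer as the number of trials grows. The same pattern applies when the exact answer is unknown: the dice rolls are replaced by draws from whatever distributions describe the uncertain inputs.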
multi-way sensitivity analysis
explores the effects of simultaneous changes in multiple input parameters on the outcome
neurodiversity
normal variation of individual cognitive abilities, especially in realms of communication and processing information
nonspecialists
nonexpert audience without specialized knowledge of the subject but with some need to understand the basics of a data science report for a particular reason
one-way sensitivity analysis
examines how changes in one input parameter at a time affect the outcome of a model
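A minimal sketch of this idea follows. The `profit` model, its parameter names, the baseline values, and the 10% change are all hypothetical and chosen only to show the mechanics of varying one input while holding the others at baseline.

```python
def profit(price, unit_cost, units):
    """Toy model (an assumption for this sketch): margin times volume."""
    return units * (price - unit_cost)

def one_way_sensitivity(model, baseline, change=0.10):
    """Vary each input by +/- change, one at a time, others held fixed.

    Returns {input_name: (outcome_when_lowered, outcome_when_raised)}.
    """
    results = {}
    for name, value in baseline.items():
        low = dict(baseline, **{name: value * (1 - change)})
        high = dict(baseline, **{name: value * (1 + change)})
        results[name] = (model(**low), model(**high))
    return results

baseline = {"price": 10.0, "unit_cost": 6.0, "units": 1000}
results = one_way_sensitivity(profit, baseline)
for name, (low_outcome, high_outcome) in results.items():
    print(name, round(low_outcome, 2), round(high_outcome, 2))
```

With these made-up numbers the baseline profit is 4000, and the outcome swings most when `price` changes, which suggests price is the input this toy model is most sensitive to. Changing several entries of the input dictionary at once, rather than one at a time, would turn this into a multi-way analysis.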
scenario analysis
evaluates the outcomes under different predefined sets of input parameters, representing possible future states or scenarios
sensitivity analysis
technique used to determine how different values of an independent variable affect a particular dependent variable under a given set of assumptions
technicians
practical users of information in a data science report who may apply the knowledge in a hands-on manner
validation
evaluation of multiple predictive models and/or hyperparameter tuning
version control system
a software tool that helps keep track of changes to files, code, or any type of digital content over time
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution-NonCommercial-ShareAlike License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
© Dec 19, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.