attribute
characteristic or feature that defines an item in a dataset
categorical data
data that are represented in different forms and do not indicate measurable quantities
cell
a block or rectangle in a table that is specified by a combination of a row number and a column number
comma-separated values (CSV)
format of a dataset in which each item takes up a single line and its values are separated by commas (“,”)
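As a minimal sketch of this format, the snippet below reads a small CSV text with Python's standard `csv` module. The names, ages, and cities are made up for illustration and do not come from the book.

```python
import csv
import io

# Hypothetical two-item dataset: a header line, then one item per line,
# with the values on each line separated by commas.
csv_text = "name,age,city\nAda,36,London\nGrace,45,New York\n"

# csv.DictReader treats the first line as the attribute names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    print(row["name"], row["age"], row["city"])
```

Each line after the header becomes one item, and each comma-separated value becomes the value of one attribute. Note that the `csv` module reads every value as text, so `age` is the string `"36"` until it is converted.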
continuous data
data whose value can be any number within a range and is therefore chosen from an infinite set of numbers
data
anything that we can analyze to compile some high-level insights
data analysis
the process of examining and interpreting raw data to uncover patterns, discover meaningful insights, and make informed decisions
data collection
the systematic process of gathering information on variables of interest
data preparation (data processing)
the second step within the data science cycle; converts the collected data into an optimal form for analysis
data reporting
the presentation of data in a way that will best convey the information learned from data analysis
data science
a field of study that investigates how to collect, manage, and analyze data in order to retrieve meaningful information from some seemingly arbitrary data
data science cycle
a process used when investigating data
data visualization
the graphical representation of data, using visual elements such as charts, graphs, and maps, to point out patterns and trends
data warehousing
the process of storing and managing large volumes of data from various sources in a central location for easier access and analysis by businesses
DataFrame
a data type that Pandas uses to store multi-column tabular data
dataset
a collection of related and organized information or data points grouped together for reference or analysis
discrete data
data whose values follow a specific precision, such as whole-number counts
Excel
a spreadsheet program with a graphical user interface developed by Microsoft to help with the manipulation and analysis of data
Extensible Markup Language (XML)
format of a dataset that uses tags
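To illustrate, here is a small sketch that parses a tagged dataset with Python's standard `xml.etree.ElementTree` module. The tag names (`students`, `student`, `name`, `age`) and the values are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset: each <student> element is one item, and the
# nested tags hold that item's attribute values.
xml_text = """
<students>
  <student>
    <name>Ada</name>
    <age>36</age>
  </student>
  <student>
    <name>Grace</name>
    <age>45</age>
  </student>
</students>
""".strip()

root = ET.fromstring(xml_text)

# Collect the text inside every <name> tag, one per item.
names = [student.find("name").text for student in root.findall("student")]
print(names)
```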
Google Colaboratory (Colab)
software for editing and running Jupyter Notebook files
Google Sheets
a spreadsheet program with a graphical user interface developed by Google to help with the manipulation and analysis of data
information
some high-level insights that are compiled from data
Internet of Things (IoT)
the network of multiple objects interacting with each other through the Internet
item
an element that makes up a dataset; also referred to as an entry or an instance
JavaScript Object Notation (JSON)
format of a dataset that follows the syntax of the JavaScript programming language
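A short sketch of the same idea, using Python's standard `json` module. The two items and their attribute names are invented for illustration.

```python
import json

# Hypothetical dataset: a JSON array of objects, one object per item.
json_text = '[{"name": "Ada", "age": 36}, {"name": "Grace", "age": 45}]'

# json.loads converts JSON arrays to Python lists and objects to dicts.
items = json.loads(json_text)

ages = [item["age"] for item in items]
print(ages)
```

Unlike the values in a CSV file, JSON numbers keep their numeric type when loaded, so `ages` holds integers rather than strings.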
Jupyter Notebook
a web-based document that helps users run Python programs more interactively
nominal data
data whose values do not include any ordering notion
numeric data
data that are represented in numbers and indicate measurable quantities
ordinal data
data whose values include an ordering notion
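The difference between nominal and ordinal data can be sketched in plain Python. The T-shirt sizes and the `size_order` mapping below are hypothetical; the point is only that ordinal values can be sorted by a meaningful rank, while nominal values (such as colors) have no such rank.

```python
# Hypothetical ordinal attribute: T-shirt sizes with a natural order.
size_order = {"small": 0, "medium": 1, "large": 2}

sizes = ["large", "small", "medium", "small"]

# Sort by the rank of each size rather than alphabetically.
sorted_sizes = sorted(sizes, key=lambda size: size_order[size])
print(sorted_sizes)
```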
Pandas
a Python library specialized for data manipulation and analysis
predictive analytics
statistical techniques, algorithms, and machine learning that analyze historical data and make predictions about future events, an approach often used in medicine to offer more accurate diagnosis and treatment
programming language
a formal language that consists of a set of instructions or commands used to communicate with a computer and instruct it to perform specific tasks
Python
a programming language that has extensive libraries and is commonly used for data analysis
qualitative data
non-numerical data that generally describe subjective attributes or characteristics and are analyzed using methods such as thematic analysis or content analysis
quantitative data
data that can be measured by specific quantities and amounts and are often analyzed using statistical methods
R
an open-source programming language that is specifically designed for statistical computing and graphics
recommendation system
a system that makes data-driven, personalized suggestions for users
sabermetrics
a statistical approach to sports team management
sports analytics
use of data and business analytics in sports
spreadsheet program
a software application consisting of electronic worksheets with rows and columns where data can be entered, manipulated, and calculated
structured data
dataset whose individual items have the same structure
unstructured data
dataset whose individual items have different structures
XML tag
any block of text that consists of a pair of angle brackets (< >) with some text inside
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution-NonCommercial-ShareAlike License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction

© Dec 19, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.