
1.
Which option best explains the significance of informed consent during the data collection phase?
  1. To ensure individuals are aware of the potential risks and benefits of the research
  2. To bypass legal regulations and guidelines
  3. To collect data without the knowledge of individuals
  4. To speed up the data collection process
2.
Who is responsible for obtaining informed consent from individuals or companies before collecting their data?
  1. Data analysts
  2. Government agencies
  3. Medical professionals
  4. Data scientists
3.
Which of the following is an important ethical principle to consider during the data collection phase?
  1. Maximizing collected data
  2. Minimizing informed consent
  3. Ensuring secure storage and access protocols
  4. Avoiding compliance with regulations and guidelines
4.
What could be a serious consequence of a data security breach during the development process of a data science project?
  1. Unauthorized access to sensitive information
  2. Alteration or destruction of data
  3. Risk of legal issues
  4. All of the above
5.
How can organizations mitigate the risk of data breaches during the development process?
  1. Have strong data security policies and protocols in place
  2. Respond to data breaches in a timely manner
  3. Only use publicly available data
  4. Share data with third parties without permission
6.
Which of the following are valid sources of data for a data science project? Select all that apply.
  1. Public datasets
  2. Private databases from unaffiliated sources
  3. Open repositories such as Kaggle.com and OpenML
  4. All of the above
7.
What type of data can be shared in a data science project?
  1. Quantitative and qualitative data only
  2. Tabular, text-based, and image-based data only
  3. Audio-based data only
  4. Quantitative, qualitative, tabular, text-based, and audio-based data
8.
Who should be given access to the data in a data science project?
  1. Anyone who is directly involved in the project
  2. Shareholders and customers only
  3. Anyone who is directly involved in the project and has a legitimate need for the data
  4. Anyone with a valid email address
9.
Which department is responsible for handling regulatory compliance in an organization?
  1. Human resources
  2. Marketing
  3. Legal and compliance teams
  4. Manufacturing
10.
What is the main purpose of considering fairness and equity in machine learning models?
  1. To increase the accuracy of the model by avoiding bias
  2. To avoid legal trouble
  3. To improve the interpretability of the model
  4. To focus on one group of data used in training
11.
A new social media platform is looking for ways to improve its content-delivery algorithm so that it appeals to a broader audience. Currently, the platform is being used primarily by college-aged students. Which of these datasets would be the most appropriate data to collect and analyze to improve the algorithm and reduce potential bias in the content-delivery model?
  1. Dataset 1, consisting of all the past browsing data on the platform
  2. Dataset 2, the results of an extensive survey of people from various backgrounds and demographic segments who may or may not be familiar with the new social media platform
  3. Dataset 3, the results of an internal company questionnaire
12.
What type of data would be considered an outlier in machine learning models?
  1. Data points that are well-represented in the training set
  2. Data points that are rare or abnormal in comparison to the rest of the data
  3. Data points that are equally distributed among all classes
  4. Data points that are consistently labeled incorrectly
13.
Why is it important to monitor and regularly test algorithms for any potential bias?
  1. To ensure models are not overfitting
  2. To maintain the accuracy of the model
  3. To uphold fairness and equity in the decision-making process
  4. To avoid legal consequences
14.
The admissions team of a large university would like to research which factors contribute the most to student success in order to improve its selection process for new students. The dataset that the team plans to use has numerical features such as high school GPA and scores on standardized tests as well as categorical features such as name, address, email address, and whether the student played sports or took part in other extracurricular activities. The analysis of the data will be done by a team that includes student workers. Which of these features should be anonymized before the student workers can get to work?
  1. Name, address, and email address
  2. High school GPA and scores on standardized tests
  3. Sports and extracurricular activities
  4. None of the data should be anonymized
15.
In what situation can data be anonymized even if it falls under intellectual property rights?
  1. When the individual has given consent for anonymization
  2. When the data is used for research purposes
  3. When there is a potential threat to national security
  4. When the data will be used for marketing purposes
16.
Data source attribution promotes trust and transparency by:
  1. Identifying potential errors and biases in the data
  2. Allowing others to replicate the results
  3. Protecting the privacy and dignity of individuals
  4. All of the above
17.
Applying universal design principles in data visualization and data source attribution helps ensure that the data can be viewed and used by which group of people?
  1. People with visual impairments
  2. People with cognitive impairments
  3. People with auditory impairments
  4. Any person who will use the data
18.
What is the term used for properly crediting and acknowledging the sources of data in data science and research?
  1. Data attribution
  2. Data manipulation
  3. Data accuracy
  4. Data extraction
19.
What is one ethical responsibility of data scientists and researchers when presenting data?
  1. Maximizing profits
  2. Manipulating data to fit a desired narrative
  3. Ensuring accessibility and inclusivity
  4. Following the latest trends in data visualization techniques
20.
There was a growing concern at the Environmental and Agriculture Department about the rising levels of CO₂ in the atmosphere. It was clear that human activities, especially the burning of fossil fuels, were the primary cause of this increase. The department assigned a team of data scientists to research the potential consequences for climate patterns in the United States. They studied the trends of CO₂ emissions over the years and noticed a significant uptick in recent times. The team simulated various scenarios using climate models and predicted that if the current trend continues, there will be disastrous consequences for the climate. The four graphs displayed in Figure 8.11 depict the same dataset as plotted by four different researchers. Out of the four graphs, which one accurately represents the CO₂ data without any type of distortion or misrepresentation?
Four line graphs labeled A, B, C, and D. Each has a Y axis labeled CO₂ (Millions of metric tonnes) and an X axis labeled Timeline. The Y-axis ranges are 0 to 100,000 for (A), 0 to 7,000 for (B) and (C), and 0 to 5,000 for (D). The X-axis ranges are 1800 to 2050 for (A) and (C), 1980 to 2030 for (B), and 1900 to 2000 for (D). Graphs A, C, and D generally rise from left to right, with C and D showing the steepest incline. Graph B shows a rise at the beginning and then declines around 2050.
Figure 8.11 Source: Hannah Ritchie, Max Roser and Pablo Rosado (2020) - "CO₂ and Greenhouse Gas Emissions." Published online at OurWorldInData.org. Retrieved from: 'https://ourworldindata.org/co2-and-greenhouse-gas-emissions' [Online Resource]
  1. Graph A
  2. Graph B
  3. Graph C
  4. Graph D
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution-NonCommercial-ShareAlike License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
Citation information

© Dec 19, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.