1. Megan is interested in opening a gaming center that utilizes virtual reality (VR) technology. She wants to gauge the level of interest in VR gaming among the residents of her city. She plans to conduct a survey using a random sampling method and then use this data to determine the potential demand for a VR gaming center in her city. What is the nature of the collected data, experimental or observational?
2. A high school student is interested in determining the relationship between the weight of an object and the distance it can travel when launched from a catapult. To collect data for analysis, the student plans to launch objects of varying weight and then measure the corresponding distance traveled. Using this data, the student plans to plot a graph with weight on the x-axis and distance on the y-axis.
a. Will this data collection result in observational data or experimental data?
b. How can the student gather the necessary data to establish this relationship?
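Part (b) can be made concrete under one set of assumptions. In this setup, the student launches each weight several times with the same catapult settings, averages the distances, and summarizes the trend with a least-squares line. The sketch below uses only Python's standard library. The function names `average_trials` and `fit_line` are hypothetical.

```python
from statistics import mean


def average_trials(trials):
    """Average repeated launches at each weight.

    `trials` maps a weight to the list of distances measured at that weight.
    Returns (weights, mean_distances), sorted by weight.
    """
    weights = sorted(trials)
    return weights, [mean(trials[w]) for w in weights]


def fit_line(weights, distances):
    """Ordinary least-squares fit of distance = slope * weight + intercept."""
    if len(weights) != len(distances) or len(weights) < 2:
        raise ValueError("need at least two paired measurements")
    x_bar, y_bar = mean(weights), mean(distances)
    sxx = sum((x - x_bar) ** 2 for x in weights)
    if sxx == 0:
        raise ValueError("weights must differ to estimate a slope")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, distances))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar
```

A straight line is itself an assumption. Plotting the averaged points first shows whether the relationship looks linear before anyone interprets the fitted slope.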
3. A high school is conducting a survey to understand its students' opinions of the food options available in the cafeteria. The school has divided the student population into four groups based on grade level (9th, 10th, 11th, and 12th). A random sample of 60 students is selected from each group, and each student is asked to rate the cafeteria food on a scale of 1–10. The results are shown in the following table:
Grade Level | Students Sampled | Average Rating (1–10) |
9th | 60 | 8.1 |
10th | 60 | 7.4 |
11th | 60 | 6.3 |
12th | 60 | 6.1 |
What sampling technique was used to collect this data?
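The design described in this exercise can be sketched in code in three steps:
- Group the population by grade.
- Draw a simple random sample of the same size within each grade.
- Combine the per-grade averages.

In this minimal sketch (function names hypothetical), each grade is weighted by its sample size. With 60 students in every grade, that is the same as a plain average of the four ratings. If grade enrollments differ, weighting by enrollment would likely give a better school-wide estimate, but the table does not report enrollment.

```python
import random


def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum units from every stratum.

    `stratum_of` maps a unit to its stratum label (here, a grade level).
    """
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {label: rng.sample(units, n_per_stratum) for label, units in strata.items()}


def combined_mean(stratum_means, stratum_weights):
    """Weighted average of per-stratum means."""
    total = sum(stratum_weights.values())
    return sum(stratum_means[s] * stratum_weights[s] for s in stratum_means) / total


# Average ratings from the table; each grade contributed 60 sampled students.
ratings = {"9th": 8.1, "10th": 7.4, "11th": 6.3, "12th": 6.1}
sample_sizes = {"9th": 60, "10th": 60, "11th": 60, "12th": 60}
print(round(combined_mean(ratings, sample_sizes), 3))
```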
4. A research study is being conducted to understand residents' opinions on the new city budget. The population consists of residents of the city's different neighborhoods, which have varying income levels. The following table shows the 130 participants randomly selected for the study, categorized by neighborhood and income level:
Neighborhood | Income Level | Number of Participants |
Northside | Low Income | 35 |
Southside | Middle Income | 50 |
Eastside | High Income | 45 |
What sampling technique was used to collect this data?
5. John wants to gather data on the popularity of 3D printing among local college students to determine the potential demand for a 3D printing lab on campus. He has decided to use surveys to collect the data. What sampling methods would be most effective for obtaining accurate results?
6. A group of researchers was tasked with gathering data on the prices of electronics from various online retailers for a research project. Suggest an optimal approach for collecting electronics prices from different online stores.
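One common approach is to pull prices from each retailer's official API or data feed where one exists. Web scraping is the fallback, but only after checking the site's robots.txt and terms of service. Python's `urllib.robotparser.RobotFileParser.can_fetch` can handle the robots.txt check.

The sketch below covers only the parsing step and uses the standard-library `HTMLParser`. The HTML is passed in as a string rather than fetched. The `price` class name is a hypothetical assumption, since every store marks up prices differently. In practice, each extracted price would also be stored with the retailer, a product identifier, and a retrieval timestamp, so prices can be compared across stores and over time.

```python
import re
from decimal import Decimal
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text of elements whose class list contains `price_class`.

    Assumes the price text is not split across nested tags.
    """

    def __init__(self, price_class="price"):
        super().__init__()
        self.price_class = price_class  # hypothetical class name; differs per store
        self._capturing = False
        self._buffer = []
        self.raw_prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.price_class in classes:
            self._capturing = True
            self._buffer = []

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # The first closing tag after a price element opens ends the capture.
        if self._capturing:
            self.raw_prices.append("".join(self._buffer).strip())
            self._capturing = False


def to_decimal(text):
    """Turn display text such as '$1,299.99' into Decimal('1299.99')."""
    return Decimal(re.sub(r"[^\d.]", "", text))


def extract_prices(html, price_class="price"):
    """Return every price found in the page as a Decimal, in document order."""
    parser = PriceParser(price_class)
    parser.feed(html)
    parser.close()
    return [to_decimal(p) for p in parser.raw_prices if p]
```

`Decimal` is used instead of `float` so that prices keep their exact cents when compared or averaged.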
7. As the pandemic continues to surge, the US Centers for Disease Control and Prevention (CDC) releases daily data on COVID-19 cases and deaths in the United States. Because of delays in receiving reports from states and counties, however, the CDC does not report on Saturdays and Sundays; instead, the Saturday and Sunday cases are added to the Monday cases. This produces a sudden increase in the number of cases reported each Monday (see the following table), which causes confusion, speculation, and concern among the public. The CDC explains that the spike is due to the inclusion of delayed or improperly recorded data from the preceding days and reassures the public that there is no hidden outbreak.
Date | Day | New Cases |
10/18/2021 | Monday | 3115 |
10/19/2021 | Tuesday | 4849 |
10/20/2021 | Wednesday | 3940 |
10/21/2021 | Thursday | 4821 |
10/22/2021 | Friday | 4357 |
10/23/2021 | Saturday | 0 |
10/24/2021 | Sunday | 0 |
10/25/2021 | Monday | 8572 |
10/26/2021 | Tuesday | 4463 |
10/27/2021 | Wednesday | 5323 |
10/28/2021 | Thursday | 5012 |
10/29/2021 | Friday | 4710 |
10/30/2021 | Saturday | 0 |
10/31/2021 | Sunday | 0 |
11/1/2021 | Monday | 10415 |
11/2/2021 | Tuesday | 5096 |
11/3/2021 | Wednesday | 6882 |
11/4/2021 | Thursday | 5400 |
11/5/2021 | Friday | 6759 |
11/6/2021 | Saturday | 0 |
11/7/2021 | Sunday | 0 |
11/8/2021 | Monday | 10069 |
11/9/2021 | Tuesday | 5297 |
a. What simple strategies can data scientists use to address the missing weekend data in their analysis of the CDC's COVID-19 data?
b. How can data scientists incorporate more advanced methods to handle the missing data in their analysis of COVID-19 cases and deaths reported by the CDC?
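Two simple strategies for part (a) can be tried directly on the table's values:
- Spread each Monday total evenly over the Saturday, Sunday, and Monday it covers.
- Smooth the series with a 7-day moving average, so that every window contains each day of the week exactly once.

The sketch below (function names hypothetical) does both. The even three-way split is an assumption, because the table does not say how the weekend cases were actually distributed. Monday totals may also include delayed reports from earlier days.

```python
from datetime import date, timedelta

# New cases from the table above, one value per day starting Monday 10/18/2021.
START = date(2021, 10, 18)
NEW_CASES = [3115, 4849, 3940, 4821, 4357, 0, 0,
             8572, 4463, 5323, 5012, 4710, 0, 0,
             10415, 5096, 6882, 5400, 6759, 0, 0,
             10069, 5297]


def redistribute_weekend(values, start):
    """Spread each Monday total evenly over the Saturday, Sunday, and Monday it covers.

    Assumes a 0 on Saturday and Sunday means "not reported" and that Monday's
    figure contains all three days. The even split is an assumption.
    """
    out = [float(v) for v in values]
    for i in range(2, len(values)):
        is_monday = (start + timedelta(days=i)).weekday() == 0
        if is_monday and values[i - 1] == 0 and values[i - 2] == 0:
            share = values[i] / 3
            out[i - 2] = out[i - 1] = out[i] = share
    return out


def rolling_mean(values, window=7):
    """Trailing moving average; positions without a full window are None."""
    return [
        sum(values[i + 1 - window:i + 1]) / window if i + 1 >= window else None
        for i in range(len(values))
    ]


adjusted = redistribute_weekend(NEW_CASES, START)
smoothed = rolling_mean(NEW_CASES)
for offset, (raw, adj, avg) in enumerate(zip(NEW_CASES, adjusted, smoothed)):
    day = START + timedelta(days=offset)
    avg_text = "" if avg is None else f"7-day avg={avg:8.1f}"
    print(f"{day:%a %m/%d}  raw={raw:6d}  adjusted={adj:8.1f}  {avg_text}")
```

The 7-day average avoids choosing a split for the weekend cases, but it lags behind sudden changes. More advanced approaches for part (b), such as modeling day-of-week reporting effects, would replace the even split with estimated weights.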
8. What distinguishes data standardization from data validation when gathering data for a data science project?
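A small example can make the distinction concrete. Standardization converts values into one consistent format. Validation checks whether the standardized values satisfy rules and reports problems without changing them. The record fields, the input date format, and the plausible weight range below are hypothetical assumptions chosen for illustration.

```python
from datetime import datetime

LB_TO_KG = 0.45359237  # exact pound-to-kilogram conversion factor


def standardize(record):
    """Put fields into one consistent format (does not judge whether values are correct)."""
    return {
        # Collapse repeated whitespace and use title case for names.
        "name": " ".join(record["name"].split()).title(),
        # Convert a US-style MM/DD/YYYY date to ISO 8601 (YYYY-MM-DD).
        "date": datetime.strptime(record["date"], "%m/%d/%Y").date().isoformat(),
        # Store every weight in kilograms, rounded to 0.1 kg.
        "weight_kg": round(float(record["weight_lb"]) * LB_TO_KG, 1),
    }


def validate(record):
    """Check a standardized record against rules; return a list of problems (empty = valid)."""
    problems = []
    if not record["name"]:
        problems.append("name is missing")
    if not 0 < record["weight_kg"] < 300:  # hypothetical plausibility range
        problems.append("weight_kg outside plausible range")
    return problems
```

Running validation after standardization means the rules only need to be written once, against a single format.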
9. Paul was analyzing the results of an experiment on the effects of music on plant growth, but he kept encountering unexpected data. He was using automated instrumentation to record the plants' growth at periodic time intervals. He found that some of the data points were much higher or lower than anticipated, and he suspected that noise (error) was present in the data. He wanted to identify the main source of the error so that he could take corrective action and minimize its impact on the experiment's results. Considering the various sources of noise (error), what is the most likely source of error Paul might encounter in this experiment: human error, instrumentation error, or sampling error?
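Before deciding on the source of error, it helps to locate the suspicious readings. One way to do that is a Hampel filter, which compares each reading with the median of its neighbors. Isolated spikes that disagree with the readings taken just before and after are consistent with instrument glitches, although a flag alone does not prove which source of error is responsible. The window size and the 3-sigma threshold below are common choices but are assumptions here, and the function name is hypothetical.

```python
from statistics import median


def hampel_flags(values, half_window=3, n_sigmas=3.0):
    """Return indices of readings that depart sharply from their local median (Hampel filter).

    Each reading's window holds up to `half_window` neighbours on each side.
    The median absolute deviation (MAD) is scaled by 1.4826 so it estimates a
    standard deviation for roughly normal noise; readings more than `n_sigmas`
    of those from the local median are flagged. Windows with zero spread are
    skipped, which is a limitation of this sketch.
    """
    flagged = []
    for i, x in enumerate(values):
        window = values[max(0, i - half_window):i + half_window + 1]
        local_median = median(window)
        mad = median(abs(v - local_median) for v in window)
        scale = 1.4826 * mad
        if scale > 0 and abs(x - local_median) > n_sigmas * scale:
            flagged.append(i)
    return flagged
```

Flagged readings can then be checked against manual measurements of the same plants. If manual and automated measurements disagree only at the flagged times, that points toward the instrument.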
10. The government is choosing the best hospital in the United States based on data such as how many patients each hospital treats, its mortality rate, and its overall quality of care. It wants to collect this data accurately and fairly to improve the country's health care.
a. What is the best way to collect this data? Can we use web scraping to gather data from hospital websites? Can we also use social media data collection to gather patient reviews and feedback about the hospitals?
b. How can we ensure the data is unbiased and accurate? Can we use a mix of these methods to create a comprehensive analysis?
11. Sarah, a data scientist, was in charge of hiring a new sales representative for a large company and had received many applications. To make the hiring process more efficient, she decided to use text processing to analyze each application's sentiment, tone, and relevance. This would help her better understand each candidate's communication skills, problem-solving abilities, and overall fit for the sales representative role. She compiled two lists of keywords, one for successful candidates and one for rejected candidates. After generating keyword summaries for each application, Sarah was ready to make informed and efficient hiring decisions. Based on the list of keywords associated with each candidate, as shown in the figure, identify the best candidate. Support your answer with strong reasoning and evidence.
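A keyword comparison like Sarah's can be sketched as a simple score. Count how many keywords from the successful-candidate list appear in an application, then subtract the number from the rejected-candidate list that appear. The exercise does not specify her scoring method, so this is only one possibility. The figure's keyword lists are not reproduced here, so they must be supplied as inputs. This version matches single words only; multiword phrases would need extra handling. Function names are hypothetical.

```python
import re


def keyword_score(text, positive, negative):
    """Score = distinct positive keywords present minus distinct negative keywords present."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    hits_pos = sorted(k for k in positive if k in words)
    hits_neg = sorted(k for k in negative if k in words)
    return len(hits_pos) - len(hits_neg), hits_pos, hits_neg


def rank_candidates(applications, positive, negative):
    """Return (candidate, score) pairs, highest score first.

    `applications` maps a candidate name to that candidate's application text.
    """
    scored = [
        (name, keyword_score(text, positive, negative)[0])
        for name, text in applications.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A score is only a screening aid. The matched keywords returned by `keyword_score` are the evidence that the final written justification would cite.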
12. How can a data analyst for a large airline company efficiently collect and organize a large volume of data on flight routes, passenger demographics, and operational metrics so that the records are reliable and up to date, comply with industry standards, and adhere to ethical guidelines and privacy regulations? The data will be used to identify areas for improvement, such as flight delays and high passenger traffic, so that the company can implement effective strategies to address these issues.
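As one illustration of organizing such records, the sketch below uses Python's built-in `sqlite3` module:
- Constraints keep malformed rows out.
- An `updated_at` column records when each row was inserted.
- Passenger identifiers are replaced with a keyed hash (HMAC-SHA-256), so demographic data can be linked to bookings without storing the raw ID.

The table layout, the three-letter airport-code check, and the secret key are hypothetical assumptions. A production system would likely use a managed database, and pseudonymization alone does not guarantee compliance with any particular privacy regulation.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical key for illustration; a real key belongs in a secrets manager, not in code.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(passenger_id):
    """Replace a direct identifier with a keyed SHA-256 hash (HMAC)."""
    return hmac.new(SECRET_KEY, passenger_id.encode("utf-8"), hashlib.sha256).hexdigest()


def create_schema(conn):
    """Create flight and booking tables with basic integrity constraints."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS flights (
            flight_id     TEXT PRIMARY KEY,
            -- three-letter IATA airport codes (assumption)
            origin        TEXT NOT NULL CHECK (length(origin) = 3),
            destination   TEXT NOT NULL CHECK (length(destination) = 3),
            scheduled_dep TEXT NOT NULL,
            actual_dep    TEXT,
            updated_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS bookings (
            flight_id     TEXT NOT NULL REFERENCES flights(flight_id),
            passenger_key TEXT NOT NULL,  -- output of pseudonymize(), never the raw ID
            age_band      TEXT,
            PRIMARY KEY (flight_id, passenger_key)
        );
    """)


def delayed_flights(conn, min_minutes=15):
    """Return (flight_id, delay in minutes) for departures at least min_minutes late."""
    return conn.execute("""
        SELECT flight_id,
               ROUND((julianday(actual_dep) - julianday(scheduled_dep)) * 1440) AS delay_min
        FROM flights
        WHERE actual_dep IS NOT NULL
          AND (julianday(actual_dep) - julianday(scheduled_dep)) * 1440 >= ?
        ORDER BY delay_min DESC
    """, (min_minutes,)).fetchall()
```

Storing age bands instead of exact birth dates is another way to keep demographic analysis possible while collecting less personal data.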
13. A fast-food restaurant chain needs to improve its menu by providing healthier options to its customers. It wants to track the nutrition information of its dishes, including burgers, fries, and milkshakes, so that managers can easily compare their calorie, fat, and sugar content. What data management tools can the chain use to track nutritional data, make healthier options readily available, and enhance its menu?
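The chain might choose a spreadsheet, a database, or dedicated nutrition-analysis software. In each case, the core is a per-dish record of calories, fat, and sugar that managers can filter and sort. The minimal sketch below shows that structure in Python. The class and function names, and the limits passed to `healthier_options`, are hypothetical, and no real nutrition values are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dish:
    name: str
    calories: int   # kcal per serving
    fat_g: float    # grams of fat per serving
    sugar_g: float  # grams of sugar per serving


def healthier_options(dishes, max_calories, max_fat_g, max_sugar_g):
    """Dishes within all three limits, lowest calories first."""
    return sorted(
        (d for d in dishes
         if d.calories <= max_calories and d.fat_g <= max_fat_g and d.sugar_g <= max_sugar_g),
        key=lambda d: d.calories,
    )


def compare(dishes, nutrient):
    """Rank dishes by one nutrient field ('calories', 'fat_g', or 'sugar_g'), highest first."""
    return sorted(dishes, key=lambda d: getattr(d, nutrient), reverse=True)
```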
14. An NFL team's talent acquisition specialist is on a mission to find the most talented high school football player to join the team. The specialist is responsible for gathering extensive information on potential players, such as their statistics, performance history, and overall potential, with the goal of pinpointing a standout athlete who will give the team a significant advantage. To achieve this, the specialist uses cloud computing to sift through vast amounts of data and identify the high school player who can contribute most to the team's success in the upcoming season. Why is cloud computing well suited to collecting and analyzing data on high school football players?