Problem Set B
1. During the COVID-19 pandemic, testing for active COVID infection using nasal swabs increased significantly. Examples of the collected features are name, age, sex, and symptoms (cough, sore throat, loss of smell and taste, fever >100.2°F, shortness of breath, rash, headache, and congestion). Create a data dictionary table with the following attributes:
- Name: attribute name
- Definition: describe the attribute
- Data type: attribute data type
- Possible values: the possible values for the attribute
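One possible way to keep such a data dictionary in machine-readable form is sketched below. It is an illustration under stated assumptions, not a model answer: the column names, the age range, and the codes for sex are hypothetical choices, and each symptom is assumed to be recorded as a yes/no flag.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DictionaryEntry:
    name: str             # attribute name
    definition: str       # what the attribute describes
    data_type: str        # e.g. "string", "integer", "boolean"
    possible_values: str  # allowed values or range

# Hypothetical column names mapped to human-readable symptom labels.
SYMPTOMS = {
    "cough": "cough",
    "sore_throat": "sore throat",
    "loss_of_smell_taste": "loss of smell and taste",
    "fever": "fever above 100.2°F",
    "shortness_of_breath": "shortness of breath",
    "rash": "rash",
    "headache": "headache",
    "congestion": "congestion",
}

def build_dictionary() -> list[DictionaryEntry]:
    entries = [
        DictionaryEntry("name", "Full name of the person tested", "string", "Free text"),
        DictionaryEntry("age", "Age in whole years at time of test", "integer", "0-120 (assumed range)"),
        DictionaryEntry("sex", "Sex as recorded on the intake form", "categorical",
                        "Male, Female, Unknown (assumed codes)"),
    ]
    # Every symptom is assumed to be a yes/no flag.
    for column, label in SYMPTOMS.items():
        entries.append(DictionaryEntry(column, f"Whether the person reports {label}", "boolean", "Yes, No"))
    return entries

def render(entries: list[DictionaryEntry]) -> str:
    """Format the entries as a plain-text table with aligned columns."""
    header = ("Name", "Definition", "Data type", "Possible values")
    rows = [header] + [(e.name, e.definition, e.data_type, e.possible_values) for e in entries]
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    lines = [" | ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in rows]
    lines.insert(1, "-+-".join("-" * w for w in widths))  # separator under the header
    return "\n".join(lines)

if __name__ == "__main__":
    print(render(build_dictionary()))
```

Keeping the dictionary as data rather than a hand-drawn table makes it easy to validate incoming records against it later.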
2. Explore the concept of the age of data by researching an example of a piece of data that changes rapidly and a piece of data that does not change over time. Suggest how each piece of data could be used, and how to determine when it is no longer relevant or accurate.
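A simple way to decide when data have aged out is to attach a maximum age to each attribute and compare it with the time of the last update. The sketch below assumes a stock price as the rapidly changing example and a date of birth as the unchanging one; the 15-minute threshold is a hypothetical value, since a real threshold depends on how the data are used.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical maximum ages; None means the value never expires.
MAX_AGE: dict[str, Optional[timedelta]] = {
    "stock_price": timedelta(minutes=15),  # changes rapidly
    "date_of_birth": None,                 # does not change once recorded
}

def is_stale(attribute: str, last_updated: datetime, now: datetime) -> bool:
    """Return True if the value is older than the maximum age allowed for its attribute."""
    max_age = MAX_AGE[attribute]
    if max_age is None:
        return False
    return now - last_updated > max_age
```

For example, with `now = datetime(2024, 1, 1, 12, 0)`, a stock price last updated at 11:30 is stale, one updated at 11:50 is not, and a date of birth recorded in 1990 is never stale.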
3. Explain the difference between online transaction processing (OLTP) and online analytical processing (OLAP) using a practical example.
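As one possible illustration, the sketch below contrasts the two workloads on a hypothetical `orders` table using Python's built-in `sqlite3` module. OLTP is represented by a short transaction that writes a single order; OLAP by a read-only query that aggregates many orders by region and month. For brevity both run against one in-memory database, whereas in practice analytical queries usually run against a separate data warehouse.

```python
import sqlite3

def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        customer   TEXT NOT NULL,
        region     TEXT NOT NULL,
        order_date TEXT NOT NULL,  -- ISO date, e.g. '2024-03-15'
        amount     REAL NOT NULL)""")
    return conn

def place_order(conn: sqlite3.Connection, customer: str, region: str,
                order_date: str, amount: float) -> int:
    """OLTP-style work: one short transaction that writes a single record."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO orders (customer, region, order_date, amount) VALUES (?, ?, ?, ?)",
            (customer, region, order_date, amount))
    return cur.lastrowid

def revenue_by_region_month(conn: sqlite3.Connection) -> list:
    """OLAP-style work: a read-only query that summarizes many records."""
    return conn.execute(
        """SELECT region, substr(order_date, 1, 7) AS month, SUM(amount), COUNT(*)
           FROM orders
           GROUP BY region, month
           ORDER BY region, month""").fetchall()

if __name__ == "__main__":
    conn = setup()
    place_order(conn, "alice", "North", "2024-03-01", 20.0)
    place_order(conn, "bob", "North", "2024-03-15", 30.0)
    place_order(conn, "carol", "South", "2024-04-02", 12.5)
    print(revenue_by_region_month(conn))
```

With the three sample orders, the analytical query returns `[('North', '2024-03', 50.0, 2), ('South', '2024-04', 12.5, 1)]`.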
4. Suggest criteria for selecting an open-source versus a commercial database system.
5. A food delivery service wants to collect personal information from new clients through an online form in which every field is required. Which record organization technique would be preferred for this purpose, and why would it be the best choice?
6. Select a key-value store of your choice and implement a sample application using it.
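A minimal starting point, assuming Python's standard-library `dbm` module as a stand-in key-value store (a production application would more likely use a dedicated store such as Redis), is sketched below. The session-cache use case and the `SessionStore` class are hypothetical; values are serialized to JSON because `dbm` stores keys and values as bytes.

```python
import dbm
import json
import os
import tempfile
from typing import Optional

class SessionStore:
    """Hypothetical session cache built on Python's standard dbm key-value module."""

    def __init__(self, path: str):
        self._path = path  # dbm backends may add their own file extensions

    def put(self, session_id: str, data: dict) -> None:
        # "c" opens the database for reading and writing, creating it if needed
        with dbm.open(self._path, "c") as db:
            db[session_id] = json.dumps(data)  # str keys/values are stored as bytes

    def get(self, session_id: str) -> Optional[dict]:
        with dbm.open(self._path, "c") as db:
            if session_id not in db:
                return None
            return json.loads(db[session_id])  # json.loads accepts bytes

    def delete(self, session_id: str) -> None:
        with dbm.open(self._path, "c") as db:
            if session_id in db:
                del db[session_id]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = SessionStore(os.path.join(tmp, "sessions"))
        store.put("s-42", {"user": "alice", "cart": ["pizza"]})
        print(store.get("s-42"))
        store.delete("s-42")
        print(store.get("s-42"))
```

The application only ever looks values up by key, which is the access pattern a key-value store is designed for; queries such as "all sessions for user alice" would require a scan or a secondary structure.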
7. Discuss four approaches for dealing with slowly changing dimensions in a data warehouse. Can any of these approaches be used to handle rapidly changing dimensions?
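To make one of these approaches concrete, the sketch below shows the widely used technique often called Type 2: instead of overwriting a changed attribute, the current row is end-dated and a new row with a new surrogate key is added, so history is preserved. The customer dimension and its `city` attribute are hypothetical, and an in-memory list stands in for the dimension table.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class CustomerRow:
    surrogate_key: int        # warehouse-generated key, one per version
    customer_id: str          # natural (business) key
    city: str                 # the slowly changing attribute
    valid_from: date
    valid_to: Optional[date]  # None marks the current version

class CustomerDimension:
    def __init__(self) -> None:
        self.rows: list[CustomerRow] = []
        self._next_key = 1

    def current(self, customer_id: str) -> Optional[CustomerRow]:
        for row in self.rows:
            if row.customer_id == customer_id and row.valid_to is None:
                return row
        return None

    def upsert(self, customer_id: str, city: str, as_of: date) -> CustomerRow:
        """Type 2 handling: end-date the current row and add a new version on change."""
        row = self.current(customer_id)
        if row is not None and row.city == city:
            return row  # no change, keep the current version
        if row is not None:
            row.valid_to = as_of  # close the old version
        new_row = CustomerRow(self._next_key, customer_id, city, as_of, None)
        self._next_key += 1
        self.rows.append(new_row)
        return new_row
```

Because every change adds a row, this approach can make the dimension grow quickly when attributes change often, which is worth keeping in mind for the second part of the question.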
8. Discuss some application areas where streaming analytics (such as that provided by Spark Streaming) might be valuable. Consider X (formerly Twitter), but also other contexts.
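Many streaming use cases, such as tracking trending hashtags on X, rely on windowed aggregation. The pure-Python sketch below illustrates the idea with a tumbling (fixed, non-overlapping) window; it is not the Spark API, and it assumes events arrive in timestamp order, whereas real streaming engines, Spark included, must also handle late or out-of-order events. The hashtag events are made-up sample data.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

def tumbling_window_counts(events: Iterable[Tuple[int, str]],
                           window_seconds: int) -> Iterator[Tuple[int, Counter]]:
    """Count hashtags per fixed, non-overlapping time window.

    events: (timestamp_in_seconds, hashtag) pairs, assumed to arrive in time order.
    Yields (window_start, counts) each time a window closes, plus the last window.
    """
    current_start: Optional[int] = None
    counts: Counter = Counter()
    for ts, tag in events:
        start = ts - ts % window_seconds  # align timestamp to its window
        if current_start is None:
            current_start = start
        elif start != current_start:
            yield current_start, counts   # previous window is complete
            current_start, counts = start, Counter()
        counts[tag] += 1
    if current_start is not None:
        yield current_start, counts

if __name__ == "__main__":
    sample = [(0, "#flu"), (10, "#covid"), (59, "#covid"), (60, "#flu"), (75, "#flu")]
    for window_start, counts in tumbling_window_counts(sample, 60):
        print(window_start, counts.most_common())
```

With 60-second windows, the sample yields two windows: one starting at 0 with two `#covid` and one `#flu`, and one starting at 60 with two `#flu`.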
9. Given that Spark’s GraphX library provides a number of interesting algorithms for graph-based analytics, do you think graph-based NoSQL databases are still necessary? Explain why or why not. Search the Web for how to run Neo4j together with Spark, and explain the role each plays in such an environment.
10. Design a sample end-to-end informatics solution for an area of your choice. If time permits, implement a simple prototype of your solution as a web-based application, using a data management infrastructure of your choice.