Principles of Data Science

10.1 Writing an Informative Report

Learning Outcomes

By the end of this section, you should be able to:

  • 10.1.1 Identify the audience for a technical report and tailor word choice and style appropriately.
  • 10.1.2 Document and describe the methods used in a data science project.
  • 10.1.3 Determine the most effective way to present data, analysis, and results.

Data scientists should be prepared to explain the research question, the data sources, the data processing steps, the key analytical techniques and statistical methods used (along with the purpose of and justification for each), and, finally, the outcomes of the project. They must also be prepared to create and present visual exhibits (e.g., tables, graphs, charts) that support their findings. Additionally, it is important to discuss any limitations or potential biases in the data and analysis and to offer actionable insights as well as guidance for further work. Such comprehensive reporting not only builds trust with the project team but also contributes to the advancement of knowledge by enabling peer review and constructive feedback. Report writers often use a structured format, dividing the report into standard parts for ease of understanding and consistency.

The main elements of a data science report include the following:

  • Title page, providing the title of the report, the names of the team members who worked on it, their affiliations, and the current date
  • Executive summary, including a brief overview of the report, the team’s key findings, and main conclusions and recommendations
  • Methodology, which discusses the data sources, data collection and cleaning methods, algorithms and models used, and assumptions made during the analysis
  • Data exploration and analysis, detailing the whole analysis process, including descriptive statistics of the data; visualizations; model building; training, validation, and testing of models; hyperparameter tuning; and model comparison and evaluation using performance metrics
  • Results and discussion, conveying the findings and their interpretation, along with strengths and weaknesses of the models, potential biases (and how they were addressed), and areas for further exploration
  • Conclusion, containing a summary of key findings, final thoughts, and recommendations
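The standard parts listed above can be turned into a reusable report outline. The Python sketch below is one minimal way to do this, assuming a plain-text draft is wanted; the function name `report_outline`, the one-line prompts under each heading, and the sample title and author names are hypothetical placeholders, not a required format. Because sponsors often prescribe their own format, the section list is kept in a single constant that is easy to edit.

```python
import string
from datetime import date

# Standard report parts after the title page, each with a short reminder
# of what it should contain (prompts paraphrase the list of main elements).
REPORT_SECTIONS = [
    ("Executive Summary", "brief overview, key findings, main conclusions and recommendations"),
    ("Methodology", "data sources, collection and cleaning, algorithms and models, assumptions"),
    ("Data Exploration and Analysis", "descriptive statistics, visualizations, model training, tuning, and evaluation"),
    ("Results and Discussion", "findings and interpretation, model strengths and weaknesses, biases, further work"),
    ("Conclusion", "summary of key findings, final thoughts, recommendations"),
]


def report_outline(title, authors, report_date, appendices=()):
    """Return a plain-text report skeleton: title page lines, then numbered sections."""
    # Title page: title, team members (with affiliations), and date.
    lines = [title, "Prepared by: " + ", ".join(authors), "Date: " + report_date.isoformat(), ""]
    for number, (heading, prompt) in enumerate(REPORT_SECTIONS, start=1):
        lines.append(f"{number}. {heading}")
        lines.append(f"   TODO: {prompt}")
    # Optional appendices are lettered A, B, C, ...
    for letter, name in zip(string.ascii_uppercase, appendices):
        lines.append(f"Appendix {letter}. {name}")
    return "\n".join(lines)


# Hypothetical usage: the title, names, and affiliations are placeholders.
outline = report_outline(
    "Health Factors Associated with Diabetes",
    ["Analyst One (Affiliation A)", "Analyst Two (Affiliation B)"],
    date(2024, 12, 19),
    appendices=["Data Dictionary"],
)
print(outline)
```

Optional parts such as a table of contents or list of references could be added to `REPORT_SECTIONS` in the same way.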

Additional parts may be added as needed, such as a table of contents (TOC), an introduction, a list of references, and appendices. Often, those commissioning the project require a certain format for the report, so it is essential for data scientists to work closely with all parties to determine the preferred format and create an effective, informative report that fully meets everyone’s needs.

Writing to Your Audience

Writing an informative data science report requires, first and foremost, a full understanding of the audience for which you are writing, that is, the individuals who will be reading the report. For instance, does the audience consist of fellow data science experts who can understand complex details, or of corporate decision-makers who need concise information to help them make sound business decisions? Does your audience rely more on visuals for explanation, or do they require more detailed written explanations? Perhaps your audience includes a mixture of businesspeople, technical experts, and end users, necessitating multiple formats for conveying the same information in different ways.

Start by determining the audience member types. We often talk about four audience categories: experts, technicians, executives, and nonspecialists. Each category has distinct characteristics, perspectives, expectations, and objectives, as described below.

Experts

Experts are individuals with an advanced understanding of the subject matter, often with specialized knowledge or education in the topic at hand. They are comfortable with the jargon, or specialized and/or technical terms of the field. Communication aimed at experts can be highly technical and detailed, and it often assumes a high level of prior knowledge. The purpose of such communication is usually to share new findings, discuss advanced techniques, or delve into the nuances of a particular issue.

Technicians

Technicians are practical users of information who may apply the knowledge in a hands-on manner. They typically have good familiarity with the subject but may not have the theoretical knowledge of experts. Information presented to technicians should be accurate and technical but focused on application and procedures. It can include jargon but should explain how the information affects practical work.

Executives

Executives are decision-makers such as managers, chief executive officers (CEOs), or business owners who may not have detailed technical knowledge but need to understand the implications of the information for strategic decision-making. Communication with executives should be concise, focused more on summaries and outcomes and less on details and theory. The language should be less technical and oriented toward business implications, costs, benefits, risks, and opportunities.

Sometimes, data science summaries are conveyed to executives and senior management by way of an executive dashboard. The executive (or data) dashboard is a visual tool that provides a quick overview of an organization's key data points and metrics, updated in real time or at regular intervals. These key metrics, often referred to as key performance indicators (KPIs), let the audience view consolidated data from various sources in an intuitive and visually engaging format. The purpose of the dashboard is to build awareness of key patterns and support quick identification of trends, anomalies, and opportunities. A key advantage of an executive dashboard is real-time access to critical, actionable data, which allows an executive to make rapid decisions about business-critical KPIs. Data dashboards also allow for continuous monitoring of performance metrics and can help identify trends early, which may give a company a competitive advantage. Executives can quickly scrutinize the dashboard metrics to assess progress toward corporate goals and objectives.

Examples of executive dashboard uses include monitoring sales performance through real-time sales data, revenue trends, and customer engagement metrics, which enables teams to adjust strategies in a timely manner. Data dashboards can also support operational efficiency by displaying key metrics such as production rates, inventory levels, and logistics status to help teams optimize processes. Additionally, marketing teams can use data dashboards to visualize campaign performance, audience engagement, and conversion rates to refine their tactics and maximize returns. Overall, dashboards can make managerial decision-making more agile, effective, and well informed (see Figure 10.2).
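To make the idea of a KPI concrete, the sketch below consolidates raw sales records into two simple indicators of the kind a sales dashboard might display: total revenue per month and month-over-month revenue growth. The records, field names (`month`, `region`, `revenue`), and KPI definitions are hypothetical; a real dashboard would pull from the organization's own data sources and refresh on a schedule or in real time.

```python
from collections import defaultdict

# Hypothetical raw records; in practice these would come from sales systems.
sales = [
    {"month": "2024-01", "region": "North", "revenue": 12000.0},
    {"month": "2024-01", "region": "South", "revenue": 8000.0},
    {"month": "2024-02", "region": "North", "revenue": 15000.0},
    {"month": "2024-02", "region": "South", "revenue": 7000.0},
]


def monthly_revenue(records):
    """Consolidate records into total revenue per month, sorted by month."""
    totals = defaultdict(float)
    for record in records:
        totals[record["month"]] += record["revenue"]
    # "YYYY-MM" strings sort chronologically.
    return dict(sorted(totals.items()))


def kpi_summary(records):
    """Return the latest month's revenue and its month-over-month growth (%)."""
    totals = monthly_revenue(records)
    if len(totals) < 2:
        raise ValueError("need at least two months of data to compute growth")
    months = list(totals)
    latest, previous = totals[months[-1]], totals[months[-2]]
    growth_pct = (latest - previous) / previous * 100
    return {
        "latest_month": months[-1],
        "latest_revenue": latest,
        "mom_growth_pct": round(growth_pct, 1),
    }


print(kpi_summary(sales))
# → {'latest_month': '2024-02', 'latest_revenue': 22000.0, 'mom_growth_pct': 10.0}
```

Keeping the consolidation step (`monthly_revenue`) separate from the KPI definitions makes it easier to add further indicators, such as revenue by region, without changing how the raw data is combined.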

A tablet propped up on a desk with a data dashboard in view. The dashboard has a pie chart, a graph, a map, and additional rows of data. A coffee cup and phone are out of focus in the background.
Figure 10.2 The executive dashboard is a visual representation of data to give executives and senior management a quick, real-time overview of an organization's key performance indicators (KPIs).
(credit: modification of work "Analyst analytics blur" by David Stewart/https://homegets.com/Flickr, CC BY 2.0)

Government dashboards, with their focus on transparency to the public, often provide good examples.

Examples of U.S. federal government dashboards from different economic sectors include the following:

Nonspecialists

Nonspecialists, or “lay” audiences, do not have specialized knowledge of the subject but might be interested in the topic or need to understand the basics for a particular reason. Communication with this group should avoid technical language and jargon as much as possible, or at least explain it thoroughly using clear, straightforward language. The focus should be on the broader implications of the information and its relevance in a wider context.

Writing for a Diverse Audience

When writing technical data science reports, it is often necessary to connect with diverse audiences who may have different levels of expertise, interest, and background knowledge. One way to build these connections is to tailor communication to a specific audience category while keeping the explanations clear enough to resonate with others. The presenter must consider not only the complexity of the language but also the depth of detail, the format, and the presentation of the information. The writing should be accessible to everyone in your audience and should avoid nonessential jargon, cryptic acronyms, and excessive detail (unless the audience requires a high level of technicality and theory).

Consider the human element when creating an informative narrative; that is, take into account the perspectives, emotions, motivations, and experiences of the audience to make the information more engaging and meaningful. Illustrative examples and clear analogies also help make the case for the real-world utility of your approach and findings. When writing for a nontechnical audience, articulate the tangible impact and relevance of the data, connecting it to the reader’s context and concerns. Be selective when describing the various aspects of the report: rather than overwhelming the audience with less significant data, spotlight the key insights. Complement the narrative with visualizations (e.g., graphs, charts, tables) that illuminate the findings. Keeping the full range of your potential audience in mind, adopt a writing style that is coherent, succinct, and interesting, and avoid excessive complexity. Use the active voice and short sentences, and offer clear transitional cues to guide your readers through the report.
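One practical way to act on the advice about short sentences is to flag unusually long sentences in a draft before it goes out for review. The sketch below splits text into sentences with a simple regular expression and reports those above a word limit. The 25-word threshold is an assumed rule of thumb rather than a standard, and the naive splitter will mishandle abbreviations such as "e.g.", so treat it as a drafting aid, not a readability score.

```python
import re


def split_sentences(text):
    """Naively split text after ., !, or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def long_sentences(text, max_words=25):
    """Return (word_count, sentence) pairs for sentences longer than max_words."""
    flagged = []
    for sentence in split_sentences(text):
        word_count = len(sentence.split())
        if word_count > max_words:
            flagged.append((word_count, sentence))
    return flagged


def average_sentence_length(text):
    """Mean number of words per sentence (0.0 for empty text)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


# Hypothetical draft text.
draft = (
    "The model predicts churn. "
    "It was trained on customer records that were collected over several years "
    "from multiple regional offices and then cleaned, merged, deduplicated, and "
    "validated by the team before any analysis began."
)
for count, sentence in long_sentences(draft):
    print(f"{count} words: {sentence}")
print(round(average_sentence_length(draft), 1))
```

Flagged sentences are candidates for splitting; whether to split them is still a judgment call that depends on the audience.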

Writers and presenters of technical reports should also consider audience neurodiversity, the normal variation in individual cognitive abilities, especially in how people communicate and process information. Just as we must consider a range of audiences, from experts to nonspecialists, it is vital to recognize that cognitive and neurological differences among readers or other consumers of the report can significantly affect their comprehension of and engagement with the material. Many of the same principles apply, including clear, concise language; structured organization; and visual aids such as charts and graphs that cater to diverse learning and processing styles. Additionally, providing summaries or abstracts at the beginning of each section can aid comprehension for those who might struggle with large blocks of text or complex information. By adopting these strategies, report writers not only uphold the principles of inclusivity and accessibility but also enhance the overall effectiveness of their communication, ensuring that their findings and insights are accessible to a broader audience.

Explaining Your Methods

As described in What Is Data Science?, the steps of the data science cycle consist of defining the objective, acquiring and cleaning data from a variety of sources, analyzing data, and communicating relevant insights. In a technical report, the purpose of explaining data science methods is to provide a clear and rigorous account across the data science cycle as well as to justify the validity and reliability of the results and conclusions. However, as mentioned above, different audiences may have different roles and levels of expertise in data science. Effectively communicating with these varied audiences requires different strategies for explaining the data science methods that you employed.

One strategy used when there may be audiences of different levels or interests is a layered approach, in which the report is organized into sections or appendices that provide different levels of detail and complexity for different audiences. For example, the main body of the report could provide a high-level overview of the data science methods, focusing on the main objectives, steps, and outcomes, using simple and nontechnical language. This would be suitable for audiences who are interested in the general findings and implications of the data analysis, such as managers and policymakers. The appendices could provide more detailed and technical descriptions of the data science methods, such as the data sources, sampling methods, preprocessing techniques, algorithms, parameters, evaluation metrics, and limitations. These appendices would be suitable for audiences who are interested in the technical aspects and validity of the data analysis, such as data scientists, researchers, or reviewers.

It is important that such a report adheres to a recognized editorial style, such as one of these:

Always consult the project sponsors to find out what formatting and style are required for the report, and make sure that the report contains appropriate references and citations for any third-party sources referenced.

Some reports may require a codebook or data dictionary that details the variables in the data or datasets used in the project, along with their units, values, and labels. This codebook must align with the final report and with the code used during the project to maintain consistency.

In addition, it may be important to use a version control system, such as Git or SVN, to keep track of changes to code or other digital content over time, especially if the project is complex and many team members are working on it. Git is a distributed version control system designed to handle everything from small to very large projects with speed and efficiency, while SVN (Subversion) is a centralized version control system that allows users to track changes to files and directories over time. These systems are useful for tracking and managing changes to code and data throughout the lifecycle of a project. A version control system documents the project's history, helps resolve conflicts or errors, and facilitates collaboration among researchers. Crucially, it promotes reproducibility, enabling others to access and run the same code and data used in the project. Teams can also use cloud-based tools (e.g., Google Docs, Microsoft Office 365) whose document sharing, change tracking, and version history features provide version control and backup copies for the report itself.
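A first draft of the codebook or data dictionary described above can be generated from the data itself and then completed by hand. The sketch below uses only Python's standard `csv` module; the sample columns (`age`, `bmi`, `smoker`), their descriptions and units, and the function name `build_codebook` are hypothetical. Because the inferred types and value ranges come straight from the data, regenerating the draft after each data change (and keeping it under version control with the code) helps keep the codebook aligned with the final report.

```python
import csv
import io


def build_codebook(csv_text, descriptions, units):
    """Draft one codebook entry per column: description, units, type, values, missing count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    codebook = []
    for column in reader.fieldnames:
        # Treat empty fields (and fields absent from short rows) as missing.
        values = [row[column] for row in rows if row[column] not in ("", None)]
        if not values:
            kind, summary = "unknown", "no non-missing values"
        else:
            try:
                numbers = [float(v) for v in values]
                kind, summary = "numeric", f"range {min(numbers):g} to {max(numbers):g}"
            except ValueError:
                # Any non-numeric entry means the column is treated as categorical.
                kind, summary = "categorical", "values: " + ", ".join(sorted(set(values)))
        codebook.append({
            "variable": column,
            "description": descriptions.get(column, "TODO: describe"),
            "units": units.get(column, ""),
            "type": kind,
            "values": summary,
            "missing": len(rows) - len(values),
        })
    return codebook


# Hypothetical sample data and documentation.
sample_csv = "age,bmi,smoker\n45,27.1,yes\n52,,no\n38,31.4,no\n"
descriptions = {"age": "Age of participant", "bmi": "Body mass index", "smoker": "Current smoker"}
units = {"age": "years", "bmi": "kg/m^2"}

for entry in build_codebook(sample_csv, descriptions, units):
    print(entry)
```

The "TODO" placeholder marks variables that still need a human-written description before the codebook is final.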

Should That Be a Graph or a Table?

Visualizing Data covered in depth the importance of incorporating visual aids such as graphs, charts, tables, diagrams, and flowcharts as an effective way to clarify data science methodologies. These tools can distill complex information and highlight significant patterns, trends, or relationships within the data. Data visuals can capture and maintain audience engagement while fostering comprehension and memory retention. However, data scientists should use visual aids with care, since improper design or presentation can introduce bias or lead to misinterpretation. Each visual aid should therefore be paired with a succinct, clear explanation of its purpose, its content, and its possible limitations. Including alt text (brief descriptions that accompany an image or are embedded within the image data) is essential for readers who cannot view images and graphics because of a disability or other limitations.
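Alt text still has to be written by someone who understands the point of the figure, but a short script can draft the factual part for a simple chart. The sketch below uses a hypothetical function `draft_alt_text` and made-up revenue values; it summarizes a single data series by its net direction (comparing only the first and last values, ignoring the ups and downs in between) and its highest and lowest points. The writer would then edit the sentence to state the figure's main message.

```python
def draft_alt_text(chart_type, subject, x_labels, values, unit=""):
    """Draft a factual description of one data series for use as alt text."""
    if not values or len(x_labels) != len(values):
        raise ValueError("x_labels and values must be non-empty and the same length")
    high = max(range(len(values)), key=values.__getitem__)
    low = min(range(len(values)), key=values.__getitem__)
    # Net direction compares only the first and last values.
    first, last = values[0], values[-1]
    if last > first:
        change = f"rises from {first:g}{unit} to {last:g}{unit}"
    elif last < first:
        change = f"falls from {first:g}{unit} to {last:g}{unit}"
    else:
        change = f"starts and ends at {first:g}{unit}"
    return (
        f"{chart_type} of {subject} from {x_labels[0]} to {x_labels[-1]}. "
        f"The series {change}, with a high of {values[high]:g}{unit} in {x_labels[high]} "
        f"and a low of {values[low]:g}{unit} in {x_labels[low]}."
    )


# Hypothetical data for a small line graph.
alt = draft_alt_text(
    "Line graph",
    "monthly revenue in thousands of dollars",
    ["Jan", "Feb", "Mar", "Apr"],
    [20, 22, 19, 25],
)
print(alt)
```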

The art of data science communication entails presenting analytical results logically and compellingly. Graphs and tables offer distinct advantages for various data types, reporting goals, audiences, and intended messages. Graphs excel at depicting trends, patterns, connections, and anomalies, whereas tables are optimal for displaying exact values, comparisons, and classifications. Given the diversity of available graphs and tables, selecting the most fitting type for the data at hand is crucial.
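The guidance above (graphs for trends, patterns, relationships, and anomalies; tables for exact values, comparisons, and classifications) can be written down as a small lookup so that a team applies it consistently. The sketch below is one hypothetical encoding; the goal names are made up for illustration, and the five-slice limit for pie charts is an assumed rule of thumb rather than a fixed standard.

```python
# Hypothetical mapping from a reporting goal to (form, suggested display).
DISPLAY_BY_GOAL = {
    "trend": ("graph", "line graph"),
    "category_comparison": ("graph", "bar graph"),
    "part_of_whole": ("graph", "pie chart"),
    "relationship": ("graph", "scatterplot"),
    "distribution": ("graph", "histogram"),
    "outliers": ("graph", "box plot"),
    "exact_values": ("table", "summary table"),
    "cross_classification": ("table", "contingency table"),
}

MAX_PIE_SLICES = 5  # assumed readability limit, not a standard


def suggest_display(goal, n_categories=None):
    """Return (form, display type) for a reporting goal."""
    if goal not in DISPLAY_BY_GOAL:
        raise ValueError(f"unknown goal {goal!r}; expected one of {sorted(DISPLAY_BY_GOAL)}")
    form, display = DISPLAY_BY_GOAL[goal]
    # Many slices make a pie chart hard to read, so fall back to a bar graph.
    if display == "pie chart" and n_categories is not None and n_categories > MAX_PIE_SLICES:
        display = "bar graph"
    return form, display


print(suggest_display("trend"))
print(suggest_display("part_of_whole", n_categories=8))
```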

A well-constructed data analysis report should include enough graphs and tables within the main body to adequately explain each of the central arguments or findings. These should be straightforward, self-explanatory, and identified with descriptive labels, titles, legends, scales, and units. Consistency in style and formatting is essential. Authors should avoid overloading the report with excessive or overly intricate visuals that might distract the reader or graphics that offer no additional insight. The author is also responsible for clarifying the significance of these visuals within the accompanying text (alt text, captions, etc.), spotlighting the principal observations or inferences. Table 10.1 provides a summary of the types of visual aids that can be used in data science reports and the types of information that they best convey (note that these displays were discussed in Inferential Statistics and Regression Analysis and Visualizing Data):

| Visual Aid | Informational Purpose |
| --- | --- |
| Line graphs | Illustrate time series data or variations across continuous data |
| Bar graphs | Compare the distributions of discrete or categorical data |
| Pie charts | Represent parts of a whole in percentages or proportions |
| Scatterplots | Demonstrate correlations between data from two continuous variables |
| Histograms | Show the distribution frequencies of data from a single continuous variable |
| Box plots | Provide summary statistics and outliers for a continuous variable or across groups |
| Contingency tables | Display the joint frequency or counts for the data from a combination of two or more categorical variables |
| Summary tables | Present descriptive statistics or model outputs for data from one or more continuous variables or across different categories |

Table 10.1 Summary of Visuals Used in Data Science Reports
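The two table types at the bottom of Table 10.1 are straightforward to compute. The sketch below builds a contingency table (joint counts of two categorical variables) and a summary table (descriptive statistics of a continuous variable by group) using only Python's standard library; the records and variable names (`group`, `smoker`, `bmi`) are hypothetical. In practice, a library such as pandas offers equivalent functionality, for example `pandas.crosstab` and `DataFrame.groupby(...).describe()`.

```python
import statistics
from collections import Counter

# Hypothetical records: a group label, a yes/no category, and a continuous value.
records = [
    {"group": "A", "smoker": "yes", "bmi": 27.0},
    {"group": "A", "smoker": "no", "bmi": 24.0},
    {"group": "B", "smoker": "yes", "bmi": 31.0},
    {"group": "B", "smoker": "no", "bmi": 29.0},
    {"group": "B", "smoker": "no", "bmi": 30.0},
]


def contingency_table(rows, row_var, col_var):
    """Joint counts of two categorical variables as {row_level: {col_level: count}}."""
    counts = Counter((r[row_var], r[col_var]) for r in rows)
    row_levels = sorted({r[row_var] for r in rows})
    col_levels = sorted({r[col_var] for r in rows})
    # Counter returns 0 for combinations that never occur.
    return {rl: {cl: counts[(rl, cl)] for cl in col_levels} for rl in row_levels}


def summary_table(rows, value_var, group_var):
    """Descriptive statistics of a continuous variable for each group."""
    groups = {}
    for r in rows:
        groups.setdefault(r[group_var], []).append(r[value_var])
    summary = {}
    for group, values in sorted(groups.items()):
        summary[group] = {
            "n": len(values),
            "mean": statistics.mean(values),
            # Sample standard deviation needs at least two values.
            "sd": statistics.stdev(values) if len(values) > 1 else float("nan"),
            "min": min(values),
            "max": max(values),
        }
    return summary


for level, counts in contingency_table(records, "group", "smoker").items():
    print(level, counts, "total:", sum(counts.values()))
for group, stats in summary_table(records, "bmi", "group").items():
    print(group, {k: round(v, 2) for k, v in stats.items()})
```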

In addition, consider using interactive visualizations, which offer users the opportunity to engage dynamically with information through direct manipulation and exploration. This method leverages the power of graphical tools to transform static data into a visual dialogue, allowing users to uncover patterns, anomalies, and insights by interacting with the visualization elements themselves. By providing mechanisms such as zooming, filtering, and sorting, interactive visualizations empower users to personalize their data exploration journey, making complex data more accessible and understandable.

Example 10.1

Problem

A large pharmaceutical company is seeking to identify key health factors that are associated with coronary heart disease and diabetes, with the potential goal of identifying markets for medications based on the findings about those key factors. One source of data is the dataset diabetes_data.csv. Assuming that some preliminary analysis has already been completed, how would a data scientist address each of the following elements of the report? (Note: You may want to return to the chapter introduction to review the information about this dataset.)

  • Title Page
  • Executive Summary
  • Methodology
  • Data Exploration and Analysis
  • Results and Discussion
  • Conclusion
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution-NonCommercial-ShareAlike License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
Citation information

© Dec 19, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.