Tammie Bolling; Angela Mitchell; Tanya Scott; Nyrobi Wheeler

Learning Objectives

By the end of this section, you will be able to:

Describe the different types of data
Define data validation and describe the different techniques for validating data
Construct a Data Table for a dataset

Data can come in a variety of forms. It can fit into specific categories, or it can be more general in nature. We can collect data in different ways, such as through instruments like a thermometer or through an open-ended survey. Regardless of the type of data and how it is collected, the information is useless without analysis. Data analysis is a powerful tool that can guide key strategic decisions. Businesses rarely make important decisions without some data analysis to support them. A key step in the analysis process is ensuring the reliability and validity of data. A business can then use various techniques to understand the information better. Microsoft Excel provides several tools to make that process a bit easier for the business decision maker.

Business analytics is one career field that uses extensive Excel skills. This field focuses on careful analysis of data to provide insights and recommendations to the management team on ways to monitor and improve performance. The soft skills needed in this field include critical thinking, analytical thinking, problem solving, attention to detail, and communication. The business analyst must dig deeper into data by asking the “why” questions and then be able to clearly communicate their findings to colleagues who may not know the terminology and techniques. The vast amount of data in today’s business world creates a high demand for these positions. Over the next decade, the number of jobs in the field is expected to grow over 10 percent with the positions commanding salaries over $80,000 per year.¹ Even if you choose to take a different path in your career, learning these skills can greatly increase your value to an organization.

What Is Data?

To analyze data, let’s examine the various forms that data can take and how to collect and organize it. In essence, data consists of facts or information that can be used for reference, performance monitoring, and/or analysis. For example, data can be collected about online purchasing habits to monitor the market share performance of select brands. A business can then analyze the data to reveal trends such as the purchasing habits of certain demographic groups.

Businesses use many methods to collect data. One common approach is using surveys, which can collect a wide variety of information. Also, with today’s technology, surveys can often reach a large audience by using email or social media distribution. Businesses also often have a lot of data stored in both internal and external databases. Internal databases consist of information a company has collected itself, such as customer demographic information, past purchases, or frequency of purchases. WorldCorp’s internal database includes key contact information for each of their customers, order sizes and frequency of orders, and payment history. External databases are collected by data providers. Often, a business can purchase this data from a provider to obtain information on market demographics, competitor products, and industry statistics. Many sources of compiled data can be accessed for free. U.S. Census data, available at the United States Census Bureau website, is one example. This site provides a lot of varied information on U.S. residents, such as education levels and median incomes. The data can be filtered based on specific criteria, such as location. WorldCorp can use this census data to identify and evaluate new locations for warehouses. They can use the data to determine if a particular city has the employee base needed to open the new warehouse and what the salary ranges in the region are (Figure 11.2).

United States Census Bureau website page for Kansas City, Missouri displays Missouri map with clickable information for Populations and People, Education, Housing, Families and Living Arrangements, Income and Poverty, Employment, Health, Race/Ethnicity.

Figure 11.2 Using census data can help you gather a good deal of detailed information about a geographic location.

Interviews and focus groups are also methods for collecting data such as customer preferences for a product or for understanding the impact of price changes, but they do not provide as much data for deep analysis. Excel is better suited for numerical information. However, some information from interviews and focus groups can be transformed into quantitative data. For example, if you ask potential customers if they would still purchase a product if the price increased by 5 percent or a certain dollar amount, you could record the number (or percentage of respondents) who replied yes and no to that question. Both methods can be time intensive and generally provide information that is more exploratory in nature.

The information gathered from interviews or focus groups can often help data analysts determine the appropriate data they might need for a more in-depth analysis. The information about a price change could then be further explored with more research to determine at what point a price increase becomes too much and the customer will no longer purchase the product.

Spotlight on Ethics

Is Your Information Safe Online?

Database marketing is a business strategy that uses information collected on consumers to better market their products or services to them. This could include collecting information from people’s online activities or buying information collected by marketing research firms on their purchasing habits. User data and information about online activities is collected at an astonishing rate.² All of this available data brings up valid concerns related to privacy and protecting your information from unlawful activities. What’s more, many people often provide sensitive information freely, such as when completing online surveys or taking social media quizzes that often are designed to determine usernames and passwords. In 2019, nearly fifteen million people were the victims of identity fraud. Some strategies to protect yourself include using strong passwords and changing them often, only visiting trusted sites, using prepaid cards for online transactions, and hiding personal information when you can. You will also want to avoid providing personal information unless absolutely necessary and be mindful of how you use the sites you visit.

Once data is collected, it can be grouped into two broad categories: qualitative and quantitative. The first category, qualitative data, is categorical information that does not include numbers, or if it does include numbers, those numbers do not have a true mathematical meaning. For example, a question on a class evaluation survey could ask the mode of delivery for the course and could be as follows: 1 – in person, 2 – online, or 3 – hybrid. The numbers in this case do not have a meaning; instead, they are placeholders to indicate the category you select. However, because Excel cannot analyze text information, converting the categories to numbers allows for some analysis of this qualitative data. Other qualitative data can be in the form of text that includes statements of opinion. For example, a customer review that reads “This product is wonderful!” is a qualitative assessment of the product.

The second category, quantitative data, involves numerical evaluations. It could include numerical expressions of opinions but also could be facts such as the cost of a specific product. Along with a glowing qualitative review of a product, a customer could also provide a rating of five on a scale of one to five. In this example, the same assessment, the customer’s opinion of the product, can be expressed in both qualitative and quantitative terms. This is similar in concept to what you might see for a product rating on an e-commerce site. A buyer can give a product five stars and add some textual comments about their experience with the product.

Quantitative data can be further classified as discrete or continuous. Information that can only take on a specific set of values is called discrete data. For example, the number of customers in a store is discrete information. You cannot have a portion of a customer. So, the number of customers in the store is restricted to certain values, in this case, whole numbers. Discrete data is counted. In contrast, continuous data is information that is measured and can take on an infinite set of values. For example, you could weigh yourself at home on a digital scale that only measures to one-tenth of a pound (i.e., 132.7 lbs.). Your doctor might have a more precise scale that will display your weight in hundredths of a pound (i.e., 132.65 lbs.). There is an infinite range to which the data can be reported, and it is constrained by the measurement device.

Data is grouped by the level of measurement represented by the information. There are four categories of measurement into which all data can be placed: nominal, ordinal, interval, and ratio. The nominal level of measurement is the lowest level of analysis. With nominal data, the information can only be placed in categories. No category is more important than another, and the categories cannot be placed in any logical order. One example of nominal data is marital status. With four possible answers (single, married, divorced, or widowed), every response can be placed into one of the four categories.

The next level of measurement is ordinal. At this level, data is placed into categories, but then those categories can be put in order. Ordinal data gives the company more information than nominal data does, but there is still not enough information for an in-depth analysis. The distance between the levels of the categories is not defined, but one category is of a higher order than another. A typical example is an opinion scale, often referred to as a Likert scale (Figure 11.3). As a company, you know that you want customers to be Very Satisfied as opposed to Unsatisfied. WorldCorp often uses this scale to get feedback on the customers’ interactions with sales agents. They can then rank the responses to determine which sales agents are ranked higher than others. But what you do not know is the degree of difference between Very Satisfied and Unsatisfied.

Survey asks How satisfied are you with… Columns titled: Very satisfied, Satisfied, Neutral, Unsatisfied, Very unsatisfied. Rows display: The responsiveness of the sales agent, The service you received, Our company overall.

Figure 11.3 A Likert scale is used to gather ordinal data that can be analyzed.

Link to Learning

Organizations often use surveys to gather data to inform their organizational and business practices. For instance, a business might send out a satisfaction survey to their clients. Or they might survey employees regarding their work preferences. Regardless of the purpose, you will need to write questions and provide options for responses that will capture exactly the information you need to answer the questions you are asking. Read this article from Survey Monkey regarding how to craft effective survey questions to learn more.

The last two levels are interval and ratio. Both allow data to be put into categories (nominal) and establish an order of the categories (ordinal), but now quantitative data is added so that the difference between each category is defined. The key difference between interval and ratio is how they treat a value of zero. With the interval level, zero acts as a placeholder rather than representing the value of zero, or the absence of something. One example is a survey question in which zero represents a neutral response. In this case, the respondent has no strong feelings either way on the question being asked. In contrast, with the ratio level of measurement, zero takes on the value of zero. If your product sold zero units this week, there was an absence of sales for the product. Understanding the differences between these levels of measurement can help you determine the best strategy to analyze the data you have collected. Figure 11.4 summarizes the levels of measurement.

Qualitative data: Nominal (Named variables); Ordinal (Named and ranked variables). Quantitative data: Interval (Named, ranked, and proportionately spaced variables); Ratio (Named, ranked, proportionately spaced variables that have capacity for true zero value).

Figure 11.4 The levels of measurement differentiate data based on the depth of analysis possible.

Categorical Data

Nominal and ordinal data make up categorical data, which is grouped into categories based on attributes or characteristics. Categorical data is considered qualitative data. The data can be represented with numbers; however, those numbers do not have a mathematical meaning. Some examples would be gender, marital status, or employment status (see Figure 11.4). Numbers can be assigned to these categories to make analysis possible with Excel, but those assignments are arbitrary. For example, you could represent full-time employment with the number 1 and unemployment with a 0. The same analysis could happen if the numbers were 2 and 4. There is no mathematical meaning behind the assigned numbers.

An advantage of using categorical data is that you can capture assessments of behaviors, opinions, and emotions. These concepts are not quantifiable, but using categorical data, you can assess and analyze these concepts to better understand the situation at hand. A disadvantage of categorical data is that deeper analysis using statistical methods may not be possible. Categorical data is often represented in a table format and graphically displayed with pie charts or bar charts. The data is most often analyzed through percentages in each of the categories of interest.

Numerical Data

The interval and ratio levels of measurement are numerical data. Numerical data is quantitative data and represents measures of variables, and it can either be discrete or continuous. The key feature is that the numbers have mathematical meaning with respect to the attribute or variable being evaluated. For example, a manager could determine the distance between two office locations in miles to determine the mileage reimbursement for employees traveling between the two sites. Other examples include the length of time waiting for a table at a restaurant, the amount of money spent by a customer at a particular shop, or the fuel efficiency of a vehicle (Figure 11.5). The advantage of using numerical data is that you can do in-depth analysis with statistics and use the data to develop predictions for future performance. This could be helpful for a business to forecast sales in the next quarter or predict expenses for the next year. Numerical data can be graphed using scatter plots and histograms. The data can be analyzed using various statistical methods. These could include mean, mode, median, standard deviation, and regression modeling.

Chart for Data splits into Categorical and Numerical groups. Categorical splits into Nominal and Ordinal groups. Numerical splits into Interval and Ratio groups.

Figure 11.5 Categorical data can be used for behaviors and emotions. Numerical data is used for more objective information such as distances or sales figures.

What Is Data Validation?

Data validation is another technique that can be used to verify data. In essence, data validation is the process of ensuring that the data values fall within a predetermined range. For example, if you know that there are only fifteen sales agents at WorldCorp, you can use data validation to make sure that the values assigned to various sales transactions in the monthly sales data spreadsheet are only whole numbers from 1 through 15. Data validation can also be used to ensure that product codes for the HDTVs sold by WorldCorp are accurately entered into the spreadsheet.

Two components are often used to ensure you are working with quality data: data verification and benchmarking. The process that ensures your data is accurate and reliable is called data verification. It is important to check the accuracy because if the data is not valid, then any analysis using that data will also be invalid, which could lead to critical mistakes if key strategic business decisions are based on the inaccurate data. WorldCorp could use data verification to ensure that product codes or prices are entered correctly into the spreadsheet used by the accounting department. You can imagine the issues that could arise if the price entered is incorrect. WorldCorp could be charging customers incorrectly for the product, which could lead to issues with both internal recordkeeping and external customer relations.

There are several approaches to checking the validity of your data. One way is to check the data that has been inputted in the spreadsheet against the original paper copy of the data. You could also use benchmarking, a tool that checks the validity of your data with similar datasets or a set standard. Benchmarking involves checking specific values such as price points or sales projections against known standards. These known standards could be from competitors or from established company goals. Sometimes, standards are set by government organizations as well. WorldCorp could use benchmarking to evaluate their product shipping speeds and costs versus that of their closest competitor. If their competitor ships products faster and at a cheaper price, WorldCorp might decide to investigate the impact of lowering their prices. Benchmarking can also be used to measure how your company’s sales are growing versus that of your competitors. Benchmarking against industry-wide data is also commonly used when assessing the performance. The benchmarking process can help a company determine where there might be gaps in performance compared with the competition or ways to enhance their position in the marketplace.

The Process and Techniques of Data Validation

Data validation techniques can be used to ensure that quality information is being used in decision making. Errors can occur when inputting data into the spreadsheet. Data can be entered in the cell incorrectly, data can be accidently duplicated, or numbers can be transposed. Through data validation, many of these common errors can be identified and then the user can correct those errors as appropriate. Data validation can also be used as a part of ongoing process improvement efforts in conjunction with benchmarks from key competitors or industry values. The tools available in Excel for data validation help ensure that your data has as few errors as possible and check business performance against internal standards or competitors.

Spotlight on Ethics

Keeping Accurate Data

Professional ethics dictates that businesses keep accurate and up-to-date information and data; this includes information they may collect with consent from clients and internal business information, as well as information they may create and consume in their interactions with external stakeholders across all platforms, including in spreadsheets where much of their financial information will likely be found. As such, quality data is accurate data and businesses can incorporate a number of ways to ensure data is recorded accurately, such as using Excel’s data validation capabilities. Data validation simply means we have a process in place to ensure the information is entered correctly and that it is appropriately checked to ensure it is both accurate and useful. For instance, businesses can ensure employees use effective and accurate data entry practices, such as shortcuts, AutoFill, and drop-down menus.

In addition, Excel can also restrict what content is input in certain cells to ensure it is correct—for instance, you can use data validation to have Excel reject an entry of 000-000-0000 as a valid phone number. In Excel, you would enter this validation criteria in data validation settings (Data tab, Data Validation, then select Settings). You can also prompt the person entering the data with an input message, giving them instructions to enter specific information in a cell.

Data Validation

The process of validating data is made easier with Excel. There are several ways to customize the validation processes to meet specific needs. One simple validation approach is removing duplicates. You access this tool through the Data tab. This function is helpful if you are unsure whether there are duplicate entries in your data.

Consider an example in which there are twenty employees at WorldCorp’s warehouse location in Portland and each of those employees has a unique company identification number. In a spreadsheet keeping track of the number of days of vacation remaining for the year for each employee, you would not expect to have more than one line per employee. For a dataset of only twenty employees, it would be easy to identify a duplicate entry. But what if your company had two thousand employees? This could take a good bit of time to manually identify duplicates.

To remove duplicates from your data, first open the spreadsheet that you want to examine. Click on the Data tab and then on the Remove Duplicates tool (Figure 11.6).

Remove duplicates is selected. Pane displays Employee and Days Left selected in Columns. Column in Spreadsheet lists two columns: Employee and Days Left. Duplicates are visible in rows 8 and 9.

Figure 11.6 The Remove Duplicates dialog box allows you to choose the data from which you need to remove the duplicates. (Used with permission from Microsoft)

The dialog box that opens has several options. First, ensure that the box next to “My data has headers” is checked if your data has a header row, as shown in Figure 11.6. Then, choose which columns you want to check for duplicates. In this case, you will check both columns for rows that have duplicate data in both. The removal of the duplicate information in this example requires checking both columns and only occurs if both columns have duplicate information. You can check only one column if desired as well. In this example, though, you choose both columns because the information is grouped by rows. Thus, the Employee number is matched up with their days of vacation left.

After you’ve chosen the data to check, click OK. Notice that employee number 2959 (rows 8 and 9) appears twice, with the same number of days left. Excel will remove one of these entries. If you had only selected the first column, Excel would delete any rows that had duplicate information in Column A, regardless of the content of Column B. For this reason, you need to be careful to ensure that your criteria will not unintentionally delete too much data. Once the tool has finished removing duplicates in the selected data, Excel will notify you of the number of duplicates that were removed from the selection and indicate how many unique values remain (Figure 11.7).

Microsoft Excel dialog box reads: 1 duplicate values found and removed; 20 unique values remain. Duplicate cells A9 and B9 are no longer visible on the spreadsheet.

Figure 11.7 The Remove Duplicates tool removed Employee 2959, who appeared twice in the data, from row 9. (Used with permission from Microsoft)

Further validation of data can be accomplished using Excel’s Data Validation tool accessed through the Data tab in the Data Tools command group. This tool checks a selection of data to determine if it falls within a specific range of numbers.

There are several options available with the Data Validation tool. You can set the tool to check the data within a certain date range, to verify if all numbers are whole numbers, or to check for all positive numbers. You can also create a drop-down list of options from which the user can choose. Additionally, you can set the tool to display an error message when a data point does not meet the criteria you have established.

How to Add Data Validation

Adding data validation means that you are requiring that a dataset meet certain specifications to be deemed valid. If these specifications are not met, the formula will not work. To demonstrate how to add data validation, you’ll go back to the example of employee vacation days.

In the spreadsheet, first select the data (B2:B21) you want to examine. Do not select the column headers when selecting the data. Only select/highlight the numerical information. Next, click on the Data tab and choose the Data Validation tool (Figure 11.8). The Data Validation window will open. Once open, there are three tabs available.

Data Validation is selected from Data Tools command group in Data tab. Pane with Settings tabs selected. Validation criteria selected: Allow: Any value, and Data: between. Column B is selected in spreadsheet.

Figure 11.8 When using the Data Validation tool, first select the data and then choose the tool from the Data Tools command group. (Used with permission from Microsoft)

The Settings tab allows you to set parameters/ranges for the data validation. Assume that each employee is allocated two weeks (or 14 days) of paid vacation per year. This simple spreadsheet shows the number of vacation days remaining for each of the employees. Make sure that all entries are from 0 through 14. Using the Settings tab, change the value box under Allow to indicate a whole number (Figure 11.9). Then, under Data, choose “less than or equal to” and enter 14 in the Maximum box. Examine the other options available under Data. You could choose to have the number fall between two values. You could select greater than, greater than or equal to, and not between two values. The many choices will accommodate most data validation needs.

Data Validation pane has Setting tab open and options selected in Validation criteria include Allow: Whole Number; Ignore blank checked; Date: less than or equal to; Maximum: 14. Spreadsheet column B selected.

Figure 11.9 Use the Allow and Data fields to change the parameters of the data validation. (Used with permission from Microsoft)

Next, go to the Input Message tab. This is used to show a message to a user of a spreadsheet when they click in a cell to add data. For example, this message could indicate that the number must fall within a certain range. This is particularly useful if you have multiple users for an Excel spreadsheet.

Finally, the Error Alert tab will display a message to the user when they have entered an invalid entry. There are three different types of errors available. The Stop error gives you the option only to retype an entry or cancel. The Warning error allows the user to add an entry that is not in the predetermined list. Finally, the Information error simply states to the user that the entry does not match the validation rules, but the entry is still allowed. When an entry triggers an error, a small green triangle will appear in the upper-left corner of the cell. Clicking on the triangle will show what caused the error. Finally, you have the option to not display an error alert at all. By selecting this checkbox, all entries are permitted.

Once the parameters and messages are established in the previous steps, click OK. Now, using those settings, Excel can check the data. Under the Data Validation tool, click on the small down arrow to the right of the button to open the additional options. Select Circle Invalid Data. You then will see all data that does not meet the established criteria circled in orange (Figure 11.10).

Data Validation button in Data Tools is selected; opens to options for Circle Invalid Data selected (other options: Data Validation, Clear Validation Circles). In spreadsheet, data in cell B8 (21) is circled.

Figure 11.10 You input the settings for validating data with data validation. Then, to do the validation, choose “Circle Invalid Data.” The circle is placed around data that does not meet the parameter of the data validation. (Used with permission from Microsoft)

You can clear the circles by clicking on Clear Validation Circles. To remove data validation entirely, select the Data Validation tool, and in the lower left, choose Clear All. This will clear the validation rules from the cells. Here are the general steps for using the Data Validation tool to find invalid data:

Highlight the desired data (include only numbers, not text).
Click on the Data tab and select the Data Validation tool.
Set the parameters for the validation under the Settings tab.
Set an input message if desired.
Set an error alert message if desired.
Click OK.
Choose Circle Invalid Data under the Data Validation tool to identify data that does not meet the parameters established in Step 3.

Data Validation List

Creating a data validation list is a convenient way to ensure that only specific values are entered into a spreadsheet. The user chooses the appropriate response from a predetermined set of values, which can be numbers or text. This assures that only acceptable values will be entered into the spreadsheet. For example, let’s add a column to the employee spreadsheet that indicates the employee’s location. Suppose the company has three different locations: North, East, and South. The management team may want to review staffing at the various locations when vacation time is used, so this column might provide helpful information for scheduling purposes.

To create a data validation list, first click in an empty cell on the spreadsheet. Then, access the Data Validation tool. On the Settings tab under Allow, choose List. In the Source, you have two options. You can input text or values into the box separated by commas (Figure 11.11a), or if you have the list already in the named range, you can reference the named range in the Source box (Figure 11.11b). By using the cell range reference, the drop-down list is dynamic and will update if you change the source list. This is often the preferred method unless it is certain that the options will remain the same.

(a) Data Validation pane displays Source: North, East, South. (b) Data Validation pane displays: Source: =Lists!$A$1:$A$3.

Figure 11.11 When establishing a data validation list, you can (a) directly enter the text or values, but it is preferred to use (b) cell references so that the drop-down list is dynamic. (Used with permission from Microsoft)

Either method will give the user a drop-down list to choose from when inputting information into the spreadsheet (Figure 11.12). If the list needs to be copied to other cells, it can be copied using the Paste Special technique (see Essentials of Software Application for Business).

Headers in a table include Employee (Column A), Days Left (Column B), Location (Column C). Cell C2 displays a drop-down button open with these options: North, East, and South.

Figure 11.12 The drop-down list will show the text from the data validation list so the user can choose from the permitted values. (Used with permission from Microsoft)

Custom Data Validation Rules

If there are specific validation needs that are not covered by the existing built-in choices in data validation, you can create a custom data validation rule. This allows you to set your own parameters for the cells. Open the Data Validation tool and in the Allow drop-down menu, choose Custom. This offers a great deal of flexibility to use formulas and functions in data validation. You can use any of the available functions for data validation, even those that look at text or characters. Using data validation to look at text or characters could help you identify product codes or city names that were spelled incorrectly.

Using the employee vacation list, let’s add the employee names to the data. Create a column called Name. To ensure that only textual information is entered into the cell, you will use the custom data validation option. Open the Data Validation tool. Select Custom. In the Formula Bar of the window, use the ISTEXT function to restrict the entries to text data only (Figure 11.13). You can also add an input message and error alert if desired. Click OK.

Data Validation pane displays Setting tab selected with Validation criteria selected: Allow: Custom; Ignore blank checked; Date: between; Formula: istext(B2). Column B is highlighted and cell B2 selected in spreadsheet.

Figure 11.13 Data validation can be set to ensure that only text is input into a cell or group of cells. (Used with permission from Microsoft)

If the user tries to enter a number in the cell and you have enabled error alerts, an alert message will appear. Other custom data validation rules include not allowing any punctuation, restricting entries to not exceed a certain total, or entering data within a specified range.

The custom data validation rule option helps to ensure the accuracy of data entered based on scenarios that are appropriate to your specific application. These options are useful when creating spreadsheets that have multiple users. Data quality issues can be mitigated by setting up validation rules or adding drop-down lists. Data validation along with other skills, such as protecting sheets or cells, can be critical to maintaining the integrity of the spreadsheet and the data analysis built from the information in the spreadsheet.

Editing Validation Rules

To edit existing validation rules, first identify the cells where data validation rules are located. To do this, on the Home tab, go to Find & Select on the far right side of the tab, and choose the down arrow. When the menu appears, choose Data Validation. This will highlight all cells that have data validation rules. To view the parameters of that rule, simply click in one of the highlighted cells. Then, go to the Data Validation tool. The tool window will appear and list the restrictions of the data validation rule. If you want to make any changes, you can make them here. However, before closing the window, ensure that you check the box next to the instruction “Apply these changes to all other cells with the same settings” (Figure 11.14). This will then apply the changes to the other cells with the same data validation rule.

Data Validation pane displays box checked for Apply these changes to all other cells with the same setting.

Figure 11.14 Be sure to select this checkbox to apply changes to all cells using the same data validation rule. (Used with permission from Microsoft)

Organizing and Presenting Data

A benefit of using Excel to organize data is the wide variety of formatting choices available for presenting the information in a professional manner. Remember the basic formatting concepts you learned in Working with Spreadsheets. Organizing the data in a table format with headers, colors, and other formatting options makes the information visually appealing and often makes it easier to understand. This is especially important when dealing with a large spreadsheet of information. When formatting data in your spreadsheet, there are a few settings that will allow you to present the data in a more professional way:

Adjust the alignment so that numbers are aligned to the right and text is aligned to the left. This allows users to easily distinguish between text and numerical information.
Add formatting to column and row headings, such as bold fonts or a larger text size. You might consider adding a border to the header row but avoid using all capital letters in the header row because it makes text difficult to read.
Filler colors can highlight your header row or make it easier to distinguish between rows of data. Too many colors, however, can distract from the professional look of the spreadsheet. One clean and simple way to distinguish rows is to use a light gray filler for every other row or column, as illustrated with the WorldCorp dataset from the previous chapter, shown here in Figure 11.15.
When choosing fonts for your data, use only one font, and use fonts that are easy to read, such as Times New Roman or Calibri. Avoid script fonts and fonts that look artistic.
Adjust column widths and row heights to make the spreadsheet more readable.
In large datasets, consider freezing the header row so that when the user scrolls, the column headings remain visible. To do this, go to the View tab, select Freeze Panes, and then Freeze Top Row. This keeps the top header row visible while scrolling.

There are many built-in settings that might be useful when formatting your data for a professional look. You can access them through the Cell Styles menu on the Home tab (see Working with Spreadsheets).

Row 1 displays headers in bold with highlighted blue background, cell borders display thicker black lines. Data in rows 2-11 is alternate gray/white backgrounds; thick border line is seen below row 11.

Figure 11.15 Built-in Cell Styles make datasets with many rows more visually appealing. (Used with permission from Microsoft)

Another way to present data in a spreadsheet is to display it in a graphic format. The type of chart or graph you should choose depends on the type of data. For example, a bar chart is appropriate for categorical data, while a line graph is appropriate for numerical data. When the data includes categories that equal 100 percent, a pie chart (Figure 11.16) is an effective option, such as with the data presented in Figure 11.15. Using a pie chart with the data, you can see the proportion of each product compared with each other. Charts will be covered in more depth later in Data Analysis Charts.

Pie Chart labeled “Products” displays different sized “slices” of the chart in various colors and percentages. At the right, a list of colors and what they represent is seen in a column.

Figure 11.16 Pie charts are an effective way to visually represent data when categories equal 100 percent.

Constructing a Data Table

To add even more functionality, you can format data to be an Excel Data Table, rather than simply a table of numbers. Using the Data Table formatting sets the data as a separate object in the spreadsheet. It is more than simply adding column or row headings. This feature allows you to easily add rows or columns, is automatically formatted to look professional, and allows you to quickly sort and filter information with a filter button at the top of the column. The table will automatically expand with the insertion of new rows or columns, and the preset formatting will carry over to these additions. Column headings are always visible without manually freezing the row, and formulas entered into cells are automatically applied to all cells in that column. Although these functions are available without formatting a set of data as a Data Table, working with the data is much easier if it is set this way.

You can create a Data Table from scratch and then add the data at another time. However, it is often easier to work from an existing spreadsheet of data to create the Data Table. To do so, open the "pivot_table_data" tab in the downloadable Chapter 11 data file workbook. There are two ways to format the data as a Data Table. For both, first highlight the data you want to convert to a Data Table. Then, press Ctrl+T, which opens a small window showing the selected range and a checkbox if the data contains a header row (Figure 11.17). Click OK. The other method is to click on the Home tab and choose Format as a Table in the Styles command group.

Create Table pane window is open. Where is the data for your table? (=$A$1:$H$18) is visible. My table has headers is checked. Entire spreadsheet is selected.

Figure 11.17 To create a Data Table, first select the columns and rows to be included in the table. (Used with permission from Microsoft)

Once you click OK, you will notice that your data looks different. There are drop-down arrows associated with each column, and there is a new tab in the ribbon: the Table Design tab (Figure 11.18).

Table Design tab is selected. In selected spreadsheet, drop-down arrow in column F header is indicated.

Figure 11.18 When data are in a Data Table, you will see a Table Design tab for formatting options. (Used with permission from Microsoft)

The Table Design tab appears when you click inside the Data Table and allows you to customize the table. You can change the color scheme, name the table, summarize the data, and refresh the information.

The drop-down arrows on each column header allow you to filter and sort the data. You can sort the data in many ways, and you can filter the data to only show selected information. You could sort this example to show certain destinations, such as Metairie, LA (Figure 11.19).

Open pane options: Sort A to Z, Sort Z to A, Sort by Color, Sheet View, Clear Filter From “Destination,” Filter by Color. All Text Filters selected (destinations listed in column H).

Figure 11.19 The drop-down menu with each column in a Data Table is a convenient way to filter information. (Used with permission from Microsoft)

To accomplish this, select the down arrow in the Destination column. Then, deselect all the destinations except Metairie. This will filter the entire table to only display those products for Metairie (Figure 11.20). You can tell that the data is filtered because the row number starts at a number other than 2 (row 1 is the header row). In this case, it starts at row 6. The other data is still contained in the table, but it does not display when this filter is applied. Also notice that the drop-down arrow changes to a funnel icon to indicate that the table has been filtered based on the information in this column. To remove the filter, click on the funnel icon and choose Clear Filter. This will remove the filtering from the table, and it will return to its original form. The funnel icon will change back to a drop-down arrow. If you do not want to filter your data, you can choose to not have the filter buttons displayed in your table using the Table Design tab in the Table Style Options command group. You can also use the Table Styles Options command group to add a total row to your Data Table.

Table Style Options command group options: Header row, Total Row, Banded Rows, First Column, Last Column, Banded Columns, Filter Button. In spreadsheet, filter button is visible on right side of cell H1.

Figure 11.20 Customize the Data Table using the options in the Table Style Options command group. (Used with permission from Microsoft)

To customize the color scheme of the Data Table, choose the Table Style Options command group in the Table Design tab. There are many color choices and designs to choose from. You can also create a custom style, perhaps with your company colors, by choosing New Table Style. A window appears that allows you to format each section of the table with various fonts or colors.

Another benefit of using Data Tables is that you can easily summarize the data using PivotTables. This feature allows you to display the data in various ways to better analyze the information. You can create subtotals and summations of various factors within the table. To use this feature, choose Summarize with PivotTable from the Tools command group on the Table Design tab. We will discuss PivotTables in PivotTables/Charts.

If you no longer want to use a Data Table object for your data, you can revert it to a regular data range. Choose Convert to Range in the Tools command group on the Table Design tab. This will keep all the formatting but will leave the data as a range rather than a Data Table object in the spreadsheet. This makes it easier to copy and paste the information into a document or a presentation when communicating the analysis to others.

One-Variable Data Tables

Data Tables allow you to change cell values and see the result those changes have on other cells containing formulas. The tables can be used to evaluate possible scenarios to help make decisions within a business. Often, Data Tables are used to understand the financial implications of decisions.

Data Tables can use one variable or two variables. The Data Table tool is part of the What-If Analysis options on the Data tab. A one-variable Data Table investigates the impact of manipulating a single cell on another cell. Imagine you are interested in looking at the impact to your company’s overall profit if the price of one product is changed. Begin by setting up a model of the current data. For this example, let’s look at WorldCorp’s 32” LCD TV that sells for $170 and costs WorldCorp $85 to make. Let’s assume that marketing and administrative costs are $5,000 annually. You can set up an initial spreadsheet that reflects the current model and an abbreviated income statement to show the company profit (Figure 11.21). The formulas are shown in column D so that you can see how the cells were calculated.

In the spreadsheet, cell C10 is selected and =C8-C9 is visible in the formula bar. The data in cell C10 is $738,920.00 (Net Income). Column D lists the formulas that are used.

Figure 11.21 For a one-variable Data Table, you will model a single variable such as Net Income. (Used with permission from Microsoft)

You are interested in seeing the impact to profit, or cell C10, in this example. The value in this cell becomes the top row of the Data Table, so repeat it below the model. Next, identify the new prices to investigate. With a one-variable Data Table, those values are placed in the column to the left of the impacted cell. In this example, put the values in column B, starting in the row below the impacted cell. At this point, you can format the values using the Currency style to be more representative of what is being investigated. Then, select the cells to include in the Data Table, including all the price-level changes being evaluated.

Now, construct the Data Table. Go to the Data tab, click the arrow next to What-If Analysis, and select Data Table from the list (Figure 11.22). When the input window appears, choose the input value (price in this example) in the Column input cell box. You only use the Column input cell for a one-variable Data Table (Figure 11.23). Notice the dotted line around the input cell and the highlighted region for the Data Table. Click OK.

(a) Data tab is selected. What-If Analysis button selected. (b) Data tab is selected. What-If Analysis button opens to options: Scenario Manager, Goal Seek, Data Table (selected).

Figure 11.22 (a) For the new data table, first choose the What-If Analysis option from the Data tab. (b) Then select Data Table from the drop-down menu. (Used with permission from Microsoft)

In spreadsheet, cell C2 with $170.00 (Price per unit) is selected. A partially blank table at lower corner is selected. Data Table pane displays Row input cell: and Column input cell: ($C$2).

Figure 11.23 The impact of the Price per unit on overall Net Income is the desired output. (Used with permission from Microsoft)

When you click OK, Excel will use the information from the model to calculate the Net Income at each of the price points (Figure 11.24).

In spreadsheet, cells B14-B18 list costs from $160.00 to $185.00 in $5 and $10 increments. An arrow points to a bracket highlighting cells C14 to C18, which list the numerical data.

Figure 11.24 The change in net income due to the change in the price per unit is automatically generated. (Used with permission from Microsoft)

Excel shows that if the price point per unit is adjusted to $165.00, the Net Income will be $695,160, as shown in cell C15. When Excel creates the Data Table, the results are not automatically formatted. You can format the information after the Data Table is created, or you can set the number style to accounting or currency for the cells C14:C18 prior to creating the table. This simple example demonstrates the value of creating a Data Table. Excel quickly calculates the impacted cell values based on the predetermined formulas. This allows the decision maker to see all options in a single table.

Two-Variable Data Tables

Often in data analysis, there is more than one variable. You may need to investigate the interaction between two variables to get more information. For example, a two-variable Data Table could analyze the performance of various locations during each quarter of the year or could examine different production levels at various price points to investigate the impact on profit.

Let’s continue with the example of product price and impact on profit, and add various quantities sold into the analysis. Again, the target impacted cell is C10, Net Income. Put the quantity values across the top of the table and the prices down the side, as with the one-variable Data Table. Format the output cells (D14:H17) as currency. Next, highlight the table area and go to the Data Table tool in What-If Analysis. For the Row input cell, choose the # units, C4, and for the Column input cell, use the unit price, C2. Click OK (Figure 11.25). Notice the output cells are formatted to the Currency style. As a manager, you can see the impact on your profit outlook when you change the two variables (Figure 11.26). For example, if the price is lowered to $165 and you sell 9,500 units, your profit would be 755,000 (cell G15).

Data Table pane displays: Row input cell: ($C$4); Column input cell: ($C$2). Cell C2 ($170.00) is highlighted. Partially blank table at bottom is selected.

Figure 11.25 Two-variable Data Tables can manipulate two cells. Use the Data Table tool to enter your input cells. (Used with permission from Microsoft)

The Data Table in cells C13 to G18 is populated with monetary values in cells D14 to G18.

Figure 11.26 The final Data Table shows the impact of changing one or both variables. (Used with permission from Microsoft)

Visualizing Mixed Variables

Data collected by businesses comes in a wide variety of forms. It may not always be numerical. It might include income or age data reported as ranges of numbers rather than an exact number. Sales data may report a purchase was made but not include the sales price. The Data Table in Figure 11.27 includes both numerical data (the number of units sold) and categorical data (the sales agent). Summarizing both categorical and numerical data in the same table is appropriate and useful in many cases. The data can also be visually presented with a combination of charts and graphs. However, for any in-depth analysis, you will need to adjust the categorical data.

A spreadsheet lists headers in Row 1 with drop down arrows at the right of every bolded header. Data in the spreadsheet is a mix of dates, text, and numbers.

Figure 11.27 Information in a Data Table can be numerical or categorial. (Used with permission from Microsoft)

One strategy to adjust categorical data is to replace the text with a numerical value. Using the data in Figure 11.27, you could replace all instances of “Antonio” with the number 1, “Izabelle” with the number 2, and “James” with the number 3. When making a substitution like this, be sure the spreadsheet includes a legend that identifies each category and its corresponding number. This could be a simple chart on the same worksheet or on another worksheet that explains what each number represents. When there are only two categories, such as a Yes/No question, you can use the binary system with 0 for No and 1 for Yes. Using the Find and Replace tool makes the process of changing text to numbers easier. Categorical data combined with numerical data can also be analyzed using PivotTables.

Footnotes

1U.S. Bureau of Labor Statistics. Management Analysts: Summary. Occupational Outlook Handbook. https://www.bls.gov/ooh/business-and-financial/management-analysts.htm. Modified September 8, 2022.
2Nicole Marti. “How Much Data Is Collected Every Minute Of The Day.” Forbes. August 12, 2019. https://www.forbes.com/sites/nicolemartin1/2019/08/07/how-much-data-is-collected-every-minute-of-the-day/?sh=30871fd03d66

11.1 Understanding Data, Data Validation, and Data Tables