Foundations of Information Systems

2.1 Practices and Frameworks for Data Management


Learning Objectives

By the end of this section, you will be able to:

  • Identify practices and frameworks in the management of data
  • Interpret the dimensions and characteristics of data needed to make decisions
  • Identify information system planning strategies and frameworks that inform an organization’s data management practices and processes

Organizations typically use digital technology to analyze data to get the information they need to run their businesses. With the exponential increase in computing capacity and the development of artificial intelligence (AI) and large language models, businesses today rely heavily on managers who can use sophisticated data storage and data analytics technologies effectively. How can you align your organization’s data management and information strategies to deliver optimal results for the greatest number of stakeholders? Future data managers will have an obligation not only to understand information and data management but also to possess the ability to extract valuable insights from data, apply these insights strategically, and align them with the organization’s goals.

In Chapter 1 Fundamentals of Information Systems, you learned that the term data refers to raw facts and figures that are processed and turned into meaningful information. Data represent discrete elements without context or meaning on their own, and data come in various forms—such as numbers, text, images, or audio. For example, a list of numbers or a collection of customer names and addresses is considered data. Information is the result of processing data through organization, analysis, and contextualization to derive meaning. If the data are a list of numbers, then the related information may be the trend of increasing sales of a product. This information can be used to make decisions, understand relationships, and gain insights.

For any organization, information is an invaluable resource; hence, data and data management have become critical. Effective data management aligns with data analytics capabilities, facilitating the automated discovery of trends, patterns, and anomalies through techniques like data mining and analysis. In today’s data-driven world, there is increasing attention to the relevance of big data, highly complex and large datasets that require specialized processing techniques to extract valuable insights and make informed decisions. Managing any data involves activities such as cleaning, extracting, integrating, categorizing, labeling, and organizing. All these activities should be executed in a manner that preserves the quality of the data and keeps the data secure yet easily retrievable. Organizations need people to manage data and control data accessibility, and they need to define roles and responsibilities for all the people working with and extracting information from their data. Decisions about data management have long-lasting impacts, so it is important for organizations to select suitable frameworks for managing data.
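Several of the activities listed above, such as cleaning and deduplicating records, can be illustrated with a short sketch. The record structure, field names, and cleaning rules below are illustrative assumptions, not a prescribed standard:

```python
# A minimal sketch of routine data cleaning: trimming whitespace,
# normalizing case, dropping incomplete records, and removing duplicates.
# The "name"/"email" fields are hypothetical, for illustration only.

def clean_records(records):
    """Standardize and deduplicate a list of customer dicts."""
    seen = set()
    cleaned = []
    for rec in records:
        name = rec.get("name", "").strip().title()
        email = rec.get("email", "").strip().lower()
        if not name or not email:   # drop incomplete records
            continue
        if email in seen:           # drop duplicates, keyed by email
            continue
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned

raw = [
    {"name": "  ada lovelace ", "email": "ADA@Example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com"},   # duplicate
    {"name": "", "email": "ghost@example.com"},             # incomplete
]
print(clean_records(raw))
```

Real pipelines add many more rules (type validation, reference checks), but the shape, a pass that normalizes and filters records, is the same.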

Identifying Practices and Frameworks That Assist Data Management

Analysts report that by 2025 the global volume of data is expected to reach 200 zettabytes.1 A zettabyte is a measure of digital storage capacity equal to a thousand exabytes, a billion terabytes, or a trillion gigabytes. Notably, most of these data are underutilized. For most enterprises, only about one-third of data is used for decision-making, while the remaining two-thirds is simply stored.2 When analysts are unable to find the appropriate and available data for decision-making, it can cost organizations billions of dollars each year. On the other hand, data archives are often required for regulatory reasons to ensure compliance with laws governing data retention, privacy, and security. For instance, industries like health care and finance have strict regulations regarding the retention of patient records and financial transactions data.
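The unit relationships in this paragraph can be verified with simple arithmetic using decimal (SI) prefixes:

```python
# Checking the storage-unit relationships described above:
# 1 ZB = 1,000 EB = 1 billion TB = 1 trillion GB (decimal prefixes).

GB = 10**9      # bytes in a gigabyte
TB = 10**12     # bytes in a terabyte
EB = 10**18     # bytes in an exabyte
ZB = 10**21     # bytes in a zettabyte

assert ZB == 1_000 * EB        # a thousand exabytes
assert ZB == 10**9 * TB        # a billion terabytes
assert ZB == 10**12 * GB       # a trillion gigabytes

projected_2025 = 200 * ZB      # the projected global volume, in bytes
print(f"{projected_2025:.1e} bytes")
```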

Another important aspect of dealing with data is data governance, which involves the policies, procedures, and standards that govern how an organization manages the availability, usability, integrity, and security of its data throughout the data life cycle. You have probably noticed that every time you use the internet, open an app on your phone, or buy something online, websites and apps track what you do, and they may track your location and other data. The world is full of sensors, electronic payments, tracking data, biometric information like fingerprints, and smart home devices that collect data. This kind of information is valuable, and that value creates challenges for making sure data are handled well. Data governance is like a rulebook for how data are managed, making sure the right people are responsible for decisions about data.

Appropriate data management is crucial to an organization’s reputation and success. Data management parameters establish practices that organizations can use to ensure that their data are managed, protected, and of high quality so they can be utilized to make informed decisions. The essential areas for managing data effectively are as follows (Figure 2.2)3:

  • Data governance establishes policies, processes, standards, roles, and responsibilities to manage data as a valuable asset.
  • Data quality ensures that data are accurate, complete, and consistent. This is achieved through activities such as validation, cleansing, matching, and monitoring via metrics and reporting.
  • Data integration combines data from different systems and applications. It includes tasks such as mapping data elements, transforming data, and ensuring seamless integration using tools and best practices.
  • Data security protects data from unauthorized access, use, disclosure, disruption, modification, or destruction. It is achieved through measures like encryption, access controls, and adherence to security best practices.
  • Data privacy, like data security, safeguards personal data from unauthorized access, use, disclosure, disruption, modification, or destruction. It relies on the use of encryption, access controls, and privacy best practices.
  • Data retention involves storing data for a defined period based on legal, regulatory, and business requirements. It includes activities like archiving, purging, and the development of data retention policies.
  • Data architecture focuses on designing data models and database structures that align with business needs.
  • Data analytics involves analyzing data to extract insights and support decision-making. It includes implementing activities like data warehousing, mining, and visualization to achieve meaningful information extraction. These tools, techniques, and insights can be used in AI and machine learning applications. (George Firican, “Data Management Framework 101”)
Data Governance (policies and responsibilities to manage data): Data Retention, Data Privacy, Data Security, Data Quality, Data Integration, Data Architecture, Data Analytics.
Figure 2.2 Data governance plays a role in all data management areas. It ensures an organization’s data are of high quality and are managed and protected effectively so that the data can be used to make informed decisions. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)
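To make the data quality area above concrete, here is a minimal sketch of validation and monitoring via metrics. The validation rules, field names, and thresholds are assumptions for illustration:

```python
# A sketch of simple data-quality monitoring: computing completeness and
# validity metrics over a batch of records. The rules ("id" and "email"
# required, email must look like an address) are illustrative only.

import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def quality_metrics(records, required=("id", "email")):
    """Return the fraction of complete records and valid email values."""
    total = len(records)
    complete = sum(all(r.get(f) for f in required) for r in records)
    valid_email = sum(bool(EMAIL_RE.match(r.get("email", ""))) for r in records)
    return {
        "completeness": complete / total,
        "email_validity": valid_email / total,
    }

batch = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "not-an-email"},   # complete, but invalid email
    {"id": 3, "email": ""},               # incomplete
]
print(quality_metrics(batch))
```

Tracking such metrics over time (the "monitoring via metrics and reporting" in the bullet above) is what turns one-off validation into an ongoing quality practice.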

Future Technology

Data Management Technologies

A new approach to data management, called a data mesh, is based on the principle of domain-driven design. In a data mesh, data are owned by the domain experts who create the data, and the data are made available to other users through a self-service model.

Another powerful technique for data management is called data fabric. It involves a move toward data democratization, which means that data are made available to more people in the organization, not just those in information technology (IT) or data science roles. Through this approach, data can be accessed and consumed by anyone as long as they have proper security credentials for the level of data they desire.

Another technology is federated analytics. Federated analytics allows users to analyze data without having to move it to a central repository. This can help to improve data security and privacy, and it can also make it easier to analyze data that reside in different locations. These three techniques address the growing complexity and scale of data management in modern organizations, especially as data becomes more distributed, diverse, and decentralized. These technologies provide innovative ways to manage, integrate, and analyze data at scale, which is necessary in the age of big data and advanced analytics.
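The idea behind federated analytics can be shown with a toy example: each site shares only a local summary, never its raw records, yet the combined result equals the centralized computation. The site data below are made up:

```python
# A toy illustration of federated analytics: each site computes a local
# aggregate (sum and count), and only those summaries -- never the raw
# records -- are combined centrally to produce a global average.

site_a = [120, 95, 140]   # raw records stay at site A
site_b = [80, 110]        # raw records stay at site B

def local_summary(records):
    """Computed where the data live; only this summary leaves the site."""
    return {"sum": sum(records), "count": len(records)}

def federated_mean(summaries):
    """Combine per-site summaries into a global mean."""
    total = sum(s["sum"] for s in summaries)
    count = sum(s["count"] for s in summaries)
    return total / count

summaries = [local_summary(site_a), local_summary(site_b)]
print(federated_mean(summaries))   # equals the mean over all records
```

Production systems add privacy protections such as secure aggregation, but the core principle, moving computation to the data rather than data to the computation, is the same.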

Businesses that strive to follow these data management parameters can adopt an established framework to guide their efforts. A framework is a structured and organized approach that provides a set of guidelines, principles, or tools to address complex problems or tasks in a systematic manner. It serves as a foundation to build and manage processes, systems, or projects, ensuring consistency, efficiency, and effectiveness. Following are some common data management frameworks:

  • DAMA-DMBOK: The Data Management Body of Knowledge, developed by the Data Management Association (DAMA International), provides comprehensive guidelines across ten key knowledge areas, including data governance, data architecture, and data quality.
  • MIKE2.0 IMM: This framework offers a structured way to assess an organization’s information maturity level.
  • IBM Data Governance Council Maturity Model: This model provides a road map for implementing effective data governance processes and controls.
  • Enterprise Data Management (EDM) Council Data Management Maturity Model: This comprehensive framework covers various aspects of data management, including data strategy, data operations, and data architecture.
  • Responsible Data Maturity Model: This model is an evolving concept that continues to gain importance as the role of data in our lives becomes more prominent.

There are some differences in how each framework approaches data management. DAMA-DMBOK focuses on the technical aspects of data management, while MIKE2.0 focuses on the business aspects of information management. The IBM Data Governance Council Maturity Model and the EDM Council Data Management Maturity Model are both designed to help organizations assess their current data management practices and identify areas for improvement. The best framework for an organization will depend on its specific needs and requirements. However, all five frameworks can be valuable tools for improving data management and information quality. The optimal framework for managing data is one that is continuously developing to address advancements in data storage and changes in organizational policy.

Careers in IS

Data Architect

A data architect is responsible for designing, creating, and managing an organization’s data architecture. This includes creating frameworks and structures for collecting, storing, processing, and accessing data in a way that aligns with business goals, compliance requirements, and performance needs. In addition to designing traditional data architectures, many modern data architects now also focus on frameworks like data mesh, data fabric, and federated analytics as part of their responsibilities. A data architect needs strong knowledge of database technologies, data modeling, cloud platforms, and big data tools. They must also understand data governance practices, security protocols, and compliance regulations and be familiar with data pipelines, integration tools, and extract, transform, and load (ETL) tools. Having strong problem-solving skills will allow a data architect to design complex systems and solve integration challenges, ensuring data flows smoothly across different parts of the organization.

Most data architects have a bachelor’s degree in computer science, information systems, business analytics, or a related field. Some also have a master’s degree, which can help give candidates a competitive edge or allow for a smoother transition from another field. There are also numerous certificates available for a data architect. Different organizations may require different certifications, but the Certified Data Management Professional certificate is one that many organizations prefer.

To become a data architect, you typically need a combination of a strong educational foundation, hands-on experience working with data systems, and certifications in specialized tools and platforms. Ongoing learning and keeping up with industry trends are also helpful in this rapidly evolving field.

Dimensions and Characteristics of Data to Inform Data Management Decisions

For each type of data, an organization needs to define rules and policies to measure the quality of the data and the effectiveness of the data’s management. There are four key dimensions of big data known as the four Vs:

  • Volume: The dimension of volume refers to the vast amount of data generated and stored. Big data datasets are incredibly large and can grow exponentially.
  • Variety: The variety dimension encompasses the diverse array of data types and formats, which might include structured elements like user IDs, time stamps, and actions taken, as well as unstructured components such as user comments and feedback.
  • Velocity: The velocity dimension is the rapid pace at which data are generated and collected, which necessitates real-time processing. It is particularly evident during high-traffic events, such as online flash sales, when the constant influx of clickstream data must be analyzed swiftly so the system can offer tailored recommendations and insights to users navigating a dynamic digital landscape.
  • Veracity: The veracity dimension refers to the reliability, accuracy, and trustworthiness of the data, considering factors like data quality and consistency. As data originate from a multitude of sources and are influenced by user behaviors and tracking nuances, ensuring the quality of the data is imperative to avoiding misinterpretations of metrics like bounce rates and session durations.

Different types of data are required for generating suitable information, and different types of data require different management approaches. There are two main categories of data—structured and unstructured. The first, structured data, exist in organized, fixed fields in a data repository. Structured data are typically defined in terms of field name and type (such as numeric or alphabetical). Structured data can be further categorized as follows:

  • Associated with operational or real-time applications, operational data include transactional data generated by day-to-day business operations. Operational data often require high availability, real-time processing, and quick access for operational decision-making. When managing operational data, the focus is on ensuring data integrity, availability, and performance to support critical business processes.
  • Serving as a system of record or reference for enterprise-wide use, master data represent core entities such as customers, products, or employees, and reference data include codes, classifications, or standards used across the organization. Maintaining data accuracy, consistency, and synchronization across multiple systems for these data types is necessary to ensure reliable and consistent information across the organization.
  • Used for data warehousing, business intelligence, and analysis-oriented applications, analytic data include aggregated and summarized data from various sources and enable in-depth analysis, reporting, and decision support. Analytic data require data integration, transformation, and optimization to provide a consolidated view and support complex analytics.

The other data type, unstructured data, do not reside in a traditional relational database; examples of unstructured data generated within organizations are emails, documents, media (videos, images, audio), slide presentations, social media activity (such as posts, images, ratings, and recommendations), and web pages. There are several challenges associated with managing unstructured data. Their large volume can make them difficult to store and manage. Unstructured data also come in a variety of formats from multiple sources, including internal sources, personal sources, and external sources. Data can also come from the web in the form of blogs, podcasts, tweets, social media posts, online videos, texts, and radio-frequency identification tags and other wireless sensors. These technologies generate data that must be managed but are difficult to process and analyze. Unstructured data are often generated at high velocity. This can make it difficult to keep up with and make sense of the data.

Despite the challenges, managing unstructured data can be valuable. Unstructured data can provide insights into customer behavior, identify trends, and improve decision-making. The management of unstructured data involves organizing, storing, and retrieving such content effectively. Techniques like content indexing, search, and metadata management are employed to enable efficient discovery and retrieval of relevant information. When managing unstructured data, it is also important to understand the decision-making priorities of an organization (which decisions require data and, consequently, necessitate data-driven information and knowledge processing). Also, it is often possible to convert unstructured data into structured data.
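The conversion of unstructured data into structured data mentioned above can be sketched briefly. The message format, field names, and extraction patterns below are hypothetical:

```python
# A sketch of turning unstructured text into structured data: extracting
# order details from a free-form email body with regular expressions.
# The email wording and the resulting schema are illustrative assumptions.

import re

email_body = """
Hello, I'd like to check on order #48213.
It was placed on 2024-05-17 by jane.doe@example.com.
"""

record = {
    "order_id": re.search(r"#(\d+)", email_body).group(1),
    "date": re.search(r"\d{4}-\d{2}-\d{2}", email_body).group(0),
    "customer": re.search(r"[\w.]+@\w+\.\w+", email_body).group(0),
}
print(record)
```

Once in this structured form, the record can be stored in fixed fields and queried alongside operational data; at scale, techniques like natural language processing replace hand-written patterns.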

Good data management strategies require the following:

  • Updating outdated data: A data management process should involve incorporating the latest information and revisions into the dataset, thereby ensuring it remains relevant and accurate.
  • Rectifying known inaccuracies: Identifying and correcting any inaccuracies or errors in the data are crucial to maintaining data integrity, which means ensuring data remain accurate, consistent, and reliable over their entire life cycle.
  • Addressing varied definitions: Dealing with diverse definitions across different data sources requires establishing a clear and standardized data definition, which is the instruction for how to organize information. These definitions tell a computer system what kind of data to expect (text, numbers, dates) and how the data should be formatted and mapped to ensure consistency. For example, when the underlying data point is the same (the person’s date of birth), the specific term used (“Date of Birth” versus “Birth Date”) might differ based on the department using the data. However, the definition of the data point (the actual date) needs to be consistent throughout the system to ensure accurate analysis and reporting.
  • Resolving discrepancies in redundant data sources: When multiple sources provide similar data, it is important to reconcile any inconsistencies or differences to establish a reliable and accurate representation of the information.
  • Managing variations in opinions, forecasting techniques, or simulation models: Recognizing and addressing divergent viewpoints, methodologies, or models used for forecasting or simulation ensures transparency and reliability in the data analysis process.
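The "addressing varied definitions" strategy above can be sketched as a simple mapping table that translates each source's label to one canonical field. The department labels and field names are illustrative assumptions:

```python
# Reconciling varied field names across departments: a mapping table
# translates each source's label ("Date of Birth", "Birth Date") to a
# single canonical field, so downstream analysis sees one schema.
# All labels below are hypothetical examples.

FIELD_MAP = {
    "Date of Birth": "birth_date",
    "Birth Date": "birth_date",
    "DOB": "birth_date",
    "Customer Name": "name",
    "Client Name": "name",
}

def standardize(record):
    """Rename known fields to canonical names; pass unknown fields through."""
    return {FIELD_MAP.get(k, k): v for k, v in record.items()}

hr_record = {"Customer Name": "Ada Lovelace", "Date of Birth": "1815-12-10"}
sales_record = {"Client Name": "Ada Lovelace", "Birth Date": "1815-12-10"}

# After standardization, the two departments' records agree.
print(standardize(hr_record) == standardize(sales_record))
```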

Future Technology

Master Data Management

Master data management (MDM) ensures the creation and maintenance of a single, accurate source of truth for crucial data points within an organization. This centralized data repository eliminates inconsistencies and fosters seamless information exchange across different systems and applications. Let’s explore the importance of MDM in two specific contexts: health care and e-commerce retail.

  • In the health-care sector, creating a unified, digital infrastructure for health-care services facilitates the seamless exchange of health-related information among patients, health-care providers, and government agencies. Efficient MDM ensures that patient data, such as medical history, medications, and allergies, are organized, stored securely, and readily accessible by authorized personnel. This both improves the efficiency of health-care delivery and reduces the risk of errors and delays in treatment.
  • In the e-commerce sector, consider a large online retailer managing millions of product listings. Without MDM, product information like descriptions, specifications, and prices might be scattered across different data sources, potentially leading to inconsistencies. Master data management establishes a centralized repository for product data to ensure that customers see accurate and consistent information across all interfaces: web, mobile, on-site, and in marketing materials. Additionally, MDM facilitates efficient inventory management and product updates across various sales channels.
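The e-commerce scenario above can be caricatured in a few lines: records for the same SKU arriving from two channels are merged into a single "golden record." The field names and the merge rule (latest non-empty value wins) are assumptions for illustration, not a standard MDM algorithm:

```python
# A toy master data management merge: product records from two channels
# are consolidated into one golden record per SKU, preferring the most
# recently updated non-empty value for each field.

def golden_record(versions):
    """Merge versioned records; the latest non-empty value per field wins."""
    merged = {}
    for rec in sorted(versions, key=lambda r: r["updated"]):
        for field, value in rec.items():
            if field != "updated" and value:
                merged[field] = value
    return merged

web = {"sku": "A-100", "price": 19.99, "name": "Mug", "updated": "2024-01-05"}
mobile = {"sku": "A-100", "price": 17.99, "name": "", "updated": "2024-03-12"}

print(golden_record([web, mobile]))
# The newer price wins, but the newer record's empty name does not
# overwrite the older record's valid name.
```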

Information System Planning and Management

Information systems play an important role in data management by transforming raw data into a strategic asset. Information system planning and management involve selecting and implementing the right technology infrastructure for data storage, processing, analysis, and governance. Information technologies and systems play a vital role in facilitating data management within organizations, as they encompass activities such as data acquisition, organization, storage, access, analysis, and interpretation. In the current landscape, characterized by advanced information systems, quick and seamless access to information has become the norm.

There is a distinction between information used for strategic planning and information used for management control. An organization’s strategic planning is its process of defining the organization’s mission, vision, goals, and objectives and developing strategies to achieve them; it is typically done by senior-level managers, often with the support of information systems. In contrast, management control ensures efficient day-to-day operations and adherence to those strategic plans. The information required for each level serves a distinct purpose; hence, the data required also differ. The data need to be in user-friendly formats, enabling managers and analysts to easily comprehend and utilize the information according to their specific requirements (Figure 2.3). Managers, drawing on their expertise and experience while analyzing data, can effectively leverage these insights to address complex business problems. This fosters a dynamic and adaptive approach of leveraging collective data to drive continuous improvement and growth.

Image of puzzle pieces (Data), leading to person working on puzzle (Information), leading to puzzle pieces in the shape of an arrow (Knowledge).
Figure 2.3 Data processing for particular purposes generates information, and when applied with business acumen, information creates knowledge. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license; credit Data: modification of work “Project 365 #195: 140709 Let’s Get Jiggy With It” by Pete/Flickr, Public Domain; credit Knowledge: modification of work “Thick arrow made from jigsaw puzzle pieces” by Old Photo Profile/Flickr, CC BY 2.0)

Consider an example within the energy sector’s need for sustainable energy solutions. To plan information systems for a company in this sector, the chief information officer (CIO) must first understand what decisions need to be made at each level—strategic planning, management control, and operational control—and, correspondingly, what data these decisions require. The CIO should assess the different types of data currently available. These data can come from various sources across operations. The focus should be on ensuring existing data capture practices faithfully reflect what is happening in the organization. Additionally, the CIO should consider how these data can be processed and transformed into actionable information that supports the identified decision needs. Finally, the CIO must assess how the different levels of planning and the different types of information fit together.

Frameworks for Data Management Practices and Processes

In the realm of information systems, the Robert Anthony framework, named for the management accounting pioneer who developed it in 1964, helps in understanding information needs across different organizational levels. This framework categorizes decisions based on the level of certainty surrounding the problem and the solution. The Robert Anthony framework divides a problem, and by extension the data needed to resolve this problem, into three domains: operational control, management control, and strategic planning4:

  • Operational control is the process of ensuring that the organization’s day-to-day operations are running smoothly. It involves setting production schedules, managing inventory levels, and processing customer orders. Operational control is typically done by frontline managers and employees.
  • Management control is the process of ensuring that the organization is meeting its strategic goals. It involves setting budgets, monitoring expenses, and tracking sales. Management control is typically done by middle management.
  • Strategic planning is the process of setting long-term goals for an organization. It involves identifying the organization’s mission, vision, and values, as well as its strategic objectives. Strategic planning is typically done by the organization’s top management team.

By classifying information into these categories, Anthony’s framework provides a structured approach to gathering, analyzing, and utilizing information for effective planning and control within an organization. Table 2.1 features data characteristics for the three types of domains.

Domain Data Use Data Characteristics
Operational Track and control day-to-day activities. Data are typically detailed and time sensitive.
Management Make decisions about how to allocate resources and achieve goals. Data are typically summarized and less time sensitive than operational data.
Strategic Set long-term goals and make strategic decisions. Data are typically aggregated and less time sensitive than management data.
Table 2.1 Three Domains of the Robert Anthony Framework The three domains of the Robert Anthony framework—operational control, management control, and strategic planning—each work with different types of data for different purposes.
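The table's progression from detailed operational data to summarized management data to aggregated strategic data can be sketched in a few lines. The transaction values are made up for illustration:

```python
# The same underlying data serving the three Anthony domains:
# detailed operational transactions are summarized by region for
# management control, then aggregated company-wide for strategic planning.

transactions = [  # operational: detailed and time-sensitive
    {"region": "East", "month": "Jan", "amount": 500},
    {"region": "East", "month": "Feb", "amount": 700},
    {"region": "West", "month": "Jan", "amount": 300},
]

# Management control: summarized by region for resource allocation.
by_region = {}
for t in transactions:
    by_region[t["region"]] = by_region.get(t["region"], 0) + t["amount"]

# Strategic planning: aggregated to a single company-wide figure.
total_sales = sum(by_region.values())

print(by_region, total_sales)
```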

Another data management framework, the decision-making framework, was developed by economist and cognitive scientist Herbert A. Simon in 1977. It breaks decisions down into two types: programmed decisions with clear procedures and nonprogrammed decisions requiring more judgment. Programmed decisions are also called structured decisions, as they are routine and can be made based on preestablished rules and procedures. They often involve repetitive tasks that can be automated within the information system. Nonprogrammed decisions, on the other hand, are not structured, as they are unique and complex, requiring judgment, analysis, and creativity. These decisions often arise when there are no predefined guidelines or precedents available. A three-step process determines whether an activity is programmable or nonprogrammable (Figure 2.4):

  1. Intelligence: Gather relevant information and assess the situation to understand the nature and requirements of the activities.
  2. Design: Analyze the collected information to make informed judgments about whether an activity can be effectively automated or requires human intervention.
  3. Choice: Based on the decisions made, select the appropriate approach for each activity—either programming it for automation or handling it through human involvement.
Process linked by arrows: Intelligence (Information gathering) to Design (Evaluation and analysis), back to Intelligence or to Choice (Decision and implementation), and back to Intelligence or back to Design.
Figure 2.4 To identify whether an activity is programmable or nonprogrammable, a three-step process is followed in order to come to a decision. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)
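The three-step loop can be caricatured in code: gather an activity's characteristics (intelligence), judge whether it is routine and rule-based (design), and select automation or human handling (choice). The activity attributes below are illustrative assumptions:

```python
# A toy pass through Simon's intelligence-design-choice process for
# deciding whether an activity is programmable. The attributes
# ("repetitive", "has_rules") are simplifications for illustration.

def choose_approach(activity):
    # Intelligence: gather the relevant characteristics of the activity.
    repetitive = activity["repetitive"]
    has_rules = activity["has_rules"]
    # Design: judge whether the activity can be effectively automated.
    programmable = repetitive and has_rules
    # Choice: select the appropriate approach.
    return "automate (programmed)" if programmable else "human judgment (nonprogrammed)"

reorder_stock = {"repetitive": True, "has_rules": True}
enter_new_market = {"repetitive": False, "has_rules": False}

print(choose_approach(reorder_stock))
print(choose_approach(enter_new_market))
```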

Another framework, developed by Gorry and Scott Morton, combines elements of both the Robert Anthony and decision-making frameworks.5 Gorry and Scott Morton introduced the concept of IT into their framework, acknowledging the influence of technology on decision-making and information management. They highlighted the importance of aligning IT strategic planning with an organization’s decision-making needs and the potential benefits that technology can bring to the decision-making process. Gorry and Scott Morton suggest that for each business problem that involves information systems, data managers must first determine whether the problem is strategic, managerial, or operational in nature and then see whether the decision-makers and the intelligence required should be internal or external. If both the decision-makers (stakeholders) and intelligence needed can be found internally, as in the case of order entry and budgeting, then the problem qualifies as being structured. But if both the intelligence needed and the decision-makers are external to the organization, then the problem qualifies as being unstructured, for instance, in the case of systems for cash management or personnel management.
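The internal/external test described above can be expressed as a small classification function. Treating the mixed cases as "semistructured" is an assumption added here for completeness; the paragraph above addresses only the two extreme cases:

```python
# A sketch of the Gorry and Scott Morton classification: a problem is
# structured when both its decision-makers and the intelligence it needs
# are internal, and unstructured when both are external.

def classify_problem(decision_makers, intelligence):
    """Each argument is 'internal' or 'external'."""
    if decision_makers == "internal" and intelligence == "internal":
        return "structured"
    if decision_makers == "external" and intelligence == "external":
        return "unstructured"
    return "semistructured"   # mixed case; label assumed for illustration

print(classify_problem("internal", "internal"))  # e.g., order entry
print(classify_problem("external", "external"))  # e.g., cash management
```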

When applying the Gorry and Scott Morton framework to review existing systems or to propose new systems that cater to user needs, an information systems designer or reviewer often investigates how available technology can support decision-making. This framework emphasizes both applications and decisions, necessitating input from a diverse range of users to understand the benefits of existing systems or expectations for new ones. It is important to gather user feedback through structured interviews and questionnaires as they can be utilized to collect user data and gauge user reactions to each system under consideration. Key inquiries may include the following:

  • In which directions should the development of new systems be oriented?
  • Are there specific demands for enhanced strategic planning applications or managerial and operational systems?
  • Do users want a system to support them with decisions that are less structured compared to those handled by existing systems?

Furthermore, an analysis of previous systems developed in analogous situations could offer valuable insights into the potential benefits of a novel system.

According to the Gorry and Scott Morton framework, management control primarily deals with overseeing and guiding people, while operational control focuses on the performance of designated activities or tasks, such as manufacturing a particular part. Typically, operational control systems have a limited ability to provide managerial control reports or data that are helpful for strategic decision-making. However, an online retail system primarily focused on operational control can generate reports, such as a sales analysis report, that, while intended for operational purposes, also provide valuable data for managerial and strategic decision-making. For instance, Zara has been able to use online retail system data in its supply chain management system to beat competitors in time to market by offering new fashions to its customer base.6 Such data can be leveraged to identify sales trends in specific regions or product categories, which can inform decisions on marketing expansion, inventory management, and product diversification to drive future growth.

To conduct a thorough examination of the information systems within an organization, a simplified Robert Anthony framework can be employed. First, classify each system requirement based on the user and the type of decision it supports—whether it pertains to operational control, managerial control, or strategic planning. Then, for existing systems, consider factors such as data availability and the governance policies that regulate access by decision-makers. This analysis should help identify whether the existing data align with the intended purpose of the information systems and if any modifications or enhancements are needed. This process may involve reviving outdated systems or proposing new systems.

Footnotes

  • 1Steve Morgan, “The World Will Store 200 Zettabytes of Data by 2025,” Cybersecurity Magazine, February 1, 2024, https://cybersecurityventures.com/the-world-will-store-200-zettabytes-of-data-by-2025/
  • 2“Wasted Data: Why So Much Enterprise Information Goes Unused,” MarketLogic, October 26, 2022, https://marketlogicsoftware.com/blog/wasted-info-why-so-much-enterprise-data-goes-unused/#:~:text=Unbelievably%2C%20only%20about%20one%20third,their%20data%20remains%20%E2%80%9Cdark%E2%80%9D
  • 3George Firican, “Data Management Framework 101,” LightsOnData, https://www.lightsondata.com/data-management-framework/
  • 4Robert N. Anthony, “Framework for Analysis,” Management Services: A Magazine of Planning, Systems, and Controls 1, no. 1 (1964): 6, https://egrove.olemiss.edu/mgmtservices/vol1/iss1/6
  • 5George A. Gorry and Michael S. Scott Morton, “A Framework for Management Information Systems,” Working Paper No. 510-71, Alfred P. Sloan School of Management, Massachusetts Institute of Technology, February 1971.
  • 6“Zara Supply Chain Analysis—The Secret Behind Zara’s Retail Success,” QuickBooks Blog, Intuit Quickbooks, June 25, 2018, https://quickbooks.intuit.com/r/supply-chain/zara-supply-chain-its-secret-to-retail-success/
  • 7Rashik Parmar, Marc Peters, and Llewellyn D.W. Thomas, “What Is Responsible Computing?” Harvard Business Review, July 7, 2023, https://hbr.org/2023/07/what-is-responsible-computing
Citation information

© Mar 11, 2025 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.