In what circumstances might a data analyst choose not to use external data in their analysis?

I’m about to share two ideas that could transform your business intelligence (BI).

1.  Not all data is the same.

2.  How you manage data matters.

Not convinced? Hear me out…

First, let’s unpack the idea that not all data is the same.

3 Data Dichotomies

“When collecting and analyzing business intelligence data, analysts most often focus on their organization’s internal data (70%), business systems data (59%), and structured data (58%).” (Source: The Silicon Review)

1. Internal vs. External Data

Internal data is information generated from within the business, covering areas such as operations, maintenance, personnel, and finance. External data comes from the market, including customers and competitors, and includes statistics from surveys, questionnaires, research, and customer feedback.

Surveys suggest that business analysts consider internally generated data to be more valuable. According to one survey, “About 65% of respondents rank internal data as more important than data collected outside the company.”

Both kinds of data are helpful. Internal data helps you run your business and optimize your operations. External data helps you better understand your customer base and the competitive landscape. You need a clear view of both to have truly insightful business intelligence.

2. Structured vs. Unstructured Data

Structured data is the more traditional form of BI data because it’s quantifiable. It’s easier to put in a database, search, and analyze. Unstructured data, by contrast, is a newer type of data. It has no predefined format and is typically text-heavy, such as posts from social networks or customer comments.

As explained by Dean Abbott, Co-founder and Chief Data Scientist of SmarterHQ, “Newer types of data are more difficult to use because that data isn’t in a user-friendly form. All the old-school data is in a structured form, so you can put it in the database, apply algorithms, and get value from it much quicker.”

Too often, business leaders accustomed to data snapshots default to structured data to make decisions. This can be shortsighted. A strong BI plan accounts for unstructured data to leverage the insight found there.

Here’s an example from our business. Each year, we do an account review with our clients to see how the software we have built for them is performing. We use structured data to make sure things are working right technically, and then ask a lot of questions to collect unstructured data about how the solution is working for their business. We use both types of data, together with our clients, to identify challenges to address and opportunities to embrace. The strategy for our clients going forward is far more valuable because it is built on both structured and unstructured data.

3. Historical vs. Real-time Data

It’s a real BI dilemma: do you look at the present at the expense of the past, or do you spend so much time on last month’s numbers that you don’t see the data for today?

The answer? Do a mixture of both. Companies often spend significant time using historical data to identify and predict trends. But without real-time data to compare it to, the value of that historical data is limited.

A blog post from ArcherPoint Retail gives a good practical example of the relationship between historical and real-time data:

“If sales are down a certain percent today, you’ll want to be tracking that in real time. But, before you start panicking, look at the historical data to see if this dip is a natural progression in your sales or if it is an anomaly. Isolated events can occur, but often times you may see a clue in your data. If you have the historical data already analyzed, you’ll be able to easily detect how sales should react in the future.”
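The check described in that quote, deciding whether today's dip is normal variation or an anomaly, can be sketched as a simple z-score test against comparable past days. This is a minimal sketch, assuming you have daily sales for comparable historical days (for example, the same weekday) and using a hypothetical threshold of 2 standard deviations; real BI tools often use richer seasonal models.

```python
from statistics import mean, stdev


def is_anomalous(today: float, history: list[float], threshold: float = 2.0) -> bool:
    """Return True if `today` lies more than `threshold` standard deviations
    from the mean of `history`. The default threshold is a hypothetical choice."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # No historical variation: any difference at all is unusual.
        return today != mu
    return abs(today - mu) / sigma > threshold


# Hypothetical sales for the last eight Mondays
past_mondays = [1020, 980, 1005, 995, 1010, 970, 1000, 1015]
print(is_anomalous(940, past_mondays))  # True: well outside the usual range
print(is_anomalous(990, past_mondays))  # False: within normal variation
```

The historical series supplies the "how sales should react" baseline; the real-time value is only judged against it.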

Despite the value of real-time data analytics, a 2015 article reported on a study of 235 US-based firms: only 25% were embarking on real-time predictive modeling, and another 47% were actively exploring the opportunity. The remaining 28% were still undecided on how to proceed.

Real-time data can help you make a pivot when a problem or opportunity comes up. But if you react to every blip on the radar, your business will never have long-term success. You need to balance both kinds of data to make intelligent decisions.

Now, let’s turn the page to explore how managing data matters.

3 Keys to Managing Data Well

“BI and analytics requires an extensive knowledge and understanding of how strategic and other organizational decisions are made, and how this can be improved to support growth and value creation in a company.

Different solutions and data strategies support different decision structures. Strategic analytics/BI solutions can be extremely empowering for employees, and create value by informing decision making if it fits the structure. It can be equally as dangerous if there is a misfit between company strategy and data strategy.” (Source: Professor Derrick McIver via Blue Granite)

1. Start at the end

When deciding how to organize your data, it’s essential to first clarify the objective of BI for your company. What’s its purpose? What do you want it to achieve? The last thing you want is a structure that doesn’t give you the intelligence you need. Start with the end in mind, then work backward to structure your data around the results you want.

2. Customization increases power

BI dashboards are data visualization tools that consolidate and display key performance indicators (KPIs) and other metrics on a single screen. Typically, a dashboard summarizes data at a high level so it can be taken in at a glance, while letting you drill down to deeper levels.

Yet, not all departments in your company need to see the same information, so it’s important to organize your data in a way that allows for customization. “Self-service dashboards,” as they are called, allow users to choose their own KPIs, blend structured and unstructured information themselves, and see different visualization options. Not only does this increase the power of the data itself, but it also takes some of the pressure away from IT departments—which have traditionally been expected to create specific reports on request.

However, self-service dashboards raise some concerns, which leads to my last point.

3. Put policies in place

One of the most obvious concerns with self-service dashboards is data quality. If users pull the wrong data, they’re going to get the wrong result. And if they’re using the wrong result to make business decisions, that’s a problem. Likewise, you must be confident that the data sources you’re working with are correct and complete, and that proprietary or private data remains secure.

These issues are why data governance is so important. Data governance is a set of policies and procedures that protects the integrity of your business’s information. It assigns accountability to ensure data remains secure, accurate and usable.

Equally important is training your employees to use BI safely and effectively, so that:

* They feel comfortable using a dashboard, because if they don’t, they likely won’t use it.
* They pull accurate data, in the correct way, to achieve an accurate result.
* They follow rules designed to keep your data secure internally and externally.
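Part of making sure users "pull accurate data" can be automated: run basic quality checks before a data set is exposed on a self-service dashboard. This is a minimal sketch, assuming records arrive as dictionaries; the field names (`order_id`, `region`, `amount`) and the rules checked are hypothetical examples, not a standard.

```python
def validate_records(records, required=("order_id", "region", "amount")):
    """Return human-readable problems found in `records`.
    Field names and rules are hypothetical examples."""
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Completeness: required fields must be present and non-empty.
        missing = [f for f in required if rec.get(f) in (None, "")]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
        # Plausibility: amounts should not be negative.
        amount = rec.get("amount")
        if isinstance(amount, (int, float)) and amount < 0:
            problems.append(f"row {i}: negative amount {amount}")
        # Uniqueness: each order should appear once.
        oid = rec.get("order_id")
        if oid is not None:
            if oid in seen_ids:
                problems.append(f"row {i}: duplicate order_id {oid}")
            seen_ids.add(oid)
    return problems


rows = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "", "amount": 75.5},
    {"order_id": 1, "region": "West", "amount": -10.0},
]
for problem in validate_records(rows):
    print(problem)
# row 1: missing region
# row 2: negative amount -10.0
# row 2: duplicate order_id 1
```

Checks like these complement governance policies rather than replace them; they catch mechanical errors, not wrong questions.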

A successful data strategy turns a company’s data into important insights and financial gains — but it shouldn’t stop with information from inside the firm. From historic weather information to customer preferences to shopping trends, organizations have access to a vast number of data sets outside their walls, some for a price and others for free, that can sharpen analytics and ultimately boost the bottom line.  

A 2018 MIT Sloan Management Review data and analytics report found that the most analytically mature organizations use more data sources, including data from customers, vendors, regulators, and competitors. “Analytical innovators,” or companies that incorporate analytics into most aspects of decision-making, are four times more likely than less mature organizations to use all four data sources, and are more likely to use a variety of data types, including mobile, social, and public data.

And organizations that share their own data with customers, vendors, government agencies, and even competitors report increased influence in their business ecosystem, the survey found.

“In all sorts of areas, people are using third-party data to augment the data they already have,” said Asif Mahammad Syed, the vice president of data strategy at the Hartford Steam Boiler Inspection and Insurance Co. “In most cases, you can’t build high-quality predictive models with just internal data.”

Companies use this external data to augment their decision-making, better meet customer needs, predict supply and demand, and more.

Experts from MIT Sloan and elsewhere shared successful strategies for making the most of external data.

Know the external data landscape

The practice of bringing in external data isn’t new, but many companies have focused their data strategies thus far on making use of structured and unstructured data within their organizations. In a 2019 Deloitte survey, 92% of data analytics professionals said their companies needed to increase use of external data sources. 

“Digital is giving you many more sources of data,” said Stephanie Woerner, a research scientist with the MIT Center for Information Systems Research. New sources include social media data, information from Internet of Things-enabled sensors, and even fingerprint data.

Some companies have business models that rely on nontraditional external data sources, Woerner said, citing the fintech company Kabbage, which uses data from social media, sales, shipping records, and more to help determine the creditworthiness of small businesses.

The insurance industry has always been data-driven, Syed pointed out — insurance companies have a strong interest in finding out information about risks they are taking and pull in data like credit scores and historic weather information. “These things were happening, but it sped up in the last decade or so,” he said.

COVID-19 has likely accelerated that trend as firms try to anticipate consumer demand in a changing, unstable landscape. As internal data became a less reliable indicator, companies turned to external factors to predict consumer demand, supply availability, and the future course of the pandemic.

Know your options

Types of external data vary. Syed noted that high-level data is available for free at data.gov, which has more than 200,000 data sets from a variety of government agencies. “That's really an exciting place to start, because the government gives away so much free data, and it can actually help you to predict a lot of things,” he said.

His company, Hartford Steam Boiler, is part of the global company Munich Re, which has a “data-hunting team” that looks for data sources outside the company for a given use case, and then helps acquire, clean, and incorporate the data within the company.

There is a host of other information available, including regulatory data, property data, weather data, telematics data (such as how fast a vehicle is being driven), and even information about people’s emotional states. Image-based data is also available: satellite or drone images taken after a catastrophic event can help an insurance company direct resources before anyone can visit in person, Syed said.

“The benefit of using external data is so great that there are businesses built around gathering this data, consolidating it, cleaning it, and packaging it up for use by other companies,” said Angie King, PhD ‘15, an analytics innovation senior principal with End-to-End Analytics, which is part of Accenture. Accenture engages with more than 300 external data providers and uses relevant external data to enhance clients' own internal data when building analytical models for them, she said.

Depending on need and their own tech savvy, companies may opt to gather the data themselves, King said, or they can reach out directly to data providers, or engage with a firm that can help them through the data selection and model building process.

Use external data to know customers better

With third-party data, companies can avoid asking customers to provide too much information or fill out multiple forms. “A customer’s attention span is very limited,” Syed said. “If you ask too many questions, they say, ‘Oh, I'll go to someone else.’” For example, a customer looking for an insurance quote can provide just their name and address, and the company can use external sources to find further information about the customer, their home, or their car, equipment, or machines.
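The quote-request example is essentially an enrichment join: the customer supplies only a name and address, and the company looks up further attributes in an external data set keyed by address. This is a minimal sketch; the property table, its fields, and the simple address normalization are all hypothetical, and real address matching is usually far more involved.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical third-party property records, keyed by normalized address.
EXTERNAL_PROPERTY_DATA = {
    "12 elm st springfield": {"year_built": 1978, "roof_type": "asphalt", "sq_ft": 1850},
    "400 oak ave shelbyville": {"year_built": 2005, "roof_type": "tile", "sq_ft": 2400},
}


def normalize_address(address: str) -> str:
    """Lowercase, drop commas and periods, and collapse whitespace (simplified)."""
    cleaned = address.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


@dataclass
class QuoteRequest:
    name: str
    address: str
    property_info: Optional[dict] = None


def enrich(request: QuoteRequest) -> QuoteRequest:
    """Attach external property data, if found, without asking the customer more questions."""
    request.property_info = EXTERNAL_PROPERTY_DATA.get(normalize_address(request.address))
    return request


req = enrich(QuoteRequest("Pat Lee", "12 Elm St., Springfield"))
print(req.property_info)  # {'year_built': 1978, 'roof_type': 'asphalt', 'sq_ft': 1850}
```

When no match is found, `property_info` stays `None`, so the application can fall back to asking the customer only for what is missing.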

Dwane Morgan, MBA ‘08, the director of global consumer insights at Under Armour, calls this the “Amazon effect.”

Amazon customers are accustomed to the company tracking certain information. For example, when Amazon sends a reminder that it’s time to purchase something, most customers don’t stop to wonder how it knows they’re running low.

“The challenge is getting to the point that you're being useful enough with the recommendation based on the data,” Morgan said. “If you run into a situation where you cause the consumer to pause and think, ‘Well, how do they know this?’ and it feels like something that's not useful, they might delete the app or whatever they think is allowing you to track that data.

“There's a balance between making sure you're being prescriptive, so you can be helpful, but not overstepping in a way that feels intrusive.”

Use external data to add real-world context to internal decision-making

Businesses often need to know what’s going on with other companies and predict the impact of outside factors, from industrywide purchasing trends to pandemics and other catastrophic events.

“One of the most classic use cases for external data is to understand the major impact that external events can have on demand for a company's products,” King said. For example, a company’s sales could dip if a competitor’s product goes on sale, or product sales could go up because of social or traditional media, like a celebrity mention on Twitter, a viral video, or a shoutout from a talk show host.

King said using external data can improve a company’s predictive analytics and machine learning models. “Without having external data capturing these events, the predictive models wouldn't be able to infer the reason for the resulting spikes or drops in sales,” she said.
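A toy model illustrates King's point. With a single yes/no external feature (for example, a hypothetical flag marking days with a celebrity mention or viral video), an ordinary least-squares fit of sales on that flag reduces to comparing mean sales on event and non-event days. Including the flag lets the model attribute part of a spike to the event instead of treating it as unexplained noise. The numbers below are made up for illustration.

```python
from statistics import mean


def fit_event_model(sales, event_flags):
    """Least-squares fit of: sales = baseline + uplift * event, for a 0/1 event flag.
    With one binary regressor, OLS gives baseline = mean of non-event days and
    uplift = mean of event days minus baseline."""
    event_days = [s for s, e in zip(sales, event_flags) if e]
    normal_days = [s for s, e in zip(sales, event_flags) if not e]
    if not event_days or not normal_days:
        raise ValueError("need both event and non-event days")
    baseline = mean(normal_days)
    uplift = mean(event_days) - baseline
    return baseline, uplift


def predict(baseline, uplift, event):
    """Predicted sales for a day, given whether the external event occurs."""
    return baseline + uplift * (1 if event else 0)


# Hypothetical daily unit sales; flag = 1 on days with an external event
sales = [100, 98, 103, 180, 101, 99, 175, 102]
flags = [0, 0, 0, 1, 0, 0, 1, 0]
baseline, uplift = fit_event_model(sales, flags)
print(baseline, uplift)  # 100.5 77.0
```

Without the external flag, the two spikes would simply inflate the average and the variance; with it, they are explained.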

Beyond predicting their own sales, companies use social media and survey data to assess consumer sentiment and guide product decisions, King said. Companies also scrape competitors’ websites to track inventory for products with limited availability (such as event tickets or airplane seats) and to inform their pricing. 

Use external data with care

Using third-party data sources can raise concerns about protecting privacy, avoiding biased or inaccurate data, and using data for the right purposes, Syed said.

“Data has an incredible power to change the outcome of whatever you're trying to do. If your data is not trustworthy, then the business decisions that you make from that data are going to be wrong as well,” Syed said. Vetting any data coming from outside the company is important, he said. Companies should know how the data is collected and how ethically the data provider runs its business.

Morgan agreed. “It's not all about price. I think a lot of it comes down to the comfort level you have with that team and their ability to pull in the data accurately,” he said.

Establishing a lasting relationship with data companies is also key. “Just like any type of data, the more you have, the more you can learn from it,” Morgan said. “The longer you're with that partner, the better it is for you.”

For example, data from 2020 alone would show impacts that were unique to that unusual year. “Unless you have multiple years of data to look at and compare, you don't really know what's a seasonal impact, what was just the COVID effect, and what's normal.”

Changing vendors could mean slight changes in methodology, he added, which makes it hard to compare and contrast data over time.
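Morgan's point about needing several years of data can be made concrete by comparing a month against the average of the same month in other years, which separates a seasonal pattern from a one-off shock. This is a minimal sketch with hypothetical April figures; it also assumes every year was collected with the same methodology, which, as noted above, may not hold after a vendor change.

```python
from statistics import mean


def deviation_from_seasonal_norm(monthly_sales, year, month):
    """Percent difference between one month's sales and the average of the same
    month in all other available years. `monthly_sales` maps (year, month) -> value."""
    others = [v for (y, m), v in monthly_sales.items() if m == month and y != year]
    if not others:
        raise ValueError("need at least one other year for comparison")
    norm = mean(others)
    return (monthly_sales[(year, month)] - norm) / norm * 100


# Hypothetical April sales across four years
sales = {(2017, 4): 200, (2018, 4): 210, (2019, 4): 205, (2020, 4): 120}
print(round(deviation_from_seasonal_norm(sales, 2020, 4), 1))  # -41.5
```

With only the 2020 figure there would be nothing to compare against; the earlier years establish what a normal April looks like.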

Above all, being able to trust the data is vital. “I always say, scrutinize the data first,” Morgan said. “The first thing you want to do is make sure you scrub the data and that you're fully confident in what the data is saying.”

Remember that external data is only successful as part of a centralized digital strategy

Anyone can buy third-party data, Syed pointed out — if a company’s strategy relies on data gathered outside the firm, other companies could have the same insights. “You really need to have your internal data, or have some special process where you're synthesizing the external data and making some intelligence that is specific to you,” he said. “Data by itself doesn't produce that intelligence. Data has to be put in the context of the business outcome to give you a valuable insight.”

Most often, internal data is what ends up differentiating a company from competitors.

“Buying third-party data has to be seen in a broader context of what the business is trying to achieve with internal data,” Syed said. “If you don't have a solid understanding of all the internal data that you have in your entire company, then buying third-party data probably wouldn't help you that much. You'd be spending a lot of money, and third-party data doesn't come cheap.”

Companies should also coordinate data acquisition so different departments don’t purchase the same data, he said.

“You have to have a reason to bring [external data] in,” Woerner said. “You have to have some way of connecting it to what you do to have it be a useful type of exercise.

“If you can't get the small data done right, then you're not going to get the big data. You don't want to go willy-nilly collecting sources of data, because then you're going to have this tangle,” Woerner said. “It’s really a matter of, can you ask the right question and then find the right data source?”