As companies strive to become data-driven, and with the recent explosion of AI technology demanding ever-increasing amounts of training data, the quality of that data is becoming more important. Companies invest a great deal of time and money in data pipelines and other technical aspects of data quality, such as consistency, validity, timeliness, and auditability.
But one aspect of data quality that's equally, if not more, important is often overlooked in favor of problems that technology can solve: completeness, or the absence of bias.
The best way to address this issue is to have as diverse a data team as possible in terms of gender, ethnicity, age, national background, education, business expertise, and more.
Data-driven companies outperform
Over the past few years, numerous studies have shown that companies that make data-based decisions make more money. Last year, for example, an IDC survey of over 600 companies showed that mature data practices result in a threefold increase in revenue improvement, almost triple the likelihood of reduced time to market for new products and services, and more than double the probability of enhanced customer satisfaction, profits, and operational efficiency.
And a March survey of business leaders by the Harvard Business Review and Google Cloud showed that data and AI leaders significantly outperformed other companies in operational efficiency, revenues, customer loyalty and retention, employee satisfaction, and IT cost predictability.
Executives are paying attention. A global Salesforce survey of nearly 10,000 business executives, released this spring, found that 80% say data is critical to decision-making at their organizations, and 73% say data helps reduce uncertainty and improve accuracy.
Diversity is good for business
Further studies have shown that diversity also leads to better business performance, and that diverse teams are more innovative, make better decisions, and have higher retention. And most companies now understand the value of diversity and inclusion.
In a PwC report released this February, 85% of global companies had diversity, equity, and inclusion as a stated value or priority. Of those, 46% did so in order to attract and retain talent, 20% to achieve business results, 13% to enhance their reputations, and 11% to comply with regulatory requirements.
But few companies are able to live up to their diversity objectives, and data science is one of the worst sectors in this regard.
According to the latest Zippia numbers, only 20% of US data scientists are women. Only 7% are Hispanic, even though Hispanic people make up 19% of the US population, and only 4% are African American, even though African Americans make up 12% of the population.
“Without a diverse team, you’re less likely to be aware of different lived experiences,” says Nika Kabiri, senior director of decision science at Clio, a legal services company.
And it’s not enough for executives to commit to hiring diverse teams, she adds.
“They also need to create space for diverse voices, for individuals to comfortably share their diverse lived experiences in a way that deeply informs product development,” she says. “Otherwise, executives will only address bias in a superficial way and build products that fall short of what they could be.”
This is particularly important today, with the advent of generative AI and large language models (LLMs), says Sreekanth Menon, VP and global leader for AI and ML services at Genpact. LLMs have a reputation for biases and hallucinations, he says, likely due in part to training data that is concentrated in certain languages and sources. For example, the models perform better in English than in other languages.
“Having a diverse team from different geographies can help remediate such biases,” he says. Similarly, diversity in ethnicity, gender, and other characteristics can help create more ethical frameworks for data onboarding, as well as bring in diversity of thought.
On his own team, for example, 20% to 30% of members come from a pure math or statistics background, he says, while the rest come from other fields. "I have a bioinformatics guy working for me," he says. "That different background helps."
AI has the potential to amplify data bias problems, which could lead to deadly results, says Davi Ottenheimer, VP of digital trust and ethics at Inrupt, a company founded by Tim Berners-Lee to give users control of their data.
For example, he says, early image recognition systems misclassified Black faces in dehumanizing ways, and some AI systems labeled Black hands, but not white hands, as holding guns, because of a diversity failure on the teams building those systems.
“A lack of diversity on a team could get innocent people killed,” he says.
Alison Alvarez, cofounder and CEO at BlastPoint, a data company serving financial institutions and utilities, adds: “There are so many examples in engineering where the lack of a diverse team can lead to poor outcomes. Like when those sensors came out for people to wash their hands and they didn’t recognize dark skin. They didn’t have a diverse team building it, and they didn’t have a diverse team actually testing it.”
But diversity has dimensions beyond gender, race, and sexual orientation. It can include someone's national origin, or whether they have allergies or other health issues, Alvarez suggests.
Diversity can even include a person’s rank in a company.
“If you don’t empower people on the lower level, their observations get downgraded,” she says.
For example, the Challenger space shuttle disaster could have been prevented: working engineers had warned about the reliability of the seals for two years, including on the eve of the launch itself.
It's easy to miss things when there's only one set of eyes looking at data, says former Microsoft VP Gavriella Schuster. Today, she's a founding member of Women in Cloud and Women in Technology, an advisory board member of the Women Business Collaborative, a board member at Nerdio and Mimecast, and a strategic advisor at Berkshire Partners.
“A lot of times, people use data to validate their own assumptions and ignore data that doesn’t validate those assumptions,” she says. “When you have enough eyes looking at a set of data, then you tend to avoid that phenomenon.”
But where do you find those eyes?
Schuster recommends that companies look beyond people who, say, have 10 years of data science experience. “If you were only looking for people with that level of experience, you tend not to get that diverse a pool of candidates.”
Plus, data science is changing quickly, she says, and it could be a disadvantage not to have newer people on the team who might think about AI and data processes in different ways.
In fact, you might not even need a data scientist.
“What you really want is people who have some experience in organizing information and thinking through patterns,” she says. People with degrees in the biological sciences, or economics, might have the right mindset. “There are continuing education programs where you can send someone to have them learn the specific technologies they’ll use.”
Other candidates could come from elsewhere in the company, such as the departments that use the products the data science team builds. These people understand user requirements and business value, and they have the domain expertise the team needs.
“Discounting people who don’t have a computer science background or information systems background really hurts a lot of CIOs,” she says. “Because then you miss people who understand the business, or understand the industry or the vertical, and can see different information that can be brought in. I’ve seen that happen numerous times.”
She also recommends having multiple diverse candidates to choose from. If you’re looking to hire more women, have at least two women among the finalists.
“Otherwise, if you have one person, the bias that people have will naturally come out targeted against that one person,” she says.
She also recommends looking for candidates in different geographical regions, and she notes that to hire diverse talent, the interview panel itself needs to be diverse.
Finally, leaders looking for team members who have different backgrounds, and different points of view, need to look beyond their existing networks.
“People tend to have people like them in their social network,” she says. “Unless you go outside who you know, you won’t get diverse candidates.”
Forrester analyst Kim Herrington has a tip for leaders looking to broaden their networks: go on LinkedIn, find five diverse professionals in the field you need tech talent from, and follow them.
“Then challenge yourself to do this again as often as possible, following the followers until your feeds are a garden of diverse and brilliant voices,” she says.
One place to start is The Algorithmic Justice League on LinkedIn, she says. “On the ‘people’ tab, not only will you find folks of diverse backgrounds, but they’ll be smart, passionate, and driven to help you and your teams be more mindful of technology and its pitfalls.”
Even with these ways to find people, and amid headlines about skills shortages, she hears a lot of companies complain that they can't find anyone.
“When I hear this, I believe you,” she says. “But then I’ve just learned an awful lot about you, your network, your outsized expectations, and your potentially outdated HR systems and policies. There’s no excuse for not having diverse people in your bubbles in 2023 and beyond.”
Herrington’s top advice for CIOs is to “put your metrics where your mouth is.”
“That’s my personal advice for CIOs and CDOs looking to improve data initiatives and quality,” she says. “To do this, CIOs can work with fellow data and analytics leaders to ask, ‘How might we…’ as it pertains to measuring and communicating diversity of data teams, retention of diverse employees, number of diverse employees in data roles, candidate diversity demographics, promotion rates, inclusion and belonging levels, pay levels, diversity of leadership, and employee engagement levels.”
One way to begin is to start with data that an organization is already gathering, she says. For example, an organization might gather demographic data for its customer base or the locations it primarily serves. “Then compare your EEOC [employee] data to see where dissonance exists when viewing percentages,” she says.
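Herrington's suggested comparison comes down to simple arithmetic. For each demographic group, subtract its share of a reference population (such as the customer base or the areas the organization serves) from its share of the team, measured in percentage points. The Python sketch below shows one way to do that.

Several pieces are assumptions for illustration. The function names, the headcounts for a 50-person data team, and the 5-point threshold for flagging a gap are all hypothetical. The reference shares reuse the US population figures cited earlier (19% Hispanic, 12% African American).

```python
from typing import Dict, List


def representation_gaps(workforce: Dict[str, int], reference: Dict[str, float]) -> Dict[str, float]:
    """Percentage-point gap between each group's share of the team and its
    share of a reference population. Negative means under-represented."""
    total = sum(workforce.values())
    if total <= 0:
        raise ValueError("workforce headcount must be positive")
    gaps: Dict[str, float] = {}
    for group, ref_pct in reference.items():
        # Groups missing from the headcount are treated as having zero members.
        team_pct = 100.0 * workforce.get(group, 0) / total
        gaps[group] = round(team_pct - ref_pct, 1)
    return gaps


def flag_dissonance(gaps: Dict[str, float], threshold: float = 5.0) -> List[str]:
    """Groups whose gap is at least `threshold` points in either direction.
    The 5-point default is an arbitrary, hypothetical cutoff."""
    return sorted(group for group, gap in gaps.items() if abs(gap) >= threshold)


if __name__ == "__main__":
    # Hypothetical headcounts for a 50-person data team.
    team = {"Hispanic": 3, "African American": 2, "Other": 45}
    # Reference shares: the US population figures cited in the article.
    population = {"Hispanic": 19.0, "African American": 12.0}
    gaps = representation_gaps(team, population)
    print(gaps)  # {'Hispanic': -13.0, 'African American': -8.0}
    print(flag_dissonance(gaps))  # ['African American', 'Hispanic']
```

With these inputs, the team is 6% Hispanic and 4% African American, so the gaps come out to -13.0 and -8.0 percentage points. Comparing percentages rather than raw counts puts a small team and a large reference population on the same scale.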
Diversity attracts talent
According to Glassdoor’s 2023 workplace trends report, 74% of US workers say corporate investment in diversity, equity, and inclusion is “very important” or “somewhat important” to them when considering a new job. Young people were particularly interested in diversity, with 72% of workers under 35 saying they’d consider turning down a job offer, or quitting a company, if they didn’t think management supported diversity initiatives. And two-thirds would also turn down a job from a company with gender and racial imbalances in its leadership.
“One thing I come across in my research is that diversity on teams actually leads to all kinds of improvement in talent attraction,” says Gartner analyst Jorgen Heizenberg. “And teams with different backgrounds are more successful and more creative, which ultimately leads to higher retention.”
Looking beyond the tech
One significant benefit to getting diverse voices on a data science team is that there are more opportunities to look beyond purely technical solutions to problems.
“Data and AI are very populated with people with the same background, the same education, and dominated by a technology-centric approach,” says Heizenberg.
That, he says, is why data teams spend the majority of their budget, time, and people on technology such as data management, data governance, and advanced analytics.
But the primary accelerator and predictor of success is the establishment of a data-driven culture.
“It’s funny that the number-one thing is often overlooked, and they spend much more time on governance, tools, and technology,” he says. “And, to a large extent, that’s the result of having the same kinds of people with the same kinds of backgrounds and experience, and it becomes very siloed.”
According to a Gartner survey, cultural challenges to accepting change tie with lack of business stakeholder support as the third-biggest roadblock to success, behind lack of staff and lack of funding.
“What I’m telling clients is when they work on data and analytics, they need to balance out the technology-centered approaches with more human-centered approaches,” says Heizenberg, “and do so by building cross-functional and multidisciplinary teams.”