BNY Mellon banks on AI to improve master data

Data about who owes how much to whom is at the core of any bank’s business. At Bank of New York Mellon, that focus on data shows up in the org chart too. Chief Data Officer Eric Hirschhorn reports directly to the bank’s CIO and head of engineering, Bridget Engle, who also oversees CIOs for each of the bank’s business lines.

“It’s very purposeful because a lot of the opportunities for us around data require tight integration with our technology,” says Hirschhorn. “I’m a peer to the divisional CIOs of the firm, and we work hand-in-glove because you can’t separate it out: I can make a policy, but that alone doesn’t get the job done.”

Hirschhorn, who joined the bank in late 2020, has worked in financial services for over three decades, during which the finance industry’s concerns about data have changed significantly.

“Twenty years ago, we were trying to make sure our systems didn’t fall over,” he says. “Ten years ago, we were worried about systemic importance, and contagion. When you solve some of the more structural concerns, it all gets back to the data. We are incredibly bullish on building advanced capabilities to understand the interconnectedness of the world around us from a data perspective.”

One key to that endeavor is being able to identify all the data related to an individual customer, and to identify the relationships that link that customer with others. Banks have a regulatory requirement to know who they’re dealing with — often referred to as KYC or “know your customer” — to meet anti-money-laundering and other obligations.

“The initial problem we were looking to solve is a long-standing issue in financial markets and regulated industries with large datasets,” Hirschhorn says, “and that was really around entity resolution or record disambiguation,” that is, identifying and linking records that refer to the same customer.

Being able to identify which of many loans have been made to the same person or company is also important for banks to manage their risk exposure. The problem is not unique to banks, as a wide range of companies can benefit from better understanding their exposure to individual suppliers or customers.

Defining a customer with data

But to know your customers, you must first define what exactly constitutes a customer. “We took a very methodical view,” says Hirschhorn. “We went through the enterprise and asked, ‘What is a customer?’”

Initially, there were differences between divisions about the number of fields and type of data needed to define a customer, but they ended up agreeing on a common policy.

Recognizing that divisions already had their own spending priorities, the bank set aside a central budget that each division could draw on to hire developers, ensuring that every division had the resources to implement this customer master. The message was, “You hire the developers and we will pay for them to get on with it,” Hirschhorn says.

With the work of harmonizing customer definitions out of the way, the bank could focus on eliminating duplicates. If it has a hundred records for a John Doe, for example, then it needs to figure out, based on tax ID numbers, addresses, and other data, which of those relate to the same person and how many different John Does there really are.
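
To make the John Doe problem concrete, here is a minimal sketch of rule-based record linking. It is not BNY Mellon's or Quantexa's method: the records, the field names (`tax_id`, `address`), and the rule that a shared tax ID or a shared normalized address is enough to link two records are all assumptions made for illustration. A real system would weigh far more evidence, since relatives and unrelated tenants can share an address.

```python
from collections import defaultdict

# Hypothetical customer records; field names are illustrative assumptions.
records = [
    {"id": 1, "name": "John Doe", "tax_id": "111-22-3333", "address": "1 Main St."},
    {"id": 2, "name": "John Doe", "tax_id": None, "address": "1 main st"},
    {"id": 3, "name": "John Doe", "tax_id": "999-88-7777", "address": "5 Oak Ave"},
    {"id": 4, "name": "J. Doe", "tax_id": "999-88-7777", "address": "PO Box 12"},
]


def normalize(text):
    """Lowercase and drop punctuation so trivially different spellings match."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch == " ").strip()


def resolve(records):
    """Group record ids into entities using a simple union-find structure."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Assumed linking rule: same tax ID OR same normalized address.
    first_seen = {}
    for r in records:
        keys = []
        if r["tax_id"]:
            keys.append(("tax_id", r["tax_id"]))
        if r["address"]:
            keys.append(("address", normalize(r["address"])))
        for key in keys:
            if key in first_seen:
                union(r["id"], first_seen[key])
            else:
                first_seen[key] = r["id"]

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r["id"])
    return sorted(sorted(ids) for ids in clusters.values())


entities = resolve(records)
print(f"{len(records)} records resolve to {len(entities)} entities: {entities}")
```

With this sample data, the four records collapse into two entities: records 1 and 2 are linked by address, and records 3 and 4 by tax ID.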

BNY Mellon wasn’t starting from scratch. “We actually had built some pretty sophisticated software ourselves to disambiguate our own customer database,” he says. There was some automation around the process, but the software still required manual intervention to resolve some cases, and the bank needed something better.

Improving the in-house solution would have been time-consuming, he says. “It wasn’t a core capability, and we found smarter people in the market.”

Among those people were the team at Quantexa, a British software developer that uses machine learning and multiple public data sources to enhance the entity resolution process.

The vendor delivered an initial proof of concept to BNY Mellon just before Hirschhorn joined, so one of his first steps was to move on to a month-long proof of value, providing the vendor with an existing dataset to see how its performance compared with that of the in-house tool.

The result was a greater number of records flagged as potentially relating to the same people — and a higher proportion of them resolved automatically.

“There’s a level of confidence when you do correlations like this, and we were looking for high confidence because we wanted to drive automation of certain things,” he says.
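
One way to picture how confidence drives automation is a score with two cut-offs: matches above the upper one are merged automatically, those in the middle go to a human reviewer, and the rest are left alone. The sketch below is purely illustrative. The field weights, the thresholds, and the function names `match_score` and `triage` are assumptions, not details of the bank's or the vendor's software.

```python
# Assumed weights (out of 100) for how strongly each matching field
# suggests that two records describe the same customer.
WEIGHTS = {"tax_id": 60, "name": 25, "address": 15}


def match_score(a, b, weights=WEIGHTS):
    """Sum the weights of fields that are present and identical in both records."""
    score = 0
    for field, weight in weights.items():
        if a.get(field) and a.get(field) == b.get(field):
            score += weight
    return score


def triage(score, auto_merge_at=80, review_at=40):
    """Route a candidate match based on two hypothetical thresholds."""
    if score >= auto_merge_at:
        return "auto_merge"
    if score >= review_at:
        return "manual_review"
    return "no_match"


a = {"name": "John Doe", "tax_id": "111-22-3333", "address": "1 Main St"}
b = {"name": "John Doe", "tax_id": "111-22-3333", "address": "9 Elm Rd"}
c = {"name": "John Doe", "tax_id": None, "address": "1 Main St"}

print(triage(match_score(a, b)))  # same name and tax ID
print(triage(match_score(a, c)))  # same name and address only
```

Raising `auto_merge_at` trades automation for safety: fewer pairs are merged without review, but fewer wrong merges slip through.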

After taking some time to set up the infrastructure and sort out the data workflow, BNY Mellon moved on to a full implementation, which involved staff from the software developer and three groups at the bank: the technology team, the data subject matter experts, and the KYC center of excellence. “They’re the ones with the opportunity to make sure we do this well from a regulatory perspective,” he says.

Quantexa’s software platform doesn’t just do entity resolution: It can also map networks of connections in the data — who trades with whom, who shares an address, and so on.
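
As a rough illustration of what mapping such a network involves, the sketch below builds a small graph in which two parties are connected if they share an address or have traded with each other. The party names, the data layout, and the `build_network` helper are all hypothetical; this does not represent how Quantexa's platform works internally.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical resolved parties and trades, invented for illustration.
parties = {
    "Acme Ltd": {"address": "10 Dock Rd"},
    "Birch LLC": {"address": "10 Dock Rd"},
    "Cedar Inc": {"address": "7 Hill St"},
}
trades = [("Cedar Inc", "Acme Ltd")]


def build_network(parties, trades):
    """Return {(party_a, party_b): {relationship labels}} with sorted pairs."""
    edges = defaultdict(set)

    # Connect every pair of parties registered at the same address.
    by_address = defaultdict(list)
    for name, attrs in parties.items():
        by_address[attrs["address"]].append(name)
    for names in by_address.values():
        for a, b in combinations(sorted(names), 2):
            edges[(a, b)].add("shares_address")

    # Connect counterparties that have traded with each other.
    for a, b in trades:
        edges[tuple(sorted((a, b)))].add("trades_with")

    return dict(edges)


network = build_network(parties, trades)
for pair, labels in sorted(network.items()):
    print(pair, sorted(labels))
```

Here Birch LLC and Cedar Inc have no direct link, but both connect to Acme Ltd, which is the kind of indirect relationship a network view brings to the surface.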

The challenge, for now, may be in knowing when to stop. “You correlate customer records with external data sources, and then you say, let’s correlate that with our own activity, and let’s add transaction monitoring and sanctions,” he says. “We’re now doing a proof of concept to add more datasets to the complex, as once you start getting the value of correlating these data sets, you think of more outcomes that can be driven. I just want to throw every use case in.”

Investing in technology suppliers

BNY Mellon isn’t just a customer of Quantexa; it’s also one of its investors. It first took a stake in September 2021, after working with the company for a year.

“We wanted to have input in how products developed, and we wanted to be on the advisory board,” says Hirschhorn.

The investment in Quantexa isn’t an isolated phenomenon. Other technology suppliers in which the bank has invested include specialist portfolio management tools Optimal Asset Management, BondIT, and Conquest Planning; low-code application development platform Genesis Global; and, in April 2023, IT asset management platform Entrio.

The roles of customer and investor don’t always go together, though. “We don’t think this strategy is applicable to every new technology company we use,” he says.

While some companies may buy a stake in a key supplier to stop competitors from taking advantage of it, that’s not BNY Mellon’s goal with its investment in Quantexa’s entity resolution technology, Hirschhorn says.

“This isn’t proprietary; we need everybody to be great at this,” he says. “People are getting more sophisticated in how they perpetrate financial crimes. Keeping pace, and helping the industry keep pace, is really important to the health of the financial markets.”

So when Quantexa sought new investment in April 2023, BNY Mellon was there again, this time joined by two other banks: ABN AMRO and HSBC.
