DIY voice surveillance: why one bank chose build over buy
One of the key areas in which the hype around artificial intelligence (AI) may not be exaggerated is the analysis of the human voice and language. Transcription and translation technology has advanced so rapidly that real-time analysis of almost all of the languages used by banks’ regulated employees is now feasible.
This means that the gaps around language and the percentage of communications actively surveilled, which were defensible when translation and transcription technology was primitive, and processing and storage were expensive, are now much harder to justify. It’s tricky to explain to a regulator why you only surveil three languages in voice if you do business in 20.
The key question, then, is: ‘What technology should I buy, or should I build it in-house?’
In 1LoD’s recent Surveillance Benchmarking Survey & Report, voice surveillance drew the most outright dissatisfaction with current tooling, and 74% of respondents anticipated buying new voice technology from third parties.
By ‘new technology’, these banks mean machine learning (ML)- or AI-driven tools that move the analysis of call transcripts away from keyword searching, with its high false-positive rate, towards smarter natural language processing (NLP) that can not only distinguish between innocent and not-so-innocent uses of the same words and phrases, but also flag bad intentions and other forms of problematic behaviour.
These tools should also make it possible to automate the core analysis and so eliminate the need for sample-based surveillance, allowing banks to be sure that they can monitor all their voice and e-comms channels in full.

Going open-source and in-house
But not everyone is looking to buy. One major bank had spent 18 months looking to move from a manual sampling system to a more robust and scalable solution. They looked at the “three or four vendors in the space, all of whom are much-of-a-muchness” but were worried that they could not identify a clear leader. This raised the concern that, in the words of the surveillance head, “The moment when AI goes truly open could be the worst possible timing to sign up to expensive, long-term licenses. We felt that, given the availability of open-source AI, we could at least trial the idea of building something in-house and evaluate the viability of that.”
The bank identified the open-source AI tool Whisper and began with a pilot project to assess the viability of building an in-house solution for one region in one language: English. The results were so impressive that the bank has just rolled out the solution in English in EMEA and is now adding more languages to be surveilled.
It has also tested the solution on the key languages used in its APAC operations and is looking to roll out the solution in that region next.
The solution first transcribes voice data and then translates it; the translation is sent for analysis by the bank’s existing e-comms surveillance systems. The team also designed a voice-specific lexicon, incorporating elements of AI, that performs a number of functions, including improving the false positive ratio and stripping out various types of voice call that do not need to be surveilled.
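The flow described above (filter, transcribe, translate, forward) can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the bank’s implementation: the names `Call`, `Transcript`, `EXCLUDED_CALL_TYPES` and `process_call`, and the call-type labels, are all hypothetical, and the transcription, translation and submission steps are passed in as plain functions so that any engine could sit behind them. The sketch also reduces the lexicon’s filtering role to a simple call-type check, which is far cruder than what the article describes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Call:
    call_id: str
    audio_path: str
    call_type: str  # hypothetical label attached upstream, e.g. "trader"


@dataclass
class Transcript:
    call_id: str
    source_language: str
    english_text: str


# Hypothetical examples of call types that do not need to be surveilled.
EXCLUDED_CALL_TYPES = {"voicemail_greeting", "hold_music"}


def process_call(
    call: Call,
    transcribe: Callable[[str], Tuple[str, str]],  # path -> (language, text)
    translate: Callable[[str, str], str],          # (text, language) -> English
    submit: Callable[[Transcript], None],          # hand-off to e-comms system
) -> Optional[Transcript]:
    """Run one call through the pipeline; return None if it was stripped out."""
    # Step 1: drop call types that are out of scope before spending compute.
    if call.call_type in EXCLUDED_CALL_TYPES:
        return None

    # Step 2: speech to text in the language that was spoken.
    language, text = transcribe(call.audio_path)

    # Step 3: translate only when the call was not already in English.
    english = text if language == "en" else translate(text, language)

    # Step 4: forward the English text to the existing e-comms surveillance.
    transcript = Transcript(call.call_id, language, english)
    submit(transcript)
    return transcript
```

In a real build, the `transcribe` argument might wrap a speech model such as Whisper, and `submit` would call whatever ingestion interface the existing e-comms surveillance platform exposes. Both are assumptions here; the article does not describe how the bank wired these steps together.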
Unexpected benefits
This process replaces a fully manual, sample-based process in which analysts listened to selected call data. The voice team has been trained to clear e-comms alerts, and this has led to the creation of a single communications surveillance analyst team rather than separate teams for voice and e-comms. This has had a range of benefits, one of which is better global load-balancing. As the surveillance head says, “The project has created quite a lot of interesting incremental benefits that we did not necessarily expect at the outset.”
Lessons learned
So, what advice does this bank have for anyone looking to build a voice solution in-house?
First, don’t underestimate the size of what this head of surveillance calls “a huge endeavour”. This has been an 18-month project involving regulators, senior management, in-house IT and other stakeholders. It is not simply a case of assembling plug-and-play components from OpenAI.
Second, don’t try this without a large in-house team already experienced in building tools that incorporate AI or ML. This bank has a lab team within global markets that already had significant experience in building complex AI-driven product and risk analytics from open-source components. Without this team, and its availability, the project would not have been possible.
Third, get senior management buy-in. Voice surveillance is still viewed by some institutions as something that can be left to the bare regulatory minimum, which is often a small subset of the population surveilled in e-comms. But management at larger banks now takes the view that this amounts to playing catch-up with regulators, who they expect to move towards a position in which anyone connected to a global banking or trading transaction is recorded.
As one surveillance chief puts it, “All the signs are there that regulators won’t be happy until everyone even remotely connected to something you could describe as a transaction, whether that be secondary market, primary market, even syndicated lending, and so on, is recorded. I think as you progress down the current risk curve you get to a point where it’s hard to make the argument that people shouldn’t be recorded anymore. The reality is the regulators are working down a list from the easy markets-related stuff first, to the harder banking stuff.”
Putting in a comprehensive voice surveillance system is insurance against that process and the inevitable enforcement actions.
Fourth, a related point: work with the regulators. In this project, the bank communicated its intentions and objectives to its key regulators and kept them on board. Surprising the regulators, especially given the questions that still remain around AI, is not a good idea.
