Artificial Intelligence
Cybercriminals have a new method: voice cloning using Artificial Intelligence (AI). Cloned audio is used to commit fraud. In particular, thefts carried out by cloning the voices of senior company executives have increased.
The examples presented at the RSA Conference by Vijay Balasubramaniyan of cybersecurity company Pindrop show the extent of the fraud. According to Balasubramaniyan, if you are a CEO or company executive and there is a lot of content featuring you on YouTube, cybercriminals can reproduce your voice using AI-based software, which can put you in danger.
Five minutes of recording is all it takes to create a convincing, realistic voice clone, and with five hours or more of recordings, this software can mislead people in ways they can't even imagine. Even so, this serious fraud threat is still small compared to phone fraud that involves identity theft.
At the same conference, Balasubramaniyan showed an example of his company's software combining the voices of famous people. For added entertainment value, the software also emulated the voice of US President Donald Trump. Using Trump's earlier recordings, the company took less than a minute to create a clone of the President's voice. The Trump example shows that voice fraud can also be used to mislead the public.
On the bright side, computer engineers have started to develop solutions to detect fake voices. For example, Pindrop has created an AI algorithm to distinguish fake voice recordings from authentic human voices. This software first checks how real people pronounce words and then compares the recorded audio against the patterns of human speech.
The core problems of Artificial Intelligence (AI) involve programming computers, often through Machine Learning (ML), to perform specific tasks that draw on knowledge, reasoning, problem-solving, perception, learning, and the ability to communicate. These are processes in which computers do much of the work. By imitating people, such systems recognize similarities between objects, which supports applications such as object classification and online marketing automation.
The development of an artificial simulation of a person's speech is known as voice cloning. The AI voice cloning methods available today can produce synthetic speech that closely resembles a targeted human voice. Laypeople often cannot tell the difference between real and fake voices.
Voice cloning is not actually a new topic. The process combines numerous sound recordings at specific frequencies to reconstruct speech, and it is already used in real-world applications.
The origins of online AI speech cloning applications can be traced back to the use of machines to synthesize sound. Text-to-Speech (TTS) is a decades-old technology that converts text into synthesized speech and enables voice-based interaction between computers and humans.
Previously, there were two main approaches to TTS:
First, “Concatenation TTS” uses audio recordings to build a library of words and phonemes that can be strung together to form sentences. Although the speech is clear and understandable, it lacks the emotion and inflection of natural human speech. When using Concatenation TTS, each new speech style or language requires the development of a new voice database.
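The idea of stringing stored units together can be sketched in a few lines of Python. This is only a toy illustration, not how any production system works: the unit library, the `tone` helper, the sample rate, and the fixed silence between words are all assumptions made for the example, and plain sine tones stand in for real recordings of a speaker.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def tone(freq_hz, duration_ms):
    """Stand-in for a recorded unit: a plain sine tone of the given length."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit library. In a real system each entry would be a
# recording of the speaker saying that word or phoneme.
UNIT_LIBRARY = {
    "hello": tone(220.0, 300),
    "world": tone(330.0, 400),
}


def synthesize(text, gap_ms=50):
    """String stored units together, separated by short silences."""
    silence = [0.0] * (SAMPLE_RATE * gap_ms // 1000)
    samples = []
    for word in text.lower().split():
        if word not in UNIT_LIBRARY:
            # The library is the limit: anything not recorded cannot be spoken.
            raise KeyError(f"no recording for unit: {word!r}")
        if samples:
            samples.extend(silence)
        samples.extend(UNIT_LIBRARY[word])
    return samples


audio = synthesize("hello world")
```

The `KeyError` branch mirrors the limitation described above: a word, style, or language that is missing from the database simply cannot be produced until new recordings are added.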
The second approach is “Parametric TTS,” which uses mathematical speech models to simplify voice development, reducing cost and effort compared to Concatenation TTS. However, the work required to create a single voice has traditionally still been prohibitively costly, and the results sound unnatural.
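To make "mathematical speech models" more concrete, the sketch below generates a vowel-like sound from a handful of numbers instead of from recordings. It is a deliberately simple formant-style toy, not the statistical models used in real parametric systems. The formant values are approximate textbook figures, and `VOWEL_FORMANTS`, `BANDWIDTH_HZ`, and `synthesize_vowel` are names invented for this example.

```python
import math

SAMPLE_RATE = 8000    # samples per second (assumed)
BANDWIDTH_HZ = 150.0  # width of each resonance peak (assumed)

# Approximate first and second formant frequencies in Hz, for illustration only.
VOWEL_FORMANTS = {
    "a": (730.0, 1090.0),
    "i": (270.0, 2290.0),
    "u": (300.0, 870.0),
}


def synthesize_vowel(vowel, f0_hz=120.0, duration_ms=200):
    """Build a vowel-like waveform from parameters alone.

    Harmonics of the pitch f0_hz are summed, each weighted by how close it
    lies to one of the vowel's formant frequencies.
    """
    formants = VOWEL_FORMANTS[vowel]
    n = SAMPLE_RATE * duration_ms // 1000
    harmonics = []
    k = 1
    while k * f0_hz < SAMPLE_RATE / 2:  # stay below the Nyquist frequency
        freq = k * f0_hz
        gain = sum(math.exp(-((freq - f) / BANDWIDTH_HZ) ** 2) for f in formants)
        harmonics.append((freq, gain))
        k += 1
    samples = [
        sum(g * math.sin(2 * math.pi * fr * i / SAMPLE_RATE) for fr, g in harmonics)
        for i in range(n)
    ]
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]  # normalize to the range [-1, 1]


low_voice = synthesize_vowel("a", f0_hz=110.0)
high_voice = synthesize_vowel("a", f0_hz=220.0)
```

Changing `f0_hz` or the formant table changes the voice without recording anything new, which is where the savings over a recorded library come from. The flat, buzzy output of such a simple model also hints at why these voices sound unnatural.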
Today, developments in AI and Deep Learning are improving the quality of synthetic speech.
Features of Voice Cloning Technology
- The software detects accent, timbre, intonation, rhythm, word flow, and breathing.
- Different emotions such as anger, fear, happiness, love, or boredom can be reflected in the voice.
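As a rough illustration of how one of these features might be measured, the sketch below estimates pitch, the basis of intonation, using autocorrelation. This is a generic signal-processing technique chosen for the example and is not claimed to be what any particular cloning product does. The function name, sample rate, and search range are assumptions, and a synthetic 200 Hz tone stands in for a frame of recorded speech.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed)


def estimate_pitch_hz(samples, min_hz=50, max_hz=400):
    """Estimate pitch as the lag at which the signal best matches itself."""
    min_lag = SAMPLE_RATE // max_hz  # shortest period considered
    max_lag = SAMPLE_RATE // min_hz  # longest period considered

    def autocorr(lag):
        # Similarity between the signal and a copy of itself shifted by `lag`.
        return sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return SAMPLE_RATE / best_lag


# Synthetic test frame: 100 ms of a 200 Hz tone in place of real speech.
frame = [math.sin(2 * math.pi * 200 * i / SAMPLE_RATE) for i in range(800)]
pitch = estimate_pitch_hz(frame)
```

Tracking this estimate frame by frame over a recording would give a pitch contour, one simple way to represent the rise and fall of a speaker's intonation.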