NEW DELHI: Atul Rai, co-founder and CEO of Gurugram-based Staqu Technologies, is following a tender for a smart city project in Lucknow that seeks audio and video surveillance to improve security.
Rai already has a product called Jarvis, used by the Uttar Pradesh Police and other state police forces, which combines closed-circuit television (CCTV) cameras with artificial intelligence (AI)-based facial recognition.
In its new version, Jarvis uses not only cameras to watch for crime, but also microphones to hear what’s going on around town. “We used audio analysis to detect incidents such as prison fighting in Uttar Pradesh. Our goal is to implement it in smart cities,” Rai said. The audio analysis tool is also used by retail and manufacturing organizations to detect emergency sounds and accidents.
Staqu is one of the few companies in India offering AI-based audio analysis tools. These systems can recognize sounds such as gunshots, a person’s scream, or specific words that indicate distress. They use convolutional neural networks (CNNs) to identify types of sound. CNNs are typically used for image and video recognition, but here they are used to recognize patterns in sounds. In principle, an audio surveillance system could alert the nearest hospital if an accident occurs, or contact the police if a group of people is planning a crime. “Each camera is able to send audio data via a microphone. If a crime is committed outside of this camera’s field of view, audio can help determine if someone is distressed and needs assistance,” Rai explained.
According to Rai, there are many ways to use audio analytics for security. One is to identify a scene by its audio, such as a fight, violence or screaming. Another is to identify a person by their voice when they are not facing the camera. This can help identify people with criminal records by their voice, even if they’re not in prison.
According to Rai, the Lucknow Smart City project has expressed interest in an audio and video solution, and demos will be held soon. Jarvis is “language agnostic” and looks for specific sound signatures that can indicate distress or an accident, Rai said.
According to Rai, Jarvis’ accuracy was tested using VoxCeleb – one of the largest audiovisual human speech datasets. He claimed the system was 98.7% accurate. The company is also working on a new NLP (Natural Language Processing)-based feature that will allow users to ask Jarvis for information, prompting Jarvis to scan data across all cameras.
The use of audio signatures or voices for law enforcement has gained traction around the world. In Europe, Interpol developed a speaker identification solution to identify criminals using voice samples as early as 2018, while police forces in the US have reportedly built databases of criminals’ voice samples.
However, such solutions come with significant privacy concerns. Pam Dixon, founder and executive director of the World Privacy Forum, a public interest research group, warns that “a lot will depend on how the system is set up, implemented and used. Beyond technical bias and accuracy, there will be questions about where records are kept and for how long.” “These types of surveillance systems need to be transparent and say clearly what words and sounds are being listened to. The policies for those systems need to be in place before they are built and used,” she emphasizes.
NS Nappinai, a Supreme Court advocate, agrees: “India does not have a regulatory framework for CCTV cameras, although such frameworks already exist in several countries. The same should apply to audio, so stakeholders know what’s acceptable and what’s not.”