Voice Activity Detector is provided for CLARIN VLO!

The Voice Activity Detector is provided for the CLARIN VLO. The “Voice Activity Detector” service analyzes an audio recording to identify periods of voice activity and the pauses between them. The results are presented as a list of voice-activity periods (or pauses), with times given to a precision of 10 milliseconds. The list can also be obtained as subtitles or as labels for the Audacity program. A dedicated button lets the user pack the processing results into an SRT file and download it. The service works only with audio recordings in WAV format.

The voice activity detection algorithm used by the service is implemented in Python and makes extensive use of the freely distributed NumPy and SciPy libraries. In its original form, the algorithm is fast, accurate, reliable, and undemanding of system resources, and at the same time noise-resistant (this last property is practically independent of the nature of the noise), which has been demonstrated in dedicated comparative evaluations. The Speech Synthesis and Recognition Laboratory improved the algorithm by adding preliminary noise removal.

VAD saves resources during data transmission over a communication channel: pauses in speech are not digitized or encoded, so “empty” packets containing silence are not sent over the network. This is very important for packet transmission (for example, in TCP/IP networks), because, in addition to the data itself, the protocol at each layer of the OSI model (transport, network, etc.) adds its own service information to every packet, and the packet size grows significantly as a result. Skipping “empty” packets that contain only low-level noise is an easy way to save traffic and, as a result, to increase the channel capacity.
For this reason, the VAD mechanism is often used together with various efficient compression codecs in IP telephony. The details are presented here. A direct link is here.
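
To make the output formats described above more concrete, here is a minimal sketch of how voice-activity intervals at 10 ms resolution could be turned into SRT subtitles and Audacity labels. It is not the service's algorithm: the detector below is a deliberately simple energy threshold standing in for the real (noise-resistant) method, and every name in it (`detect_voice_segments`, `threshold_ratio`, `to_srt`, `to_audacity_labels`, the `"speech"` caption) is a hypothetical choice made for illustration. It uses only the Python standard library, although the service itself is said to rely on NumPy and SciPy.

```python
import math

FRAME_MS = 10  # frame length; matches the 10 ms resolution mentioned in the text


def detect_voice_segments(samples, sample_rate, threshold_ratio=0.1):
    """Return (start_s, end_s) intervals whose frame energy exceeds a threshold.

    Illustrative energy-threshold detector only; NOT the service's algorithm.
    The threshold is an assumed fraction of the loudest frame's energy.
    """
    frame_len = sample_rate * FRAME_MS // 1000
    n_frames = len(samples) // frame_len if frame_len else 0
    energies = [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]
    if not energies:
        return []
    threshold = threshold_ratio * max(energies)
    segments, start = [], None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i  # first active frame of a new segment
        elif energy <= threshold and start is not None:
            segments.append((start * FRAME_MS / 1000, i * FRAME_MS / 1000))
            start = None
    if start is not None:  # recording ends while voice is still active
        segments.append((start * FRAME_MS / 1000, n_frames * FRAME_MS / 1000))
    return segments


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments, text="speech"):
    """Render segments as numbered SRT blocks separated by blank lines."""
    blocks = [
        f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        for i, (start, end) in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)


def to_audacity_labels(segments, text="speech"):
    """Render segments as Audacity label lines: start<TAB>end<TAB>label."""
    return "".join(f"{start:.6f}\t{end:.6f}\t{text}\n" for start, end in segments)


# Synthetic demo: 0.5 s of silence, 1 s of a 440 Hz tone, 0.5 s of silence at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
signal = [0.0] * (rate // 2) + tone + [0.0] * (rate // 2)
segments = detect_voice_segments(signal, rate)
print(to_srt(segments))
print(to_audacity_labels(segments))
```

For a real WAV recording, the samples and sample rate could be loaded with `scipy.io.wavfile.read` or the standard `wave` module before calling the detector. A fixed energy threshold like this one degrades quickly on noisy audio, which is the weakness that the preliminary noise removal described above is meant to address.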