The current pandemic has made videoconferencing an indispensable part of our working life.

To help people who speak different languages communicate effectively, a recent paper on arXiv.org proposes a videoconferencing solution with live translation captions.

Image credit: Mbrickn via Wikimedia (CC BY 4.0)

There, participants can see an overlaid translation of other participants’ speech in their preferred language. The incoming speech signal is processed in a streaming mode, transcribed in the speaker’s language, and used as input to a machine translation system. The researchers implement several features to improve the user experience, such as smooth pixel-wise scrolling of the captions and fading of text that is likely to change.
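The cascade described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: `toy_translate` and its lexicon are hypothetical stand-ins for a real machine translation service, and treating the last `unstable_tail` words as "likely to change" is an assumed heuristic chosen only to show the idea of fading unstable text.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical toy lexicon standing in for a real MT service.
TOY_LEXICON = {"hello": "hola", "world": "mundo", "friends": "amigos"}


def toy_translate(text: str) -> str:
    """Word-by-word stand-in for a machine translation call (assumption)."""
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def retranslate_stream(
    partial_transcripts: Iterable[str],
    translate: Callable[[str], str],
    unstable_tail: int = 1,
) -> Iterator[Tuple[List[str], List[str]]]:
    """Re-translate every partial transcript from scratch.

    Yields (stable_words, faded_words) for each caption update. The last
    `unstable_tail` words are marked as faded because later ASR updates
    may still revise them (an assumed heuristic, not the paper's rule).
    """
    for partial in partial_transcripts:
        words = translate(partial).split()
        cut = max(len(words) - unstable_tail, 0)
        yield words[:cut], words[cut:]


# Simulated streaming ASR output: the second hypothesis contains a
# misrecognized word ("word") that the third hypothesis corrects.
asr_partials = ["hello", "hello word", "hello world friends"]

for stable, faded in retranslate_stream(asr_partials, toy_translate):
    print("stable:", stable, "| faded:", faded)
```

Because each update translates the full transcript so far, an earlier caption word can be replaced when the recognizer revises its hypothesis. That replacement is what viewers perceive as caption flicker.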

A comprehensive evaluation suite is implemented to accurately compute metrics such as latency, caption flicker, and accuracy, and to encourage fast development guided by these metrics.
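As an illustration of how caption flicker might be quantified, the sketch below counts the words that have to be erased between consecutive caption updates and normalizes by the final caption length. This is one common formulation of an erasure-style metric. Whether the suite described here computes it in exactly this way is an assumption, and all function names are hypothetical.

```python
from typing import List, Sequence


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading words shared by two captions."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def erasure(prev: Sequence[str], cur: Sequence[str]) -> int:
    """Words of the previous caption that must be deleted to display `cur`."""
    return len(prev) - common_prefix_len(prev, cur)


def normalized_erasure(updates: List[List[str]]) -> float:
    """Total erasure over all updates divided by the final caption length."""
    if not updates or not updates[-1]:
        return 0.0
    total = sum(erasure(p, c) for p, c in zip(updates, updates[1:]))
    return total / len(updates[-1])


# Three successive caption states; the second ends in a word that the
# third update replaces, so exactly one word is erased.
caption_updates = [
    ["hola"],
    ["hola", "word"],
    ["hola", "mundo", "amigos"],
]
print(normalized_erasure(caption_updates))
```

A value of 0 means the caption only ever grew. Larger values mean more already-displayed text was rewritten, which is what a flicker-reducing feature would aim to lower.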

We present MeetDot, a videoconferencing system with live translation captions overlaid on screen. The system aims to facilitate conversation between people who speak different languages, thereby reducing communication barriers between multilingual participants. Currently, our system supports speech and captions in 4 languages and combines automatic speech recognition (ASR) and machine translation (MT) in a cascade. We use the re-translation strategy to translate the streamed speech, resulting in caption flicker. Additionally, our system has very strict latency requirements to have acceptable call quality. We implement several features to enhance user experience and reduce their cognitive load, such as smooth scrolling captions and reducing caption flicker. The modular architecture allows us to integrate different ASR and MT services in our backend. Our system provides an integrated evaluation suite to optimize key intrinsic evaluation metrics such as accuracy, latency and erasure. Finally, we present an innovative cross-lingual word-guessing game as an extrinsic evaluation metric to measure end-to-end system performance. We plan to make our system open-source for research purposes.

Research paper: Arkhangorodsky, A., “MeetDot: Videoconferencing with Live Translation Captions”, 2021. Link: https://arxiv.org/abs/2109.09577