Skip to content

Summary

Deep learning methods are machine learning methods using multiple processing layers or levels of abstraction. Deep learning algorithms are usually further characterized by having a simple and versatile structure. Specifically, deep learning is usually based on feed forward or recursive multilayer neural networks to learn a particular model. A successful application of deep learning technologies consists in selecting a good architecture for the neural network as well as an effective training procedure to learn the parameters of the network. In recent years, modeling using neural networks has emerged again very strongly thanks to the results on effective learning algorithms for deep and recursive neural networks. Other important factors of this renaissance are the availability of higher computing power and large databases. Large databases are necessary to train multilayer structures with a large number of parameters and computational resources make this process possible in a reasonable time.

 

Although its widespread use started a few years ago, and despite the difficulty of analyzing the behavior of deep learning algorithms, the impact of deep learning is already very important in areas as image, speech and text processing in both research and commercial applications. In speech recognition, for example, we have now systems based on a simple generic deep learning architecture that outperform traditional speech recognition systems based on a complex architecture with many speech-specific processing modules. This project proposes the development of new deep learning methods for speech and audio processing, exploring new applications and continuing the initial work of the research team and the international community.

 

The project includes a comprehensive work package dedicated to deep learning and four other work packages dedicated to speech and speaker recognition, acoustic event detection, voice synthesis and voice translation. In the first work package we will derive new architectures and learning algorithms, taking into account the computational cost and the scalability to large databases, while the next work packages will explore their application in speech and audio processing.