Speech recognition and graph transformer nets
This lecture provides an introduction to the problem of speech recognition using neural models, emphasizing the CTC loss for training and inference when input and output sequences are of different lengths. It also covers the concept of beam search for use during inference, and how that procedure may be modeled at training time using a Graph Transformer Network. It is a part of the Deep Learning Course at CDS, a course that covered the latest techniques in deep learning and representation learning, focusing on supervised and unsupervised deep learning, embedding methods, metric learning, convolutional and recurrent nets, with applications to computer vision, natural language understanding, and speech recognition. Prerequisites for this module include: Modules 1 - 5 of this course and Introduction to Data Science or a Graduate Level Machine Learning.