School of Engineering, Computing and Mathematics
Faculty of Technology, Design and Environment
Artificial intelligence has dramatically changed the world as we know it, but has yet to fully embrace ‘hot’ cognition, i.e., the way an intelligent being's thinking is affected by their emotional state. Artificial intelligence encompassing hot cognition will not only usher in enhanced machine-human interactions, but will also promote a much-needed ethical approach. Theory of Mind, the ability of the human mind to attribute mental states to others, is a key component of hot cognition. To endow machines with (limited) Theory of Mind capabilities, computer scientists will need to work closely with psychiatrists, psychologists and neuroscientists. They will need to develop new models, but also to formally define what problems need to be solved and how the results should be assessed.
The digitization of historical handwritten document images is important for the preservation of cultural heritage. Moreover, transcribing the text images obtained from digitization is necessary to provide efficient access to the content of these documents. Handwritten Text Recognition (HTR), which allows us to obtain transcriptions from text images, has become an important research topic in image and computational language processing. State-of-the-art HTR systems are, however, far from perfect. One difficulty is that they have to cope with image noise and handwriting variability. Another is the large number of Out-Of-Vocabulary (OOV) words in ancient historical texts. One solution is to use external lexical resources, but such resources may be scarce or unavailable given the nature and age of these documents. This work proposes a way around this limitation: coupling a powerful optical recognition system, which copes with image noise and variability, with a language model based on sub-lexical units, which models OOV words. Such a language modeling approach reduces the size of the lexicon while increasing its coverage. Experiments are first conducted on the publicly available Rodrigo dataset, which contains the digitization of an ancient Spanish manuscript, with a recognizer based on Hidden Markov Models (HMMs). They show that sub-lexical units outperform word units in terms of Word Error Rate (WER), Character Error Rate (CER) and OOV word accuracy rate. The approach is then applied to deep neural network classifiers, namely Bi-directional Long Short-Term Memory networks (BLSTMs) and Convolutional Recurrent Neural Networks (CRNNs). Results show that CRNNs outperform HMMs and BLSTMs, reaching the lowest WER and CER on this dataset and significantly improving OOV recognition.
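To illustrate the sub-lexical idea (not the paper's actual units, which would be derived from the corpus itself), the sketch below builds a character n-gram inventory from a tiny made-up training lexicon and greedily segments any word, including an OOV one, into known units. The words and helper names here are invented for the example; the point is that a compact unit inventory still covers unseen words.

```python
def build_unit_inventory(lexicon, max_len=3):
    """Collect all character n-grams up to max_len from a training lexicon."""
    units = set()
    for word in lexicon:
        for i in range(len(word)):
            for j in range(i + 1, min(i + max_len, len(word)) + 1):
                units.add(word[i:j])
    return units

def segment(word, units, max_len=3):
    """Greedy longest-match segmentation into sub-lexical units.
    Single characters are always permitted, so every word, even an
    Out-Of-Vocabulary one, can be covered by the unit inventory."""
    out, i = [], 0
    while i < len(word):
        for j in range(min(len(word), i + max_len), i, -1):
            if word[i:j] in units or j == i + 1:
                out.append(word[i:j])
                i = j
                break
    return out

# Hypothetical training lexicon from an old Spanish text
lexicon = {"reyes", "reino", "senor"}
units = build_unit_inventory(lexicon)
print(segment("reinos", units))  # "reinos" is OOV, yet fully segmentable
```

The inventory grows with the number of distinct n-grams rather than the number of distinct words, which is why sub-lexical modeling shrinks the lexicon while extending its coverage to OOV words.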
This paper reports a new approach based on convolutional neural networks (CNNs) that uses spatial transformer networks (STNs). The approach, referred to as Tied Spatial Transformer Networks (TSTNs), trains a system combining a localization CNN and a classification CNN whose weights are shared. The localization CNN predicts an affine transform for the input image; the image is then warped according to the predicted parameters and passed through the classification CNN. We conducted initial experiments on the cluttered MNIST dataset of noisy digits, comparing the TSTN with an STN of identical trainable-parameter configuration but untied weights, as well as with the classification CNN alone applied to the unprocessed images. In all these cases, we obtain better results with the TSTN. We conjecture that the TSTN provides a regularization effect compared to untied STNs. Further experiments seem to support this hypothesis.
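A minimal PyTorch sketch of the tying idea follows. It is not the paper's architecture: the layer sizes, the 40×40 input size and all names are assumptions for illustration. The key point is that one convolutional trunk (`self.features`) serves both the localization pass and the classification pass, so the two branches share their weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedSTN(nn.Module):
    """Sketch of a tied spatial transformer: the localization and
    classification branches reuse the SAME convolutional trunk."""
    def __init__(self, num_classes=10):
        super().__init__()
        # shared (tied) convolutional feature extractor
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.loc_head = nn.Linear(32 * 10 * 10, 6)          # affine params
        self.cls_head = nn.Linear(32 * 10 * 10, num_classes)
        # start from the identity transform, a common STN initialization
        self.loc_head.weight.data.zero_()
        self.loc_head.bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x):  # x: (N, 1, 40, 40), cluttered-MNIST-like input
        theta = self.loc_head(self.features(x).flatten(1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        x = F.grid_sample(x, grid, align_corners=False)      # warp the input
        # classification pass through the same tied trunk
        return self.cls_head(self.features(x).flatten(1))
```

An untied STN would instead instantiate two separate trunks with the same shape; tying halves the number of trunk parameters, which is consistent with the conjectured regularization effect.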
Deep architectures based on convolutional neural networks have obtained state-of-the-art results on several recognition tasks. These architectures rely on a cascade of convolutional layers and activation functions. Beyond setting the number of layers and the number of neurons in each layer, the choice of activation function, training optimization algorithm and regularization procedure is of great importance. In this work we start from a deep convolutional architecture and describe the effect of recent activation functions, optimization algorithms and regularization procedures applied to the recognition of handwritten digits from the MNIST dataset. The network achieves a 0.38% error rate, matching and slightly improving on the best known performance of a single model trained without data augmentation at the time the experiments were performed.
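The kind of design space the abstract describes can be sketched as follows. This is not the paper's network: the layer widths, dropout rate, optimizer settings and the `make_cnn` helper are illustrative assumptions, showing how activation, regularization and optimizer are swapped as independent knobs around a fixed convolutional skeleton.

```python
import torch
import torch.nn as nn

def make_cnn(activation=nn.ReLU, p_drop=0.5):
    """Toy MNIST CNN whose activation and dropout rate are configurable,
    mirroring the kind of comparisons the experiments describe."""
    return nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), activation(),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), activation(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Dropout(p_drop),                 # regularization knob
        nn.Linear(64 * 7 * 7, 10),
    )

# e.g. a recent activation (ELU) with an adaptive optimizer and weight decay
model = make_cnn(activation=nn.ELU, p_drop=0.4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```

Holding the skeleton fixed while varying these knobs is what makes the resulting error-rate comparisons attributable to the activation, optimizer or regularizer rather than to capacity changes.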