The group's research projects span computer vision applications such as action and gesture recognition and image segmentation (in close collaboration with the Computer Vision group), identity recognition from gait, and pose estimation; machine learning, with a focus on efficient nearest neighbour classifiers; robotics and autonomous navigation (within the framework of the Intelligent Transport Systems doctoral training programme); the modelling of chaos in dynamical systems; and uncertainty theory and imprecise probabilities, with a focus on the theory of belief functions. Project themes include belief functions and imprecise probabilities, computer vision, machine learning, and neuroinformatics.
Multi-sensor fusion for simultaneous localization and mapping on autonomous vehicles
Many different sensors are nowadays available on autonomous vehicles, yet the full potential of techniques that integrate the information coming from these different sensors to improve a vehicle's ability to avoid accidents and, more generally, to raise its safety levels remains untapped.
Navigation techniques in static environments based on the fusion of multiple sensors are well established, but it is not clear how such methods can cope with more realistic, dynamic environments that may include people, other moving vehicles, or changing environmental features. This problem goes under the name of Simultaneous Localization And Mapping with Moving Objects Tracking (SLAMMOT).
Although several approaches have been proposed in the literature, so far none appears able to exploit the availability of multiple heterogeneous sensors. The aim of this project is therefore to investigate solutions that combine the (potentially conflicting) information coming from an array of heterogeneous sensors in order to accurately localize the vehicle and estimate the (evolving) configuration of the surrounding dynamic environment.
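As a minimal illustration of the fusion idea (not the project's actual SLAMMOT algorithm), the simplest building block behind Kalman-filter-style multi-sensor fusion combines two independent Gaussian estimates of the same quantity by inverse-variance weighting; the sensor names and numbers below are hypothetical.

```python
# Illustrative sketch: fuse two noisy estimates of the same landmark position
# by inverse-variance weighting. More certain sensors receive larger weights,
# and the fused variance never exceeds that of either input.

def fuse(estimate_a, var_a, estimate_b, var_b):
    """Combine two independent Gaussian estimates of the same quantity."""
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    fused = (w_a * estimate_a + w_b * estimate_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)
    return fused, fused_var

# e.g. a lidar reading of 10.2 m (variance 0.04) and a radar reading of
# 9.8 m (variance 0.16) for the same obstacle
pos, var = fuse(10.2, 0.04, 9.8, 0.16)
```

The fused estimate lands closer to the more reliable (lower-variance) sensor, which is the basic reason an array of heterogeneous sensors can outperform any single one.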
Proposal support documentation PDF
Research team: Dr Fabio Cuzzolin, Reader; Michael Sapienza, Research Associate
Start date: August 2011
Tensor modeling for identity recognition from gait and activity recognition
Biometrics such as face, iris, or fingerprint recognition for surveillance and security have received growing attention in the last decade. They suffer, however, from two major limitations: they cannot be used at a distance, and they require user cooperation. For these reasons, originally driven by an initiative of the US DARPA, identity recognition from gait has been proposed as a novel behavioural biometric, based on people's distinctive gait patterns.
Despite its attractive features, though, gait identification is still far from ready to be deployed in practice, as in real-world scenarios recognition is made extremely difficult by the presence of nuisance factors such as viewpoint, illumination, and clothing. Similar issues are shared by other applications such as action and activity recognition.
This project concerns the problem of classifying video sequences by attributing to each sequence a label, such as the type of event recorded or the identity of the person performing a certain action. It proposes a novel framework for motion recognition capable of dealing in a principled way with the issue of nuisance factors in both gait and activity recognition. The goal is to push towards a more widespread adoption of gait identification, as a concrete contribution to enhancing security levels in the current, uncertain scenarios. As the techniques devised in this proposal extend to action and identity recognition, however, their commercial exploitation potential in, for instance, video indexing or interactive video games is also enormous.
Alternatively, simple linear dynamical models such as autoregressive or ARMA models can be employed to describe the dynamics of the walking gait, drawing inspiration from the impressive results obtained when representing dynamic textures in this way. Regardless of the class of model used to represent each individual sequence, a tensorial model can be built from a training set of walking gaits and used to later classify new videos. The overall framework is illustrated below.
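A minimal sketch of the autoregressive idea (not the project's tensorial pipeline, and using a hypothetical one-dimensional gait feature): fit a first-order model x_t ≈ a·x_{t-1} to each sequence by least squares, then classify a new sequence by the nearest learned coefficient.

```python
# Toy AR(1) gait model: the learned coefficient summarises the dynamics of a
# 1-D feature trace, and sequences are compared via their coefficients.

def ar1_coefficient(series):
    """Least-squares estimate of a in x_t = a * x_{t-1}."""
    num = sum(x_prev * x_next for x_prev, x_next in zip(series, series[1:]))
    den = sum(x_prev * x_prev for x_prev in series[:-1])
    return num / den

def classify(trace, templates):
    """templates: {label: AR coefficient learned from a training sequence}."""
    a = ar1_coefficient(trace)
    return min(templates, key=lambda label: abs(templates[label] - a))

walker_a = [1.0, 0.9, 0.81, 0.729]   # feature decays by 0.9 each frame
walker_b = [1.0, 0.5, 0.25, 0.125]   # feature decays by 0.5 each frame
templates = {"A": ar1_coefficient(walker_a), "B": ar1_coefficient(walker_b)}
query = [2.0, 1.8, 1.62]             # same dynamics as walker A, different scale
```

Because the coefficient captures the dynamics rather than the raw values, the query matches walker A despite its different amplitude; real gait models would of course be higher-order and multivariate.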
Research team: Dr Fabio Cuzzolin, Reader; Dr Wenjuan Gong, Research Associate
EPSRC First Grant: £122,000. Start date: August 2011
Action and activity recognition lies at the core of a panoply of scenarios in human-machine interaction, ranging from gaming, mobile computing and video retrieval to health monitoring, surveillance, robotics and biometrics. The problem, however, is made really challenging by the inherent variability of motions carrying the same meaning, the unavoidable over-fitting due to limited training sets, and the presence of numerous nuisance factors such as locality, viewpoint, illumination, and occlusions, which make real-world deployment a still-distant prospect.
The most successful recent approaches, which mainly classify bags of local features, have reached their limits: only understanding the spatial and temporal structure of human activities can help us to successfully locate and recognize them in a robust and reliable way. We propose here to develop novel frameworks for the integration of action structure in both generative and discriminative models, pushing for a breakthrough in activity recognition that would have enormous exploitation potential.
Part-based discriminative models for action recognition
Current state-of-the-art action classification methods extract feature representations from the entire video clip in which the action unfolds; however, this representation may include irrelevant scene context and movements shared among multiple action classes. For example, a waving action may be performed whilst walking, but if the walking movement and scene context appear in other action classes, then they should not be included in a waving-movement classifier.
In this work, we propose an action classification framework in which more discriminative action subvolumes are learned in a weakly supervised setting, owing to the difficulty of manually labelling massive video datasets. The learned models are used both to classify video clips and to localise actions to a given space-time subvolume.
Each subvolume is cast as a bag-of-features (BoF) instance in a multiple-instance-learning framework, which in turn is used to learn its class membership. We demonstrate quantitatively that, even with single fixed-size subvolumes, the classification performance of our proposed algorithm is superior to the state-of-the-art BoF baseline on the majority of performance measures, and shows promise for space-time action localisation on the most challenging video datasets.
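The multiple-instance view can be sketched as follows (toy numbers, not the paper's features or learned weights): each video is a *bag* of subvolume BoF histograms, and a bag is positive if at least one subvolume matches the action, so the bag score is the maximum over instance scores under a linear model.

```python
# Sketch of multiple-instance classification over bag-of-features histograms.
# The 3-bin "waving" detector weights below are hypothetical.

def instance_score(histogram, weights, bias):
    """Linear score of a single subvolume's BoF histogram."""
    return sum(h * w for h, w in zip(histogram, weights)) + bias

def bag_score(bag, weights, bias):
    """A bag is positive if its best instance is positive."""
    return max(instance_score(h, weights, bias) for h in bag)

weights, bias = [2.0, -1.0, -1.0], 0.0   # bin 0 = waving-like features
waving_video  = [[0.1, 0.5, 0.4],        # walking-context subvolume
                 [0.8, 0.1, 0.1]]        # the discriminative waving subvolume
walking_video = [[0.1, 0.5, 0.4],
                 [0.2, 0.4, 0.4]]
```

The waving video scores positive thanks to its one discriminative subvolume, while the shared walking context alone does not trigger the detector; the arg-max instance also gives a crude space-time localisation of the action.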
Research team: Dr Fabio Cuzzolin, Reader; Prof Phil Torr, Co-Investigator
Joint EPSRC grant. Start date: October 2012
Action and activity recognition are intuitive but extremely difficult tasks, which lie at the root of a panoply of scenarios of human-machine interaction, ranging from gaming, mobile computing and video retrieval to health monitoring, surveillance, robotics and biometrics.
The problem is made challenging by the inherent variability of motions carrying the same meaning, which in turn causes over-fitting due to the necessarily limited size of any available training set, but also by the presence of numerous nuisance factors such as locality, viewpoint, illumination, occlusions, and many more. While recent bag-of-features-like approaches have focused mainly on classifying histograms of features extracted from spatio-temporal volumes (completely ignoring, in this way, the causal structure of the motions to be recognized), dynamics can provide invaluable information for successfully locating and recognizing actions and complex activities in a robust and reliable way. Traditional generative dynamical models, however, have proved unable to cope effectively enough with some of these difficulties.
We propose here to design and test novel frameworks for the integration of action dynamics in both generative and discriminative models, with the aim of stimulating a breakthrough in activity recognition, with enormous exploitation potential in the manifold scenarios indicated above. Novel classes of generative graphical models, based on the principle of allowing the probabilities defining the model to vary within whole convex sets rather than assuming sharp, precise values, are formulated to address the issue of overfitting due to the limited size of training sets.
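The convex-set idea can be illustrated with a deliberately tiny, hypothetical example: instead of committing to one precise distribution over the model's states, let the probabilities range over a convex ("credal") set, here represented by its extreme points, and report lower and upper expectations rather than a single number.

```python
# Hedged illustration of imprecise (credal) probabilities: expectations are
# computed at each extreme point of the convex set, and the interval
# [lower, upper] replaces the single precise expected value.

def expectation(p, values):
    return sum(pi * v for pi, v in zip(p, values))

# hypothetical credal set over two states, given by its two extreme points
extreme_points = [[0.6, 0.4], [0.8, 0.2]]
values = [10.0, -5.0]            # payoff attached to each state

lower = min(expectation(p, values) for p in extreme_points)
upper = max(expectation(p, values) for p in extreme_points)
```

With scarce training data the credal set stays wide and the interval honestly reflects that uncertainty, which is exactly the overfitting-resistance the precise-probability models lack.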
New manifold learning techniques are applied to generative graphical models in order to tackle the presence of numerous nuisance factors. Finally, a new class of discriminative, part-based models originally developed for object recognition is generalized to action localization and recognition, as a fundamental way of coping with complex activities formed by series of simple actions and of addressing the issues of locality and the presence of multiple actors.
Research team: Dr Fabio Cuzzolin, Reader
Partners: Universiteit Gent, Belgium (G. de Cooman); IDSIA, Switzerland (A. Antonucci); SUPELEC, France; Dynamixyz, France
European Union Framework Programme 7, Call 9, STREP
Start date: August 2012
Millions of videos are captured by people on their cell phones and posted on Facebook or YouTube. Searching for a video over the internet, however, is still a frustrating experience: current text-based search engines produce decent results only for the simplest queries, while failing disastrously on complex ones.
What we need is to robustly analyse what happens in the video, who is involved and where, and look for videos with a similar semantic content. This involves the automated production of a “story line” for each video, and the management of queries and plots in a flexible way, to handle the many sources of uncertainty that make the problem so challenging.
Research grant application PDF
Research staff: Dr Fabio Cuzzolin, Reader
Providing sensible predictions in many scenarios such as climate change, rare events contingency planning, or disaster risk analysis is difficult, since the available data, normally in the form of a time series, is either scarce, incomplete, or even missing.
In such cases, "cautious" approaches to uncertainty modelling have an edge over classical methods, as they are designed to produce robust (albeit imprecise) predictions when data are lacking or partial. However, "imprecise" estimation and decision making require mathematical tools which, owing to the more complex mathematical behaviour of the objects involved, have so far been only partially developed.
In particular, the generalization of the classical law of total probability has a crucial role in the formulation of such a complete framework. We propose here to bring to full development the theory of total probability for random sets or "belief functions", arguably one of the most powerful imprecise-probabilistic theories, in order to make such methodologies viable tools for practitioners in all fields of science, with potentially vast repercussions in all the outlined partial-data scenarios.
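To make the belief-function machinery concrete, here is a minimal implementation of Dempster's rule of combination for two mass functions on a finite frame (subsets encoded as frozensets); it illustrates the underlying calculus, not the project's total-belief construction, and the weather frame and mass values are hypothetical.

```python
# Dempster's rule: multiply masses of all pairs of focal sets, assign each
# product to the intersection, discard conflict (empty intersections), and
# renormalize so the combined masses sum to 1.

def dempster(m1, m2):
    combined, conflict = {}, 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

theta = frozenset({"rain", "sun"})               # the whole frame
m1 = {frozenset({"rain"}): 0.6, theta: 0.4}      # one (hypothetical) source
m2 = {frozenset({"rain"}): 0.5, theta: 0.5}      # a second, independent one
m = dempster(m1, m2)
```

Mass on the whole frame encodes ignorance rather than a precise probability, which is what lets belief functions express "cautious" predictions when evidence is partial.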
In machine learning and computer vision, more specifically, learning and estimation are normally based on manually collected and labelled training sets that are typically very small compared to the true extent of the problem. We propose to apply the total belief framework to the example-based pose estimation problem, an extremely active vision topic with growing commercial and societal repercussions.
Research team: Dr Fabio Cuzzolin, Reader
EPSRC Project Grant
This project aims to advance an emerging paradigm shift in the field of machine learning that goes under the name of "manifold learning".
In many domains it has long been observed that data typically reside on a much lower-dimensional manifold. Researchers, though, are becoming increasingly aware of the limitations of traditional "adaptive" spectral embeddings, based on computing an affinity matrix from a graph constructed from a training set of unlabelled data points. Most real-world problems involve a huge number of such data points, which typically live in extremely high-dimensional spaces. Under such conditions spectral embeddings are simply not computationally feasible.
A radically new, counterintuitive way of understanding the structure of the data in our possession consists in encoding it in a random way, against common sense and the traditional way of thinking about the problem. Such linear, sparsity-oriented, metric-preserving "non-adaptive" random projections to a much lower-dimensional "measurement" space have the potential to tackle large-scale manifold learning, greatly widening its scope and making pervasive computing scenarios possible.
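A sketch of what "non-adaptive" means in practice (dimensions and seed chosen only for illustration): every point is multiplied by one fixed random Gaussian matrix, drawn without ever looking at the data; by the Johnson-Lindenstrauss lemma, pairwise distances are approximately preserved with high probability.

```python
# Random projection to a low-dimensional "measurement" space. The projection
# is linear and data-independent, unlike adaptive spectral embeddings.

import random

def random_projection_matrix(d_high, d_low, seed=0):
    rng = random.Random(seed)
    scale = 1.0 / d_low ** 0.5           # keeps expected squared norms stable
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_high)]
            for _ in range(d_low)]

def project(matrix, x):
    return [sum(r * xi for r, xi in zip(row, x)) for row in matrix]

P = random_projection_matrix(d_high=1000, d_low=20)
x = [1.0] * 1000
y = project(P, x)                        # 20 measurements instead of 1000
```

Because no eigendecomposition of an n-by-n affinity matrix is needed, the cost scales with the number of points rather than with their pairwise structure, which is what makes the approach attractive at large scale.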
However, adaptive embeddings possess highly desirable properties that a large-scale manifold learning framework should retain. The admittedly ambitious goal of this project is to exploit the rising tide of compressed sensing to design a novel paradigm for large-scale manifold learning which integrates adaptive and non-adaptive methods in a coherent scheme, in which new classification and clustering techniques are developed in the measurement space and deployed to solve problems simply not accessible before.
A new paradigm for large scale manifold learning short proposal PDF
Research team: Dr Fabio Cuzzolin, Reader
Project partners: Universitat Pompeu Fabra (A. Frangi); INRIA Rhone-Alpes (R. Horaud); Technion (M. Bronstein, R. Kimmel)
European Union - Framework Programme 7, FET Open - Future and Emerging Technologies
NUTS: A Network on Uncertainty TheorieS
Decision making and estimation are central problems in most applied sciences, as people or machines need to make inferences about the state of the external world, and take consequent actions.
Traditionally, the state of the world is described by a probability distribution over a set of alternative hypotheses. In many cases, however, such as extremely rare events (e.g., a volcanic eruption), few statistics are available to drive the estimation, and part of the observed data is often missing. Besides, by the law of large numbers, probability distributions are the outcome of an infinite process of evidence accumulation, while in all practical cases the available evidence only provides some constraints on the "true" probability governing the process.
An extensive battery of uncertainty theories has therefore been developed over the last half century or so, starting from De Finetti's pioneering work on subjective probability, and including the theory of evidence, possibility theory, the theory of random sets, info-gap theory, and others. Also referred to as "imprecise probabilities" (as most of them comprise classical probabilities as a special case), they form a hierarchy of nested formalisms with a potentially enormous methodological impact.
Their application is growing in all fields of applied science: engineering, artificial intelligence, forensic science, semantic web, sensor fusion, target tracking, computer vision and image processing, climate change, risk analysis, and many others. The goal of the present proposal is to bring together all the UK researchers active in this field, in order to reach the critical mass necessary to provide further impulse to its development at both national and international level, and promote its adoption among practitioners.
NUTS: A Network on Uncertainty TheorieS PDF
Research team: Dr Fabio Cuzzolin, Reader