Suman Saha

  • Suman Saha joined the Department of Computing and Communication Technologies as a research student in September 2014. The title of his PhD is ‘Online human action detection and instance segmentation in temporally untrimmed videos’.

    I started my research degree at Oxford Brookes University in September 2014 following an MSc in Computer Science at the University of Bedfordshire. The Artificial Intelligence and Vision research group at Brookes has a world-class reputation, and the Computer Vision and Robotics team has a close relationship with the University of Oxford. These factors, along with the full-time funded University Research Scholarship, motivated me to undertake a research degree at Brookes.

    The machine learning and computer vision group at Brookes has cutting-edge computing resources that enable students to undertake state-of-the-art (SOA) research and compete with other leading research institutions and organisations. These include a high-performance cluster server and workstations equipped with extremely fast graphics processing units, which are highly optimised for parallel processing. The Robotics and Cognitive research group provides SOA robotic platforms such as RoboThespian, NAO humanoid robots, Baxter and TurtleBot.

    Emerging real-world applications require an all-round approach to the machine understanding of human behaviour, one which goes beyond the recognition of simple, isolated actions, traditionally performed on whole videos. Imagine an autonomous flying surveillance UAV able to report that a person is “leaving his bag unattended” while somebody else is “trying to unlock a door but is failing to do so”; a robotic surgical assistant which understands that the surgeon is preparing to dissect a tissue strand, and reconfigures its arms to help him distend the target surface while adjusting camera focus; a smart car spotting children walking along the sidewalk near a zebra crossing, and pre-emptively adjusting its speed to cope with the possibility that they suddenly decide to cross the road. The potential of such “aware agents” to improve people’s quality of life, security levels and business prospects is, quite frankly, enormous.

    What smart cars and robotic surgeons need to achieve is a comprehensive awareness of what takes place in a complex environment, such as a street crowded with other vehicles and pedestrians. Such understanding needs to mature incrementally and has to be put to good use in real time. Multiple simultaneous activities need to be first located and then recognised. Unfortunately, human activities are inherently difficult to capture and sometimes very hard to categorise.

    We are developing a framework which, given an incoming video stream, is able to incrementally learn and instantaneously detect, localise (in space and time) and recognise any number of complex human activities present.

    The most exciting thing for me as a research student is the freedom to undertake independent research and to focus on developing a significant and original piece of work in my area of expertise. After completing my PhD, I hope to work as a Postdoctoral Research Fellow and continue my research in AI and Computer Vision.