'Researcher turns his baby into CCTV star in the name of science' diff viewer (1/2)

This article is from the source 'bbc' and was first published or seen on July 02, 2009 02:37 (UTC). It will not be checked again for changes.

You can find the current article at its original source at http://news.bbc.co.uk/go/rss/-/1/hi/sci/tech/8127804.stm

The article has changed 4 times. There is an RSS feed of changes available.

Previous version 1 2 3 Next version

Version 1	Version 2
Big brother untangles baby babble	Big brother untangles baby babble
2009-07-02 08:02:49 UTC	2009-07-03 16:33:51 UTC (1 day later)
~~By Jonathan Fildes Science and technology reporter, BBC News ~~Advertisement~~~~	By Jonathan Fildes Science and technology reporter, BBC News
Over time Professor Roy's son learns how to say the word 'ball' (footage: MIT Media Lab)	Over time Professor Roy's son learns how to say the word 'ball' (footage: MIT Media Lab)
"Can you think of a more complicated question to ask?" says Deb Roy, as he explains the genesis of his work.	"Can you think of a more complicated question to ask?" says Deb Roy, as he explains the genesis of his work.
In 2005, the artificial intelligence researcher at the Massachusetts Institute of Technology (MIT) Media Lab set out to understand how children learn to talk.	In 2005, the artificial intelligence researcher at the Massachusetts Institute of Technology (MIT) Media Lab set out to understand how children learn to talk.
"We wanted to understand how minds work and how they develop and how the interplay of innate and environmental influence makes us who we are and how we learn to communicate."	"We wanted to understand how minds work and how they develop and how the interplay of innate and environmental influence makes us who we are and how we learn to communicate."
It was a big task and after years of research, scientists around the world had only begun to scratch the surface of it.	It was a big task and after years of research, scientists around the world had only begun to scratch the surface of it.
But now, Professor Roy is beginning to get some answers, thanks to an unconventional approach, an accommodating family and a house wired with technology.	But now, Professor Roy is beginning to get some answers, thanks to an unconventional approach, an accommodating family and a house wired with technology.
And the research may even have spin-offs for everything from robotics to video analysis.	And the research may even have spin-offs for everything from robotics to video analysis.
Snap shots	Snap shots
The question of how infants learn to speak is hotly debated. At its simplest level the argument comes down to "nature versus nurture".	The question of how infants learn to speak is hotly debated. At its simplest level the argument comes down to "nature versus nurture".
On one side, scientists argue that children have an innate hard-wired ability to learn language, while on the other side, researchers argue that language is learned through interactions with the people and environment around them.	On one side, scientists argue that children have an innate hard-wired ability to learn language, while on the other side, researchers argue that language is learned through interactions with the people and environment around them.
Between the two extremes is a spectrum of opinion.	Between the two extremes is a spectrum of opinion.
Professor Roy wandered into this debate as someone originally more interested in robots than children.	Professor Roy wandered into this debate as someone originally more interested in robots than children.
"I was initially inspired by how children learn language as a new way of building machines," he says.	"I was initially inspired by how children learn language as a new way of building machines," he says.
But looking through the raft of prior research on the effect of environment on language, he noticed a common problem: previous studies only offered snapshots of a child's development.	But looking through the raft of prior research on the effect of environment on language, he noticed a common problem: previous studies only offered snapshots of a child's development.
"Every parent knows that a child can change a lot in a week or a month," he told BBC News.	"Every parent knows that a child can change a lot in a week or a month," he told BBC News.
"If you're interested in the process of development then it is important to have a continuous view."	"If you're interested in the process of development then it is important to have a continuous view."
It is a problem recognised by other linguists as well.	It is a problem recognised by other linguists as well.
"Current samples that the field works with - typically an hour of recorded speech a week - are one to two orders of magnitude too small for our scientific purposes," Professor Steven Pinker of Harvard University told BBC News.	"Current samples that the field works with - typically an hour of recorded speech a week - are one to two orders of magnitude too small for our scientific purposes," Professor Steven Pinker of Harvard University told BBC News.
So, Professor Roy, who by then had a child on the way, set about solving the conundrum. His solution: wire up his house with 11 cameras, 14 microphones and terabytes of storage and record every waking moment of his soon-to-arrive son.	So, Professor Roy, who by then had a child on the way, set about solving the conundrum. His solution: wire up his house with 11 cameras, 14 microphones and terabytes of storage and record every waking moment of his soon-to-arrive son.
It was christened the Human Speechome project and immediately drew comparisons with its genetic counterpart.	It was christened the Human Speechome project and immediately drew comparisons with its genetic counterpart.
"Just as the Human Genome Project illuminates the innate genetic code that shapes us, the Speechome Project is an important first step toward creating a map of how the environment shapes human development and learning," said Frank Moss, the director of MIT's Media Lab at the time.	"Just as the Human Genome Project illuminates the innate genetic code that shapes us, the Speechome Project is an important first step toward creating a map of how the environment shapes human development and learning," said Frank Moss, the director of MIT's Media Lab at the time.
Professor Pinker, who is also an adviser to the project, said: "In developmental psychology there has long been a trade-off between gathering lots of data from a small number of children, or a small amount of data from a much larger number of children.	Professor Pinker, who is also an adviser to the project, said: "In developmental psychology there has long been a trade-off between gathering lots of data from a small number of children, or a small amount of data from a much larger number of children.
"Roy is simply pushing this trade-off to an extreme - a truly massive amount of data from a single child."	"Roy is simply pushing this trade-off to an extreme - a truly massive amount of data from a single child."
Now, a quarter of a million hours of recordings later, Professor Roy is beginning to tease apart the masses of data and look for answers.	Now, a quarter of a million hours of recordings later, Professor Roy is beginning to tease apart the masses of data and look for answers.
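The headline figures quoted in this article can be cross-checked with a little arithmetic. The sketch below uses only numbers given in the article (90,000 hours of video, 140,000 hours of audio, roughly 200GB a day, 150TB in total); the variable names are illustrative, and treating 1TB as 1,000GB is an assumption.

```python
# Figures quoted in the article. Decimal units (1 TB = 1000 GB) are an assumption.
VIDEO_HOURS = 90_000
AUDIO_HOURS = 140_000
DAILY_GB = 200
TOTAL_TB = 150

# Combined recording time across cameras and microphones.
total_hours = VIDEO_HOURS + AUDIO_HOURS

# Number of recording days implied by the total volume at the quoted daily rate.
implied_days = TOTAL_TB * 1000 / DAILY_GB

print(total_hours)   # 230000
print(implied_days)  # 750.0
```

The combined 230,000 hours is what the article rounds to "a quarter of a million". At the quoted daily rate, 150TB corresponds to about 750 days of recording, so the figures are probably best read as approximate.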
Deep dive	Deep dive
To extract meaningful patterns from the 200GB (gigabytes) of data that flowed daily onto the racks of hard drives in the basement, the team created a series of software tools.	To extract meaningful patterns from the 200GB (gigabytes) of data that flowed daily onto the racks of hard drives in the basement, the team created a series of software tools.
The first, ominously called Total Recall, allows a researcher to quickly scan through any part of the data. All 25 recordings from the microphones and cameras are shown as separate channels.	The first, ominously called Total Recall, allows a researcher to quickly scan through any part of the data. All 25 recordings from the microphones and cameras are shown as separate channels.
HUMAN SPEECHOME PROJECT: 11x 1 megapixel fisheye lens cameras, switched on by motion sensors; 14x omnidirectional microphones recording CD quality sound; 1000m (3000ft) of wires connect recorders to servers in basement; record from 8am to 10pm every day for 3 years; PDAs in each room can be used to control recording; 'Oops' button wipes last few minutes of recording	HUMAN SPEECHOME PROJECT: 11x 1 megapixel fisheye lens cameras, switched on by motion sensors; 14x omnidirectional microphones recording CD quality sound; 1000m (3000ft) of wires connect recorders to servers in basement; record from 8am to 10pm every day for 3 years; PDAs in each room can be used to control recording; 'Oops' button wipes last few minutes of recording
Sound is represented as a spectrograph, while the video is processed to show only movement, creating a ribbon of colour, which looks like the flow of traffic at night and represents the accumulated motions of life in the Roy household.	Sound is represented as a spectrograph, while the video is processed to show only movement, creating a ribbon of colour, which looks like the flow of traffic at night and represents the accumulated motions of life in the Roy household.
While useful for getting a sense of when and where action may have taken place, the team needed another set of tools to delve deeper into the data.	While useful for getting a sense of when and where action may have taken place, the team needed another set of tools to delve deeper into the data.
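The article does not say how the video is reduced to "only movement". One common technique is frame differencing, sketched below on tiny greyscale frames held as nested lists. The function names, the threshold and the sample frames are all hypothetical; this is an illustration of the general idea, not the Total Recall implementation.

```python
def motion_mask(prev_frame, curr_frame, threshold=10):
    """Return a 0/1 grid marking pixels whose brightness changed by more than threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def motion_ribbon(frames, threshold=10):
    """Summarise a clip as one number per frame transition: the count of changed pixels."""
    ribbon = []
    for prev_frame, curr_frame in zip(frames, frames[1:]):
        mask = motion_mask(prev_frame, curr_frame, threshold)
        ribbon.append(sum(sum(row) for row in mask))
    return ribbon


# Four invented 2x2 frames: nothing moves, then a bright spot appears and shifts.
frames = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
    [[0, 200], [0, 0]],
    [[0, 0], [200, 0]],
]
print(motion_ribbon(frames))  # [0, 1, 2]
```

Plotting one such value per moment for each camera would give a strip in which quiet periods are blank and activity stands out, which is the kind of overview the ribbon of colour is described as providing.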
"The first task we set for ourselves was to transcribe everything my son heard or said from nine to 24 months," he says.	"The first task we set for ourselves was to transcribe everything my son heard or said from nine to 24 months," he says.
He estimates that there are between 10 and 12 million words of speech to transcribe.	He estimates that there are between 10 and 12 million words of speech to transcribe.
"For anyone that has transcribed speech, they will know that is a laborious and slow process," he says, with a degree of understatement.	"For anyone that has transcribed speech, they will know that is a laborious and slow process," he says, with a degree of understatement.
Initially his team tried to use off-the-shelf speech recognition software, but soon realised that it was not up to the job of extracting words from often-noisy environments.	Initially his team tried to use off-the-shelf speech recognition software, but soon realised that it was not up to the job of extracting words from often-noisy environments.
"We realised that the state of the art is not even close to good enough," he told the BBC.	"We realised that the state of the art is not even close to good enough," he told the BBC.
Automatic systems could have error rates of up to 90%, he said.	Automatic systems could have error rates of up to 90%, he said.
At the other extreme, Professor Roy also experimented with human transcribers, but that also came with its own problems.	At the other extreme, Professor Roy also experimented with human transcribers, but that also came with its own problems.
"It would take an average of 10 hours to find and transcribe one hour of speech," he told the BBC.	"It would take an average of 10 hours to find and transcribe one hour of speech," he told the BBC.
HUMAN SPEECHOME IN NUMBERS: 90,000 hours of video recorded; 140,000 hours of audio recordings; approx 200GB of data collected every day; 150TB of raw data collected over course of project; 70% of infant's waking hours captured; 10 to 12m words spoken; 4m words so far transcribed. Speechome project launched	HUMAN SPEECHOME IN NUMBERS: 90,000 hours of video recorded; 140,000 hours of audio recordings; approx 200GB of data collected every day; 150TB of raw data collected over course of project; 70% of infant's waking hours captured; 10 to 12m words spoken; 4m words so far transcribed. Speechome project launched
When you are trying to analyse 16 months of audio from 14 microphones, those kinds of ratios don't seem attractive.	When you are trying to analyse 16 months of audio from 14 microphones, those kinds of ratios don't seem attractive.
Instead, the researchers created a piece of software called Blitzscribe, which finds speech in the recordings and breaks it down into easily transcribed sound bites.	Instead, the researchers created a piece of software called Blitzscribe, which finds speech in the recordings and breaks it down into easily transcribed sound bites.
"We have automated components assisting human annotators," he said.	"We have automated components assisting human annotators," he said.
"The net result is that we have reduced 10 hours down to two hours."	"The net result is that we have reduced 10 hours down to two hours."
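The article does not describe how Blitzscribe locates speech. As a rough illustration of the general idea of finding speech and cutting it into short clips for a human transcriber, the sketch below marks runs of audio frames whose energy exceeds a threshold. The function name, the threshold and the sample energies are hypothetical, and a real system would need something far more robust to household noise.

```python
def find_speech_segments(energies, threshold, min_frames=2):
    """Return (start, end) index pairs, end exclusive, for runs of loud frames."""
    segments = []
    start = None
    for i, energy in enumerate(energies):
        if energy >= threshold and start is None:
            start = i  # a loud run begins
        elif energy < threshold and start is not None:
            if i - start >= min_frames:  # ignore very short blips
                segments.append((start, i))
            start = None
    # Close a run that lasts until the end of the recording.
    if start is not None and len(energies) - start >= min_frames:
        segments.append((start, len(energies)))
    return segments


# Invented per-frame energies: a three-frame burst, a one-frame blip, a final burst.
energies = [0.1, 0.9, 0.8, 0.7, 0.1, 0.6, 0.1, 0.9, 0.9]
print(find_speech_segments(energies, threshold=0.5))  # [(1, 4), (7, 9)]
```

Only the marked segments would then be handed to a transcriber, which is one way that the time spent searching through silence could be cut.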
The analysis also takes into account how a word was said - called prosody - and who said it.	The analysis also takes into account how a word was said - called prosody - and who said it.
To date, the team have already transcribed more than four million words.	To date, the team have already transcribed more than four million words.
"It's already the most complete transcript of everyday life at home than any recording ever made."	"It's already the most complete transcript of everyday life at home than any recording ever made."
A similar human-computer system, called TrackMarks, has also been developed to analyse the video and gives information such as where people are in relation to one another and the orientation of their heads.	A similar human-computer system, called TrackMarks, has also been developed to analyse the video and gives information such as where people are in relation to one another and the orientation of their heads.
~~Advertisement~~
Software visualises how care givers interact with the child over time	Software visualises how care givers interact with the child over time
Although the data sets are still incomplete, Professor Roy says they are already beginning to see interesting results.	Although the data sets are still incomplete, Professor Roy says they are already beginning to see interesting results.
For example, his team has been able to begin to tease apart a process he calls "word births", the time when a baby first begins to use a word.	For example, his team has been able to begin to tease apart a process he calls "word births", the time when a baby first begins to use a word.
By analysing the length, and hence complexity, of sentences spoken by caregivers to his son, he believes that he has shown that adults subconsciously simplify sentences until the child understands the word.	By analysing the length, and hence complexity, of sentences spoken by caregivers to his son, he believes that he has shown that adults subconsciously simplify sentences until the child understands the word.
Once it has been understood, the adults then build up the complexity of the sentences containing the word.	Once it has been understood, the adults then build up the complexity of the sentences containing the word.
"We essentially meet him at this point of the birth of the word and gently pull him into language," he says.	"We essentially meet him at this point of the birth of the word and gently pull him into language," he says.
The Speechome Recorder can be fitted in any home	The Speechome Recorder can be fitted in any home
Professor Roy stresses it is an initial result and has not been validated by the scientific community. However, he says, it shows the kind of questions that can be answered with the data and tools he now has.	Professor Roy stresses it is an initial result and has not been validated by the scientific community. However, he says, it shows the kind of questions that can be answered with the data and tools he now has.
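The kind of analysis described, tracking how long caregivers' sentences are around the time a word is first used, can be sketched as follows. The data layout (week number plus transcribed utterance), the function name and the sample utterances are invented for illustration; the article does not describe the team's actual method or data.

```python
from collections import defaultdict


def mean_length_by_week(utterances, word):
    """Average word count of caregiver utterances containing `word`, grouped by week."""
    lengths = defaultdict(list)
    for week, text in utterances:
        tokens = text.lower().split()
        if word in tokens:
            lengths[week].append(len(tokens))
    return {week: sum(v) / len(v) for week, v in sorted(lengths.items())}


# Invented toy transcript: (week, caregiver utterance).
utterances = [
    (1, "look at the big red ball"),
    (2, "see the ball"),
    (3, "ball"),
    (4, "throw the ball to me"),
    (4, "where is your shoe"),
]
print(mean_length_by_week(utterances, "ball"))
# {1: 6.0, 2: 3.0, 3: 1.0, 4: 5.0}
```

In this toy data the sentences containing "ball" shorten and then lengthen again, which is the shape of pattern Professor Roy reports; whether real data show it is exactly what remains to be validated.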
But winning over the rest of the scientific community might be his most difficult job.	But winning over the rest of the scientific community might be his most difficult job.
It remains to be seen whether other scientists will accept his conclusions as they are based on the analysis of just one child and, as Professor Roy admits, are unlikely to be reproduced because of time and cost.	It remains to be seen whether other scientists will accept his conclusions as they are based on the analysis of just one child and, as Professor Roy admits, are unlikely to be reproduced because of time and cost.
In part to address this criticism, he has developed a stand-alone device - called the Speechome recorder - that can be easily put into homes without running 1000m (3000ft) of wiring through the walls or converting the basement into a data centre.	In part to address this criticism, he has developed a stand-alone device - called the Speechome recorder - that can be easily put into homes without running 1000m (3000ft) of wiring through the walls or converting the basement into a data centre.
The devices look like floor lamps and contain an overhead microphone and camera, with another lens at eye level for children.	The devices look like floor lamps and contain an overhead microphone and camera, with another lens at eye level for children.
The base of the device holds a touch-screen display and enough storage to hold several months of recordings.	The base of the device holds a touch-screen display and enough storage to hold several months of recordings.
Their first deployment will be in six pilot studies of children with autism where they will be used to monitor and quantify the children's response to treatment.	Their first deployment will be in six pilot studies of children with autism where they will be used to monitor and quantify the children's response to treatment.
"I'm really excited - this is the future of the project," says Professor Roy.	"I'm really excited - this is the future of the project," says Professor Roy.
Robot reflex	Robot reflex
But he also has his eye on other possible spin-offs.	But he also has his eye on other possible spin-offs.
For example, the video-analysis algorithms designed for the project could be used in automated systems to monitor CCTV cameras and extract information about particular events.	For example, the video-analysis algorithms designed for the project could be used in automated systems to monitor CCTV cameras and extract information about particular events.
He is also working with architects to visualise how people move around an environment and how changes to building design affect that.	He is also working with architects to visualise how people move around an environment and how changes to building design affect that.
The results are being fed into creating a semi-automated architectural design system.	The results are being fed into creating a semi-automated architectural design system.
"This could be really interesting if you're designing a retail space or if you are an architect and have a design and want to know whether it will work or how to change it."	"This could be really interesting if you're designing a retail space or if you are an architect and have a design and want to know whether it will work or how to change it."
However, Professor Roy has never forgotten his roots in robotics and still hopes to bring the project full-circle.	However, Professor Roy has never forgotten his roots in robotics and still hopes to bring the project full-circle.
"What if we can build a machine that can step into the shoes of a child and learn in human-like ways," he asks.	"What if we can build a machine that can step into the shoes of a child and learn in human-like ways," he asks.
"Imagine transferring that into a video game character or into a domestic robot that can now learn to communicate and interact in social ways.	"Imagine transferring that into a video game character or into a domestic robot that can now learn to communicate and interact in social ways.
"I see a lot of pathways back."	"I see a lot of pathways back."