Monday, June 20, 2011

MicroCV at CVPR 2011: afterthoughts

MicroCV was at the 24th edition of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), held in the pretty town of Colorado Springs. It reminded me a little of my hometown, Mysore (except the mountains are a lot taller here). I was hoping to report daily from the conference, but the spotty internet access put paid to that plan. So we will have to make do with a quick summary.

First, the workshops. We were spoiled for choice, but I mostly attended the Embedded Computer Vision workshop (ECV 2011) on Monday, since it seemed the most relevant to MicroCV. ECV 2011 opened with an interesting keynote by Mike Piacentino from SRI. Clement Farabet and Yann LeCun's work on a real-time configurable processor was very impressive, and included a real-time demo on a 10 W device. From my sampling of the presentations and the posters, it's clear the focus of the ECV community is either dedicated hardware (ASICs) or efficient software implemented on GPUs, ARM Cortexes (Cortices?) and DSPs (TI dominates here).

Another keynote I really liked was Michael Black's at the human activity understanding workshop (HAU3D). One of his main ideas was that two threads of science from the 1800s, Hitzig's work on electrical stimulation of the brain and Muybridge's work on human motion, could now be connected by exploiting both modern neuroscience methods and motion capture techniques. Yet another keynote was by Hendrik P. A. Lensch, who impressed us all with a gallery of beautiful results from his work on computational illumination. The news was that he was building a hyperspectral dome to further the work on fluorescence effects.

Some broad trends: the Kinect hit CVPR like a comet from outer space. Application papers were everywhere. A paper from the MSR group that created the technology won the best paper award. The Projector-Camera Systems workshop (PROCAMS 2011) had a session on Kinect ideas. Lots of demos used the Kinect. I think there will be waves of such papers in future CVPRs. Another trend was using stereo to obtain low-frequency depth and enhancing it with photometric methods. Although this is an old idea, a significant number of papers used it to do fine-scale reconstruction.

My limited and incomplete list of interesting papers (sorry if I missed yours; I couldn't attend every session): Glare Encoding of High Dynamic Range Images, from Wolfgang Heidrich's group, had a cool idea about recovering the caustics inside light sources. Ivo Ihrke had a paper on kaleidoscope imaging, which was exciting since it produced single images with many, many, many views. From 3D Scene Geometry to Human Workspace went further with Alyosha's and Abhinav's idea of exploiting block structures to infer higher, human-level abstractions from images. A Theory of Multi-Perspective Defocusing had a neat idea of looking at the defocus in cross-slit cameras. Gupta et al. solved an open problem in structured light reconstruction under global illumination, while Agrawal et al. did the same for a problem in catadioptric imaging. Michal Irani's group had two papers on mining a video and an image for super-resolution. Tian and Narasimhan used scale-space theory to recover document text from images.



Friday, June 17, 2011

Cloud robotics and MicroCV

It all started a couple of years ago when some hackers started putting smartphones on augment-able Roombas (the iRobot Create). I guess they really just wanted a powerful processor that was light enough for the robot to carry around. And then they started using the phone's camera and GPS and wireless...

Meanwhile Google made Goggles. In the eternal war of Telephone vs. Camera, Goggles captures the "Object Recognition" flag for the Telephone. Images are sent to the "cloud", where (relatively) simple algorithms produce amazing results by throwing huge amounts of data and processing power at the problem.

Cloud robotics was born when Goggles started to be used as a vision system for robots that had smartphones on them. Here is a semi-recent article that explains Google's strategy and has links to Google's cellbot as well as other cloud robotics projects.
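To make the pattern concrete, here is a minimal Python sketch of the "ship a frame to a recognition service, get labels back" loop. Everything specific in it is a hypothetical placeholder I made up for illustration: the endpoint URL, the JSON response with a "labels" field, and the saved frame name are not Goggles' or Cellbots' actual interface.

    import json
    import urllib.request

    CLOUD_ENDPOINT = "http://example.com/recognize"  # hypothetical service URL

    def recognize_in_cloud(jpeg_bytes):
        """POST a JPEG-encoded frame and return whatever labels the service replies with."""
        request = urllib.request.Request(
            CLOUD_ENDPOINT,
            data=jpeg_bytes,                        # request body: the compressed frame
            headers={"Content-Type": "image/jpeg"},
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())["labels"]  # assumed response schema

    if __name__ == "__main__":
        # Send one saved frame; on a robot this call would sit inside the capture loop.
        with open("frame.jpg", "rb") as f:
            print(recognize_in_cloud(f.read()))

The point of the sketch is how little lives on the device: capture, compress, transmit, and wait. All the heavy lifting happens on the other end of the network.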

So where does that leave MicroCV? Could we outsource all vision tasks to the cloud?

It seems obvious to me that certain real-time control tasks will always need to be done on-board, especially if the device is robotic in nature. Non-critical tasks, or tasks on static platforms, could instead be handled through the cloud.

Additionally, there is the issue of power. Even a static device would need to spend energy transmitting heavy doses of image data over the network. How painful would the energy cost be at reasonable bit rates? I'm not sure, but my gut feeling is that shipping the input to the whole pipeline out to the cloud would be infeasible. Alternatively, on-board vision systems could pre-process the image data. Such systems would complement a cloud robotics framework, since they could minimize the power spent on data transmission.
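As a rough sanity check on that question, here is a back-of-envelope sketch in Python. Every number in it (frame size, frame rate, compression ratio, joules per transmitted bit) is an illustrative assumption, not a measurement; swap in figures for a real radio and codec before drawing any conclusions.

    # Back-of-envelope estimate of the radio power needed to stream frames to the cloud.
    # All numbers below are illustrative placeholders, not measurements.
    FRAME_W, FRAME_H = 640, 480     # assumed VGA sensor
    BYTES_PER_PIXEL = 3             # assumed 8-bit RGB
    FPS = 15                        # assumed frame rate
    COMPRESSION_RATIO = 20.0        # assumed JPEG compression before upload
    ENERGY_PER_BIT_J = 0.5e-6       # assumed radio cost in joules per transmitted bit

    raw_bits_per_second = FRAME_W * FRAME_H * BYTES_PER_PIXEL * 8 * FPS
    uplink_bits_per_second = raw_bits_per_second / COMPRESSION_RATIO
    radio_power_watts = uplink_bits_per_second * ENERGY_PER_BIT_J

    print(f"Uplink rate : {uplink_bits_per_second / 1e6:.1f} Mbit/s")
    print(f"Radio power : {radio_power_watts:.2f} W (with the assumed joules/bit)")

With these made-up numbers the radio alone burns a couple of watts, which is why the arithmetic pushes toward on-board pre-processing: anything that shrinks the bits you have to transmit shrinks the power bill.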

Either way, cloud robotics seems set to affect vision at the smallest scales.