Micro Computer Vision

Tuesday, July 19, 2011

Mechanical RAM

If you've ever programmed embedded systems, then you know that unless your data is in program or flash memory, it's all gone when the power cycles. If you wanted to keep that data around, it would cost energy (since you'd have to keep things powered). The Economist (yes, I need new sources; suggestions?) has a neat article about using mechanical switches to remember data. The article explains two devices, one of which is actually a MEMS switch (literally a piece of metal that breaks a circuit). Although George Boole was not famous while he lived, the first application of his work was in telephone relay switches. Maybe we are coming full circle.

Thursday, July 7, 2011

Wired contact lenses

The Economist had an article a while back about contact lenses with micro, flexible electronics on them. These aren't visual sensors; instead, they are placed on the eye surface and can monitor diseases and control drug dosages. It's just another example of how curved, flexible and small electronics enable interesting applications. But if you've read your science fiction, then you might suspect that it's only a small step before these devices get optical elements. These could enhance vision in wavelengths beyond our visual range, by adding sensors with special filters. Or they could manipulate and augment the incoming visual field using LCDs or LCoS or DMDs. The advantage of having it on the eye would be that the tough brain-electronics interface problem would be avoided. Instead of that interface (which is still current research), the electronics would communicate with the brain by manipulating light before it falls on the retina.

Monday, June 20, 2011

MicroCV at CVPR 2011: afterthoughts

MicroCV was at the 24th edition of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), held in the pretty town of Colorado Springs. It reminded me a little of my hometown, Mysore (except the mountains are a lot taller here). I was hoping to report daily from the conference, but the spotty internet access put paid to that thought. So we will have to make do with a quick summary.

First, the workshops. I was spoiled for choice, but mostly I attended the Embedded Computer Vision workshop (ECV 2011) on Monday since that seemed most relevant to MicroCV. ECV 2011 opened with an interesting keynote by Mike Piacentino from SRI. Clement Farabet and Yann LeCun's work on a real-time configurable processor was very impressive, and included a real-time demo on a 10 W device. From the sampling of the presentations and the posters, it's clear the focus of the ECV community is either dedicated hardware (ASICs) or efficient software implemented on GPUs, ARM Cortexes (Cortices?) and DSPs (TI dominates here).

Another keynote I really liked was Michael Black's at the human activity workshop (HAU3D). One of his main ideas was that two threads of science from the 1800s, Hitzig's work on electric current in the brain and Muybridge's work on human motion, could now be connected by exploiting both modern neuroscience methods and motion capture techniques. Yet another keynote was by Hendrik P. A. Lensch, who impressed us all with a gallery of beautiful results from his work on computational illumination. The news was that he was building a hyperspectral dome to further the work on fluorescence effects.

Some broad trends: The Kinect hit CVPR like a comet from outer space. Application papers were everywhere. A paper from the MSR group that created the technology won the best paper award. The projector-camera workshop (PROCAMS 2011) had a session on Kinect ideas. Lots of demos used the Kinect. I think there will be waves of such papers in future CVPRs. Another trend was using stereo as low-frequency depth and enhancing it with photometric methods. Although this is an old idea, a significant number of papers used it to do fine-scale reconstruction.

My limited and incomplete list of interesting papers (sorry if I missed yours, I couldn't attend every session): Glare Encoding of High Dynamic Range from Wolfgang Heidrich's group had a cool idea about recovering the caustics inside light sources. Ivo Ihrke had a paper on Kaleidoscope imaging which was exciting since it had single images with many, many, many views. From 3D Scene Geometry to Human Workspace went further with Alyosha's and Abhinav's idea to exploit block structures to infer higher, human-level abstractions from images. A Theory of Multi-perspective Defocusing had a neat idea of looking at the defocus in cross-slit cameras. Gupta et al. solved an open problem in structured light reconstruction under global illumination while Agrawal et al. did the same for a problem in catadioptric imaging. Michal Irani's group had two papers on mining a video and an image for super-resolution. Tian and Narasimhan had a scale-space theory method for recovering document text from images.

Friday, June 17, 2011

Cloud robotics and MicroCV

It all started a couple of years ago when some hackers started putting smartphones on augment-able Roombas (the iRobot Create). I guess they really just wanted a powerful processor that was light enough for the robot to carry around. And then they started using the phone's camera and GPS and wireless...

Meanwhile Google made Goggles. In the eternal war of Telephone vs. Camera, Goggles captures the "Object Recognition" flag for the Telephone. Images are sent to the "cloud" where (relatively) simple algorithms provide amazing results by spending huge amounts of both data and processing power.

Cloud robotics was born when Goggles started to be used as a vision system for robots that had smartphones on them. Here is a semi-recent article that explains Google's strategy and has links to Google's cellbot as well as other cloud robotics projects.

So where does that leave MicroCV? Could we outsource all vision tasks to the cloud?

It seems obvious to me that certain real-time control tasks would always need to be done on-board. This is especially true if the device is robotic in nature. Other tasks, those that are non-critical or run on a static platform, could be handled through the cloud.

Additionally, there is the issue of power. Even a static device would need to consume energy to transmit heavy doses of information over the network. How painful would the energy cost be for reasonable bit rates? I'm not sure, but my gut feeling is that sending out the whole pipeline to the cloud would be infeasible. Alternatively, on-board vision systems could pre-process the image data. Such systems would complement a cloud robotics framework, since they could minimize the power spent on data transmission.
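Here is a rough back-of-envelope sketch of that question in Python. Everything numeric in it is an assumption: the 100 nJ/bit radio cost is a placeholder rather than a measurement, and the names `transmit_power_watts` and `reduction_factor` are made up for illustration. The point is only that transmit power scales linearly with the bit rate, so whatever the true per-bit cost turns out to be, on-board pre-processing that shrinks the data shrinks the radio's power draw by the same factor.

```python
def transmit_power_watts(width, height, bits_per_pixel, fps,
                         joules_per_bit, reduction_factor=1.0):
    """Average radio power needed to stream video off the device.

    reduction_factor is how much on-board pre-processing shrinks the
    data before it is sent (1.0 means raw frames are transmitted).
    """
    bits_per_second = width * height * bits_per_pixel * fps / reduction_factor
    return bits_per_second * joules_per_bit


# ASSUMPTION: placeholder radio cost, not a measured figure.
JOULES_PER_BIT = 100e-9

# Raw VGA colour video at 30 frames per second.
raw = transmit_power_watts(640, 480, 24, 30, JOULES_PER_BIT)

# Same stream after hypothetical on-board processing that keeps 1/100 of the data
# (for example, sending features or regions of interest instead of full frames).
reduced = transmit_power_watts(640, 480, 24, 30, JOULES_PER_BIT,
                               reduction_factor=100.0)

print(f"raw stream:     {raw:.2f} W")
print(f"pre-processed:  {reduced:.3f} W")
```

Swap in a per-bit energy for the radio you actually care about; the ratio between the two lines stays the same.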

Either way, cloud robotics seems set to affect vision at the smallest scales.

Sunday, May 22, 2011

Misc. May: Intel Trigate, Microscopes and GE printing

It's been such a busy May for me, since it seems like the news from the micro tech world keeps coming at a faster and faster pace. There are tons of things I'd like to discuss in detail, but instead I'm just listing them here:

a) The big news this month, of course, is that Intel is building "3D" gates that have a vertical fin to allow closer packing. PCmag has a good overview of the technology and the Economist explains the corporate drama surrounding the Intel-ARM war to own the future of micro device processors and the implications of the Trigate technology in that battle. The bottom line is that Trigate will allow low power devices that will extend Moore's law further into the decade. (Not for too long: just 2 years. That gives you an idea of how tough it will be to make hardware for micro machines in the future).

b) RK sent me a link about IBM fellows and Nobel prize winners discussing their scanning tunneling microscope. Now I don't know much about quantum effects, but I always thought they were esoteric theories that were used in nuclear power plants and so on. But the STM is essentially a camera that uses quantum theory to take a picture. It's fascinating stuff, and I decided to look at the history of microscopes on Wikipedia, which has a good summary. You should really spend an afternoon and learn about microscopes (as I did), but here are two interesting facts to pique your interest and get you started:

The use of optical microscopes took off after the discovery of a design rule known as the Abbe sine condition.
Electron microscopes are quite old. The transmission electron microscope was invented in the 1930s and the scanning electron microscope in the 1960s.

c) The BBC has a cool video about Near Field Communication technology that is converting our phones into e-wallets (not just credit cards, but ID cards and driving licenses too).

d) Here is a useful image sensor blog link by the founder of Advasense. It's a good place to go and search for topics on image sensor hardware, say back-illuminated sensors or whatever it is you are currently thinking about.

e) GE is going to be the first big manufacturing company that will produce products (ultrasound devices) by printing them. (Article from the Economist).

Saturday, May 14, 2011

New panoramic tech plus a note on 3D printing

It's amazing how web-based panoramic image browsing has become such an accepted and normal part of our lives: think how accessible Google Streetview, Microsoft's Photosynth and Gigapan are. And this technology hasn't finished changing our world yet: it's moving into two new territories.

The first is "indoor streetview" that Google is starting up for businesses. Thanks to RK for bringing this to my attention. Soon you'll be able to invite Google over and get a service set up so that potential customers can browse through images of your small business. Since Google Streetview pushed Google into automated cars (see this link), maybe iRobot needs to start worrying about competition from Google indoor robots? Why stop at businesses, when people already use web-cams to monitor small children from work? Looks like a possible direction for small, portable, low-power visual sensors.

The second direction that panoramic technology is moving toward is stereoscopy. You can find projects to create stereoscopic gigapans and TONS of stereo panoramas on the web. However, these don't actually "work" when you zoom in to see details. It's non-trivial to model the geometry of the scene and re-render 3D images with the right disparity as you look closely at small objects in the scene. MC pointed out that the "right" way to do this would be to capture the light-field at the viewing location. To do this, you would need to smoothly move a video camera in an arc, instead of using a stereo pair of still cameras.
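One way to see why zooming breaks a fixed stereo pair is the textbook disparity relation for an idealized, rectified pinhole stereo rig: disparity = focal length x baseline / depth. The little Python sketch below is only an illustration under that assumption; the function name and the numbers (a 65 mm baseline, a 1000-pixel focal length, a 10x digital zoom) are hypothetical, not taken from any of the projects above. Magnifying both images by a zoom factor magnifies every pixel offset by the same factor, so disparities that were comfortable at full view grow with the zoom.

```python
def disparity_pixels(focal_length_px, baseline_m, depth_m):
    """Disparity of a point at depth_m for a rectified pinhole stereo pair."""
    return focal_length_px * baseline_m / depth_m


# Hypothetical rig: 1000 px focal length, 65 mm baseline (roughly eye spacing).
FOCAL_PX = 1000.0
BASELINE_M = 0.065

near = disparity_pixels(FOCAL_PX, BASELINE_M, 2.0)    # object 2 m away
far = disparity_pixels(FOCAL_PX, BASELINE_M, 50.0)    # object 50 m away

# A digital zoom magnifies the images, and with them every disparity.
ZOOM = 10.0
print(f"near object: {near:.1f} px at 1x, {near * ZOOM:.1f} px at {ZOOM:.0f}x")
print(f"far object:  {far:.1f} px at 1x, {far * ZOOM:.1f} px at {ZOOM:.0f}x")
```

Re-rendering with the "right" disparity means undoing this scaling per object, which needs scene depth; capturing the light-field sidesteps that by recording the extra views directly.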

Side note on 3D printing: AJ has been working on 3D printer kits that can be assembled for classrooms. Related tools have inspired new kinds of art, as the NYTimes explains.

How long before we have printers that can print printers? The singularity approacheth....

Friday, April 29, 2011

Kinect

Yesterday I took the train down to Brown to see a set of cool vision talks. One of the presenters talked about "RGB-D" images, which means different things to different people. To me, RGB-D is an image abstraction. It augments the red, green and blue color channels with a fourth channel, which is the pixel depth (the "D" in RGB-D).
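To make the abstraction concrete, here is a minimal Python sketch of an RGB-D image as I think of it. The representation (nested lists of four-value tuples, depth in metres) and the helper names are my own assumptions for illustration, not any particular sensor's or library's format. The second helper shows the kind of thing that becomes trivial once depth is simply handed to you: separating near pixels from far ones with a threshold.

```python
from typing import List, Tuple

# One RGB-D pixel: red, green, blue (0-255) plus depth.
# ASSUMPTION: depth is in metres; real devices use their own units.
RGBDPixel = Tuple[int, int, int, float]
RGBDImage = List[List[RGBDPixel]]


def depth_channel(image: RGBDImage) -> List[List[float]]:
    """Return only the fourth channel, the per-pixel depth."""
    return [[pixel[3] for pixel in row] for row in image]


def foreground_mask(image: RGBDImage, max_depth: float) -> List[List[bool]]:
    """Mark pixels closer than max_depth, treating the depth as given."""
    return [[pixel[3] < max_depth for pixel in row] for row in image]


# A tiny 2x2 image: a reddish object up close, a dark wall further back.
image: RGBDImage = [
    [(200, 30, 30, 0.8), (10, 10, 10, 3.5)],
    [(190, 40, 35, 0.9), (12, 11, 10, 3.6)],
]

print(depth_channel(image))
print(foreground_mask(image, max_depth=2.0))
```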

Although there are tons and tons and tons of methods that reconstruct scene depths from images, RGB-D folks are agnostic to these. They have an interesting abstraction subtext to their work, which could be summarized as: "Pick a vision reconstruction method that works really well. Let's have it implemented as a hardware black box. Let's now celebrate the fact that we don't have to worry about depth-reconstruction/shape-from-X again. We'll assume that we have perfect depth, and let's build some cool vision on top of that."

I think this attitude is awesome since it gets people to think beyond scene reconstruction.

One of the "black boxes" that give RGB-D images is the Microsoft Kinect. What's interesting for micro computer vision is one sub-component of the Kinect: a tiny IR projector.

Correction: The Kinect has no IR projector. See updates below.

You can see pics of it here. The Kinect has a stereo-pair-with-projector system. The projector adds texture for the stereo system, but its "projected light" goes unnoticed since it's in IR and so invisible. (You can see the projected pattern here, where a video was taken with a night-vision camera.)

I believe the projector pattern is fixed. This means Microsoft could have gotten away with a very bright IR LED and a physical, printed texture pattern. Why did they use a projector? I'm not sure, but there is an opportunity to hack and control the projector. I'm surprised not to have found anything along those lines yet on the web. I'm particularly curious about the projector's frame rate, and whether high-speed applications are possible.

Update: We have confirmation that the Kinect does not have a projector at all. Thanks to AV for the update. (Also, people actually read the blog.)

Update 2: Thanks to GD for pointing out this website: kinecthacks.net