Wednesday, October 5, 2011

Cool Human-Computer Interface Devices - Computer Vision, Projection, Gesture, Speech Recognition and Text-To-Speech - Pranav Mistry, iPhone 4 and mine

Impressive. I need something like Pranav's devices for myself.





Research Assistant

That reminds me of a project of mine called... well, that's a secret. :P It was (and still is) supposed to integrate a lot of tools in an intelligent way to boost your performance, sparing you all kinds of labor-intensive tasks, and to monitor your actions and behavior.

In early 2008 I thought of developing a part of it as my MS thesis: marking important paragraphs and pages of a book or paper with a simple gesture while you're reading (drawing a line with your finger along the side of the page), then storing the selected parts as images on the computer. OCR would help with faster classification and with search, but it was not expected to be robust and was not critical. Even without it, this is very useful for collecting excerpts and citations from philosophy books, newspapers, magazines, promotional booklets etc., so that you can do the "batch processing" later without switching your attention. It saves time and avoids distractions.
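To make the idea concrete, here is a minimal sketch of just the excerpt-cropping step. It is only an illustration, not code from the project: it assumes the fingertip positions have already been detected somehow (the hard part, not shown), and the names `crop_marked_rows`, `stroke` and `margin` are made up for this example. The page is a plain list of pixel rows so that the sketch needs no libraries.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale page: a list of rows of pixel values


def crop_marked_rows(page: Image, stroke: List[Tuple[int, int]], margin: int = 2) -> Image:
    """Return the rows of `page` covered by a finger stroke drawn in the side margin.

    `stroke` is a list of (x, y) fingertip positions in image coordinates.
    Detecting them is assumed to happen elsewhere; this only does the cropping.
    """
    if not stroke or not page:
        return []
    ys = [y for _, y in stroke]
    # Extend the vertical span a little so lines of text are not cut in half,
    # and clamp it to the page boundaries.
    top = max(min(ys) - margin, 0)
    bottom = min(max(ys) + margin, len(page) - 1)
    # Copy the rows so the stored excerpt is independent of the original frame.
    return [row[:] for row in page[top:bottom + 1]]


# Toy usage: a 10-row "page" where every pixel in row r has the value r.
page = [[r] * 4 for r in range(10)]
stroke = [(3, 4), (3, 5), (3, 6)]  # finger moved down the right-hand margin
excerpt = crop_marked_rows(page, stroke, margin=1)
```

With a real camera frame the same cropping would be applied to the image array, and the excerpt would be saved to disk for the later batch processing.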

A head-mounted camera would be best, but a fixed camera seemed more realistic for the experiment: the book or paper would lie on the table, and the video would be recorded at 640x480 at 15 or 30 fps and then processed off-line.

One of the technical issues was the resolution. I got crisp 640x480 video at 30 fps from a mobile camera at the time. The camera had the optical resolution to capture entire regular book pages, but at this video resolution it has to be close to the paper and properly oriented. It's also not practical to record all of the frames.
(Sure, there's one simple workaround: just photograph the paragraphs manually, with your finger on the proper line. :X)
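A rough back-of-the-envelope sketch of why keeping every frame is wasteful. The assumptions are mine: 3 bytes per pixel of uncompressed color (a real mobile camera would store compressed video, so actual files are much smaller), and plain "keep every n-th frame" subsampling as one possible way to thin the stream. The function names are made up for illustration.

```python
def raw_bytes_per_second(width: int, height: int, fps: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed data rate of a video stream, in bytes per second."""
    return width * height * bytes_per_pixel * fps


def keep_every_nth(frame_indices, n: int):
    """Keep only every n-th frame index, starting with the first one."""
    return [i for k, i in enumerate(frame_indices) if k % n == 0]


# 640x480 at 30 fps, uncompressed 24-bit color (assumption).
rate = raw_bytes_per_second(640, 480, 30)
print(rate)  # 27648000 bytes per second, about 27.6 MB/s

# One second of video at 30 fps, thinned to two frames per second.
kept = keep_every_nth(range(30), 15)
print(kept)  # [0, 15]
```

Reading is slow compared to 30 fps, so a couple of frames per second would probably be enough to catch a finger stroke along the margin, but that is a guess and not something I measured.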

It seemed "obviously implementable": these gestures would be a fairly simple computer vision task. However, I didn't start implementing the project. My supervisor suggested that it would be too much of an "engineering" project, while a thesis should be more "scientific" and propose some contributions, though most of my colleagues were very far from being scientific themselves.

I eventually ended up with a thesis on my microformant hybrid Bulgarian text-to-speech synthesizer, the essential part of which I had developed in about 5 weeks as a first-year student. Of course, I added further planned and proposed functions, ideas for improvements, and a totally different design which is supposed to learn to speak like a baby, but I didn't have the time to implement the improvements. I was too busy with my job then.




Text-to-Speech

Indeed, text-to-speech, even without speech recognition, is a useful way to save some of the time you'd otherwise spend stuck on the chair in front of the computer: it reads your texts aloud. I use it myself.

The "baby synthesizer" of mine is not a junked project either, but the best way to implement it is with a robust general AGI system, which is yet to be developed.
