Tuesday, May 10, 2016

On computer vision visualisation: the limitations of human imagination and vision displayed in these renderings, and how a better thinking machine's vision should be visualised - "The Terminator" film franchise/series as an example

Comment by Todor Arnaudov, inspired by/on: http://www.hudsandguis.com/home/2011/01/02/terminator-the-evolution-of-machine-vision

I'm a fan of the Terminator and liked the article, but I'd challenge the statement in the introduction that the displays in the movie are:
"how the robots would potentially see the world"

I think it's rather how HUMANS imagine that they themselves would see the world with added "computerese", "coders'", "programmers'" decorations on the screen/view, like a pilot's display, a computer simulation screen etc.

It's a style for showing "computer stuff" to viewers who are not computer-literate or experts, especially in T1 and T2.*

We are limited to one "screen", the final one, the integration of the two retinas, but machines are not.

In my opinion a more adequate representation would be a huge set of images (screens), virtually unlimited, displaying a gradual coverage of different kinds and depths of image processing, analysis and understanding. The will-part (what you want, search for, find) is in additional layers.

It cannot be perceived in-full in real time by a human, but in a future kind of interactive or multi-screen/explorative/educational/deep films it could be explored, in a space like the following:

1. Raw input, normal vision

2. A set of simple image-processing operations (contrast, gamma, posterize/levels, ...) - different slices of the visual spectrum. The reddish view is one of them.

3. More complex - lines, blobs, contours, ...

4. Sets of templates for matching when searching. That's like the content of the executive function's top-down attention, consciousness. The templates on their own could cover a lot of screens with different representations - visual in full colors and lighting, wireframe, gradient depth-views, different conditions, 3D-coordinates, different angles for complex and asymmetric objects, similarities with others, taxonomies/classes etc., animated/rotating, zoom-in-out, different sizes.

5. 3D-reconstructed scenes, different views, distances to/between objects, sizes ... Dimensionally-reduced 2D views of the reconstructed 3D scene as in 3D editors: XY, XZ, YZ.

6. Connections between images - graphs, networks, sequences of flashing of different images; displaying the path of thought, search etc.

7. Paths, trajectories - past, taken, projected, predicted (correctly, wrongly); own-paths, trajectories of intended motions and transformations in the environment, sequences of planned actions, ...

8. Graphs (networks), trees, block-diagrams; blinking, different colors, different-shapes - expressing different concepts, relations, distances (ranges of distances) etc.

9. Etc.

10. The machine could see the numeric views/intentions as code, captions, indication of the will with specific symbols (instead of images), textual descriptions etc. without vision/screen, without all these text "blots" on the video that we, humans, need with our limited sensory input and processing capabilities.
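A minimal sketch of the first few layers of the list above, assuming a tiny grayscale image stored as nested lists with values in [0, 1]. All names here (`build_layers`, `attend`, the layer keys) are hypothetical and chosen only for illustration; the edge layer is a crude stand-in for the "lines, blobs, contours" of item 3, not a real contour detector.

```python
from typing import Dict, List

Image = List[List[float]]  # grayscale image, pixel values in [0.0, 1.0]


def gamma(img: Image, g: float = 2.2) -> Image:
    """Layer 2: gamma correction of every pixel."""
    return [[p ** (1.0 / g) for p in row] for row in img]


def posterize(img: Image, levels: int = 4) -> Image:
    """Layer 2: quantize each pixel to one of `levels` evenly spaced values."""
    return [[round(p * (levels - 1)) / (levels - 1) for p in row] for row in img]


def edge_strength(img: Image) -> Image:
    """Layer 3 (crude): absolute difference between horizontally adjacent
    pixels, which responds to vertical edges. The result is one column narrower."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)] for row in img]


def build_layers(raw: Image) -> Dict[str, Image]:
    """Keep every processing stage as its own 'screen' instead of
    flattening them all into a single final view."""
    return {
        "1_raw": raw,
        "2_gamma": gamma(raw),
        "2_posterize": posterize(raw, levels=2),
        "3_edges": edge_strength(raw),
    }


def attend(layers: Dict[str, Image], names: List[str]) -> Dict[str, Image]:
    """Select only the layers that attention is currently directed at."""
    return {name: layers[name] for name in names}


if __name__ == "__main__":
    raw = [[0.0, 0.0, 1.0, 1.0],
           [0.0, 0.0, 1.0, 1.0]]
    layers = build_layers(raw)
    for name, img in attend(layers, ["1_raw", "3_edges"]).items():
        print(name, img)
```

The point of the dictionary is that nothing is overwritten: the machine keeps all "screens" at once and the human-style single display is just one possible selection from them.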

For the machine they are at different layers; it could perceive all of them or switch attention between selected ones. There could b

However, we humans similarly do see a lot of "numbers" without "dirtying" and overloading the "screen". We "feel" them and express that in our actions: we show that we know the precise spatial coordinates of objects and of our hands by applying the correct motions when grasping an object or kicking a ball, or when playing a musical instrument, improvising by ear, and applying the right sequence of motions at the right speed etc. We do "see" the numbers, but "intuitively", without a need to read them as high-level abstract symbols and translate them into motion vectors.

Etc. -- go on and synthesize more continuations and unfolds of the ideas and directions given above.

* See a follow-up article on touch-screen GUI inconvenience

PS1. I'd like to note also that the code in T1 is 6502 assembly listings (Apple II, NES, ...). There's a caption about 16K memory bank switching, so maybe it does bank-switching and has more than 64 KB RAM... :)
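For readers unfamiliar with the trick: the 6502 has 16-bit addresses, so it can address only 64 KB directly, and bank switching lets a fixed address window show one of several physical banks. Below is a rough sketch of the idea. The 16 KB bank size follows the caption; the window location (`WINDOW_START`) and the number of banks are assumptions made for illustration and are not taken from the listing in the film or from any specific machine.

```python
BANK_SIZE = 16 * 1024       # 16 KB per bank, as in the on-screen caption
ADDRESS_SPACE = 64 * 1024   # what 16-bit addresses can reach directly
WINDOW_START = 0x8000       # hypothetical location of the switchable window


class BankedMemory:
    """Toy model: one 16 KB window of the address space is backed by
    whichever bank is currently selected; the rest is ordinary memory."""

    def __init__(self, num_banks: int) -> None:
        # Full-size array for simplicity; the window range of it is never used.
        self.fixed = bytearray(ADDRESS_SPACE)
        self.banks = [bytearray(BANK_SIZE) for _ in range(num_banks)]
        self.current = 0

    def select_bank(self, index: int) -> None:
        if not 0 <= index < len(self.banks):
            raise ValueError("no such bank")
        self.current = index

    def _in_window(self, addr: int) -> bool:
        return WINDOW_START <= addr < WINDOW_START + BANK_SIZE

    def read(self, addr: int) -> int:
        if self._in_window(addr):
            return self.banks[self.current][addr - WINDOW_START]
        return self.fixed[addr]

    def write(self, addr: int, value: int) -> None:
        if self._in_window(addr):
            self.banks[self.current][addr - WINDOW_START] = value
        else:
            self.fixed[addr] = value

    def total_bytes(self) -> int:
        """Physical memory reachable: unbanked part plus all banks."""
        return (ADDRESS_SPACE - BANK_SIZE) + BANK_SIZE * len(self.banks)


if __name__ == "__main__":
    mem = BankedMemory(num_banks=8)
    mem.write(0x8000, 0xAA)      # lands in bank 0
    mem.select_bank(1)
    mem.write(0x8000, 0xBB)      # same address, different physical byte
    mem.select_bank(0)
    print(hex(mem.read(0x8000)), mem.total_bytes() // 1024, "KB")
```

The same address holds different bytes depending on the selected bank, which is how such a machine could have more than 64 KB in total while never addressing more than 64 KB at once.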

PS2. Regarding the choice of lines with the hilarious "FUC* you, as*hole" option - the author is right that it could be more dynamic, for example by displaying the initial *search process*, the exploration of possibilities, and with more dynamic graphics (it could be done in a remake of a similar scene, maybe).

The limited amount of options could be the final choices, the most appropriate for the situation and what the Terminator wanted.

The search is already done "subconsciously", and its results are what the executive functions are given to select an option from - the executive functions of the human viewers, too.

Human persons with a similar "profile" - e.g. "contract killers" or cowboys or typical "criminals" in the millennia of gangster movies - would also have a limited set of expressions for similar conditions.

And in general if you asked a multitude of people with similar personalities, experience, capabilities, knowledge etc. for the most appropriate answers in this-or-that situation, you'd get the same answers and the same lack of imagination (especially if they are asked to give *the most appropriate* ones in order to achieve a certain goal, or to send a specific message).

The Terminator didn't want to engage in a conversation, he didn't want to open the door and be seen etc., so to me the lines are adequate.

PS3. I agree about the coolness of the OCR in T1, the scene where the Terminator reads the hand-written phone number. The scene with his fingers pointing at the numbers in the phone booth is silly, though; unless that's a deliberate non-functional design; just posing, intended to make the machine look more human for the potential humans around - and for the viewers of the movie, too. (From the cinematographer's/director's point of view, obviously that's literally for "pointing" at the attention of the cyborg.)

PS4. This discussion is too verbose? I like/intend to give precise explanation of why I think so and so, what made me think so etc. That's the path of thought. And indeed, a comment on another article - see a follow-up article on Spaghetti code.
