Wednesday, April 5, 2023

Remembering the Visionary Research Directions from the Blog's Second Post (2007), and a Comment on Visual Transformers and Their Representations

Looking back at the second post on this blog (the first was just a placeholder)...

Research Directions

Target research directions so far:

    • Artificial General Intelligence
    • Artificial Mind
    • Artificial Life
    • Cognitive Computing
    • Cognitive Science
    • Computational Linguistics
    • Data Mining
    • Computer Vision
    • Image Processing
    • Sound Processing

Main direction:

Understanding the processes of learning, thinking, imagination, problem solving, decision making and development of evolving, thinking and creative machines.

Sub directions:

  • Representation, simulation and generation of perceptions, mind states, thoughts, memories, imagination, desires, intentions etc.
  • Natural language understanding.
  • Natural language generation.
  • World-knowledge representation, world-physics and human behaviour simulation for NLU, NLG and for perceptions, thoughts etc. simulation.
  • Machine imagination and creative machines. Creative writing by machines. Dreaming machines.
  • Machine learning, based on world-knowledge representations and simulations evolved from the input.
  • Building world-knowledge and language competences by semi-supervised machine learning, using the web as world-knowledge feeder and language teacher.
  • Differential intelligence research.
  • Didactic methods for measuring the general intelligence of machines.
  • First language acquisition by humans. Modeling language skills development.
  • First language acquisition by machines, which learn their knowledge, "corpora" and grammars like children do - by reading, analyzing and building new knowledge step by step, with optional support from supervising knowledge given by human "teachers" or taken by the machine from ordinary textbooks and from interaction with people on the Internet.
  • Conversation agents. "Chat bots", "Virtual bloggers" and "Virtual forumers" which do NLU, "imagine" what the conversation is about, have intentions and express thoughts about the topics, aiming to keep up a real conversation.
  • Intelligent Desktop and Network Search Engines, Intelligent Personal Organizers, Document and Notes Classifiers and Virtual Assistants.

Other directions:

Sound Processing:
  • Speech Modeling
  • Speech Synthesis
  • Synthesis of Singing
  • Speech Mimicry (extracting voice features from input speech, then applying them in a speech model to synthesize speech in the same voice as the example).

Image Processing:
  • Advanced preprocessed image formats, assisting computer vision.
  • Memory- and heuristics-based generation of photorealistic images, without complete 3D modeling and rendering.
  • Memory- and heuristics-based generation of 3D models from single or multiple images.
  • Computer Vision - Image/object recognition, categorization, generation, combination. Bots and robots that move in virtual 3D worlds, in the real world or in hybrid 2D-3D world simulations like those in Quest games, and that perceive the world through vision systems.

Regarding image processing: lately I've been playing with Bing Image Creator (DALL-E). I'll show pictures from those experiments later. For now, here is a comment of mine from an AGI chat two days ago:

Todor: (...) Also, "meaningfully selective" is questionable: from some points of view, transformers are amazingly meaningfully selective, much better than humans at text-to-image, or "concept-to-image".

The generative models are better than humans at analysis and synthesis, especially with images. For most humans, the ability to synthesize images is almost entirely lacking, while DALL-E and MidJourney produce amazing, aesthetically pleasing photorealistic images, which apparently are rendered by a process that is isomorphic to a classic rendering system with implicit 3D models, with a designer who places them in a reasonable composition, with proper materials, lights and ray tracing or global illumination. Most humans struggle to draw even stick figures, or with handwriting; however good and robust their features are, they are incapable of reconstructing the output. Average humans are capable only of superficial recognition and of evaluating photorealism, and only if they have whole images in front of their eyes for inspection. The artistically gifted and trained can recreate some of the general principles of lighting, but painstakingly slowly and/or usually with references, photos etc., and they struggle a lot when light interacts with reflective and transparent elements in the scene and when they lack references.

I.e. these transformers are far superior to even super talented humans in that aspect of analysis and then synthesis of the "causes and effects" in the world of their inputs; they have better articulated and mapped internal models of the visual physics of light and objects.

As for directly adjusting parameters of objects, such as a texture pixel by pixel or the tiniest detail of a 3D model: humans also need explicit 3D models and 3D editors for that, such as Blender or 3D Studio Max, which, in addition to the structures of the brain, have explicit meshes, triangles, materials etc. defined, and they adjust these details iteratively in a slow process. Given such general representations and directions, the generative models are excellent.

Also, I remember an old note of mine: there could be different approaches to AGI, but at their latest stages and levels they are supposed to get more and more isomorphic and to converge, because they are supposed to work with and represent similar cognitive structures, and they start with similar ones.
