Tuesday, December 25, 2018


Developmental Approach to Machine Learning? - article by L. Smith and L. Slone - Agreed


Yes, agreed. A good read, suggesting developmental machine learning, spatio-temporally continuous input data etc.:

See the concept of "shape bias" from developmental psychology. It relates to the discussions in the "AGI Digest" on the recognition of "buildings, chairs, caricatures", to other articles from this research blog regarding 3D reconstruction at varying resolution/detail as one of the crucial operations in vision, and to the general developmental direction, driven from one of the very first articles here about the "Educational test".



HYPOTHESIS AND THEORY ARTICLE

Front. Psychol., 05 December 2017 | https://doi.org/10.3389/fpsyg.2017.02124

A Developmental Approach to Machine Learning?

  • Department of Psychological and Brain Sciences, Indiana University Bloomington, Bloomington, IN, United States



Tuesday, December 18, 2018


Human-centered AI by Stanford University - 8 years after Todor's Interdisciplinary Course in AGI in Plovdiv 2010

See: https://hai.stanford.edu/

Introducing the initiative - Oct 19, 2018:

"But guiding the future of AI requires expertise far beyond engineering. In fact, the development of Human-Centered AI will draw on nearly every intellectual domain"

The world's first interdisciplinary course in AGI at Plovdiv University started in April 2010 and was proposed as an idea to my Alma Mater in December 2009.

Among the core messages of the course were the importance of interdisciplinarity/multidisciplinarity and the suggested leadership of the research by such persons. I've been a proponent of that approach in my writings and discussions since my teenage years, being a "Renaissance person" myself.

See also the interview with me, published in December 2009 in the popular science magazine "Obekty"*, after I had given a lecture on the principles of AGI to the general public at the Technical University, Sofia, during the European "Researchers' Night" festival.

          (...)
- Where should researchers' efforts be focused in order to achieve Artificial General Intelligence (AGI)?
First of all, research should be led by interdisciplinary scientists who see the big picture. You need to have a grasp of Cognitive Science, Neuroscience, Mathematics, Computer Science, Philosophy etc. Also, the creation of an AGI is not just a scientific task; it is an enormous engineering enterprise – from the beginning you should think of the global architecture and of universal low-level methods which would lead to an accumulation of intelligence during the operation of the system. Neuroscience gives us some clues; the neocortex is "the star" in this field. For example, it's known that the neurons are arranged in a sort of unified modules – cortical columns. They are built of 6 layers of neurons, and different layers contain specific types of neurons. All the neurons in one column are tightly connected vertically, between the layers, and process a piece of sensory information together, as a whole. All types of sensory information – visual, auditory, touch etc. – are processed by the interaction between such unified modules, which are often called "the building blocks of intelligence".
- If you believe that it's possible for us to build an AGI, why haven't we managed to do it yet? What are the obstacles?
I believe that the biggest obstacle today is time. There are different forecasts: 10, 20, 50 years to enhance and specify the current theoretical models before they actually run, or before computers get fast and powerful enough. I am an optimist that we can get there in less than 10 years, at least to basic models, and I'm sure that once we understand how to make it, the available computing power will be enough. One of the big obstacles in the past was maybe the research direction – top-down instead of bottom-up – but this was inevitable due to the limited computing power. For example, Natural Language Processing is about language modeling; language is a reduced end result of many different and complex cognitive processes. NLP starts from that reduced end result and aims to get back to the cognitive processes. However, the text, the output of language, does not contain all the information that the thought which created the text contains.
On the other hand, many Strong AI researchers now share the position that a "Seed AI" should be designed, that is, a system that processes the most basic sensory inputs – vision, audition etc. The Seed AI is supposed to build and rebuild ever more complex internal representations, models of the world (actually, models of its perceptions, feelings and its own desires and needs). Eventually these models should evolve into models of its own language, or models of humans' natural language. Another shared principle is that intelligence is the ability to predict future perceptions based on experience (you have probably heard of Bayesian Inference and Hidden Markov Models), and that the development of intelligence is the improvement of the scope and precision of its predictions.
Also, in order for the effect of evolution and self-improvement to be created, and to avoid an intractable combinatorial explosion, the predictions should be hierarchical. The predictions at an upper level are based on sequences of predictions (models) from the lower level. A similar structure is seen in living organisms: atoms, molecules, cellular organelles, cells, tissues, organs, systems, organism. Evolution and intelligence test which elements work (predict) correctly. Elements that turn out to work/predict are fixed; they are kept in the genotype/memory and are then used as building blocks of more complex models at a higher level of the hierarchy.
         (...)

* The original interview was in Bulgarian
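As a side note, the hierarchical prediction idea from the interview can be sketched in a few lines of code. This is a minimal toy illustration in Python, with hypothetical names, not an implementation of any particular system: lower-level predictors that prove to work are "fixed", and their outputs (here, pairs of symbols) become the single symbols of the next level.

    # Toy sketch of hierarchical prediction (hypothetical names, illustration only).
    from collections import Counter, defaultdict

    class LevelPredictor:
        """Predicts the next symbol from the current one, by counted frequencies."""
        def __init__(self):
            self.counts = defaultdict(Counter)

        def observe(self, prev, nxt):
            self.counts[prev][nxt] += 1

        def predict(self, prev):
            seen = self.counts[prev]
            return seen.most_common(1)[0][0] if seen else None

    def chunk_pairs(stream):
        """Group the stream into pairs: "fixed" building blocks that act
        as single symbols for the next level up."""
        return [tuple(stream[i:i + 2]) for i in range(0, len(stream) - 1, 2)]

    raw = list("abababcdcdcdabab")       # level 0 sees raw symbols
    pairs = chunk_pairs(raw)             # level 1 sees pairs of them

    low, high = LevelPredictor(), LevelPredictor()
    for a, b in zip(raw, raw[1:]):
        low.observe(a, b)
    for p, q in zip(pairs, pairs[1:]):
        high.observe(p, q)

    print(low.predict("a"))              # -> 'b', the likely next raw symbol
    print(high.predict(("a", "b")))      # -> ('a', 'b'), the likely next pair

The higher level never enumerates all raw combinations; it models only the blocks that actually occurred at the lower level.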

As the colleagues at Stanford enumerate: their University is the place where the term AI was coined by McCarthy, where computer vision was pioneered (the Cart mobile robot; Hans Moravec), where self-driving cars won the DARPA Grand Challenge in 2005, ImageNet, [Coursera], ... They are located in the heart of Silicon Valley and employ a zillion of the best students and researchers in CS, NLP, EE, AI, Neuroscience, whatever.

The Plovdiv course was created practically for free, with no specific funding, just the regular symbolic honorarium for the presentation.

Note also that the course was written and presented in Bulgarian.

See also:
Saturday, February 24, 2018
MIT creates a course in AGI - eight years after Todor Arnaudov at Plovdiv University

...

The paradox is not so surprising, though, since most people and the culture are made for narrow specialists, both in Academia and everywhere else. The "division of labor" and the other British and US wisdoms for higher profits in the rat race.

Thanks to prof. D. Mekerov, H. Krushkov and M. Manev, who respected my versatility, and especially to M. Manev, who was in charge of accepting the proposal of the course.


PS. There are other proponents of interdisciplinary and multidisciplinary research as well. Among the popular AI commentators I recall Gary Marcus; and of course, as early as Norbert Wiener - if I'm not mistaken, he explicitly suggested that. (The German philosophers such as Kant and Schopenhauer as well...)

See my comment on a comment by Gary Marcus regarding Kurzweil's book:

Wednesday, January 23, 2013


Friday, December 7, 2018


Ultimate AI, Free Energy Principle and Predictive Coding vs Todor and CogAlg - Discussion in Montreal.AI forum and Artificial Mind


Contents

1. The interview - the key to true AI by "the genius neuroscientist"
2. CogAlg and Free Energy Principle
3. Discussion at Montreal.AI and the Ultimate AI
3.1. References to Bialek and Tishby's early papers on prediction in RL
4. Ultimate Intelligence Part III - an informal review and a clash of schools of thought
4.1. Intro and acknowledgments
4.2. Sigma-Product-Log-Probability mathematical formula fetishism
4.3. Too general
4.4. Where is the hierarchy?
4.5. The sum of rewards and bounded rewards are obvious
4.6. The hierarchy as a deadlock breaker
5. Notes on specific citations
6. Conclusion

1. The interview - the key to true AI

A WIRED interview with Karl Friston was getting popular recently in social media, claiming that the "genius neuroscientist might hold the key to true AI".

Initially it seemed interesting; maybe it was something new and revolutionary, since I had been quite ignorant of him. Or maybe I had forgotten about him a long time ago?

Well, I took a look at how the topics of the "free energy principle" and "predictive coding" are defined in generic sources such as Wikipedia.

The conclusion: yes, I agree, it's the right direction, another related school of thought, but I don't agree that these ideas are as grandiose or original as presented in the press*. They were quite obvious to "my school of thought" since it started around 2001-2004, when I was a 17-19 year old kid, a rebellious teenager who hadn't read or cited the contemporary literature.

Edit: the proper recent technical/neuroscientific papers seem to be at a different level, though: better than the general statements of direction, not that general, and not lacking hierarchy. Such as this one, suggested by Eray after he read this post. I haven't studied it yet and will probably comment later on it and other related materials:

Deep temporal models and active inference

https://www.sciencedirect.com/science/article/pii/S0149763418302525


* Sure, everything in the consumer-world, celebrity-driven media is exaggerated: glamorous, "the genius", extraordinary, outstanding etc.; this is not an exception.

Connecting general intelligence principles with physics/Universe trends and biology is not that unheard of either. I assume it was maybe a surprise in the circles of some too narrowly specialized software developers, or of too practical RL-ists/mathematicians/ML developers, who didn't care about philosophy, biology, cybernetics etc.


2. CogAlg and Free Energy Principle

I asked the owner of the CogAlg project, Boris, about his opinion. He said that he had been hearing about that theory "for at least a decade" and, in short, he didn't seem impressed, because it was "nothing novel".

As for myself, I think the explicit emphasis on the idea of reducing the space of states of living organisms and intelligence is suggestive for people who face these ideas for the first time. However, it's somewhat obvious for hierarchical systems and even for simple "machines", where the gears, the pistons etc. serve as "sub-spaces" which limit and guide the space of possible states.

As defined in the most ancient basics of "my theory", the higher-level patterns are constructed from selected sequences/sets of elements from the lower level, which serve as (discrete) "instructions"; therefore not all possible combinations are covered. Only a limited space is legal, which respectively reduces the search/combinatorial space of possibilities at the higher level, so the higher level has "a reduced space". The same is seen in the hierarchical structures in nature: atoms, molecules, cells, tissues etc.
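A toy calculation (my own illustration, not code from CogAlg or from any cited paper) of how much such "legal sub-spaces" shrink the combinatorics:

    # Toy calculation of search-space reduction through hierarchy.
    alphabet = "abcd"                # raw lower-level symbols
    block_len, seq_len = 2, 3        # building-block size, higher-level sequence length

    # Unconstrained: any raw symbol at every position of a length-6 string.
    raw_space = len(alphabet) ** (block_len * seq_len)

    # Constrained: the lower level "legalizes" only a few proven blocks,
    # and the higher level combines those blocks as whole units.
    legal_blocks = ["ab", "cd", "da"]
    hier_space = len(legal_blocks) ** seq_len

    print(raw_space)     # 4**6 = 4096 possible raw strings
    print(hier_space)    # 3**3 = 27 legal higher-level combinations

The reduction compounds with every additional level of the hierarchy.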

That "free energy principle" is yet another proof that the general direction towards AGI are getting established in different domains by different researchers.

3. Discussion at Montreal.AI and the Ultimate AI

My criticism that this was not novel, in a thread on the Montreal.AI Facebook page, ended up in a discussion with Eray Ozkural - a researcher from Friston's school of thought, a fellow AGI researcher, an author of publications at the AGI conference, and more knowledgeable of the Reinforcement Learning literature than me.

His term for AGI: "Ultimate AI".

It adds one more to the list: A-General-I (AGI), Universal-AI (UAI), Strong-AI, General-AI, Human-level AI, Goedel Machine, ... "Versatile Limitless Explorer and Self-Improver" - VLESI (one of mine, if I remember correctly... :) ) etc.

See the original discussion by Todor and Eray:
A discussion about Free energy principle vs other theories about intelligence as prediction

He directed me to Part III of a series of his papers: https://arxiv.org/pdf/1709.03879.pdf

Ultimate Intelligence Part III: Measures of Intelligence, Perception and Intelligent Agents 

A nice title.

3.1. References to Bialek and Tishby's early papers on prediction in RL

He also mentioned two pioneers of the prediction paradigm in RL of whom I wasn't aware, from before the "early 2000s", the period I suggested: Bialek and Tishby.

Papers with promising titles that pop up: Predictability, Complexity, and Learning
https://www.princeton.edu/~wbialek/our_papers/bnt_01a.pdf

The information bottleneck method: https://www.cs.huji.ac.il/labs/learning/Papers/allerton.pdf
https://arxiv.org/abs/physics/0004057

Submitted in 2000-2001, probably dating from the late 1990s.
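For reference, the core of the information bottleneck method from that paper, in its standard notation: find a compressed representation T of a signal X that stays informative about a relevant variable Y, by minimizing over the encoder p(t|x):

    \mathcal{L}[p(t|x)] = I(X;T) - \beta \, I(T;Y)

where I(.;.) is mutual information and the multiplier \beta trades the compression of X against the preservation of the information relevant to Y - compression and prediction in one objective.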

...

4. Ultimate Intelligence Part III - an informal review and a clash of schools of thought


4.1. Intro and acknowledgments

I reviewed Eray's paper from my perspective and share my comments here - as a clash of my "school of thought" with his/theirs. Mine is perhaps more philosophical.

Overall, the paper is fine and I recommend it for studying if you like those "probability-log-maths" proofs, as in the papers of Hutter & Legg, Solomonoff's algorithmic probability and the like. It also has good references, both to researchers and to papers, which may give you a kickstart into the subject matter. That goes also for the list of other papers by this author; they have interesting titles, though I have checked only a few myself. Good work!

However, I have general criticism of that "school", not personally of the author.

4.2. Sigma-Product-Log-Probability mathematical formula fetishism

My first impression and general criticism is the mathematical formula fetishism, which is present in all papers of that kind. Maybe it is also a LaTeX fetish, and one for those small fonts...

Summation, Product, Log, Probability, wave functions? (the psi at the end), thus "phases", and/or just putting Greek and Latin letters for verbal/simple things: a - action, r - reward, ... A combination of them and...

There we are: everything seems solved or proven, it passes as academic and goes to conferences.

IMO it's tautological in general. The sense denoted by these letters is defined with natural-language words, and it proves itself by its definition. It claims that "this is intelligence", computes or minimizes something etc., thus "it's solved".

IMO simple formulas, while required to represent the ideas "formally", are not much more insightful than defining the formula verbally, which is usually done anyway, above and below the formulas, since these are such general matters.
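To give a concrete instance of the genre (quoted from memory; this is Legg & Hutter's "universal intelligence" measure, not a formula from Eray's paper): the intelligence \Upsilon of a policy \pi is its expected value V summed over all computable environments \mu in a class E, weighted by their Kolmogorov complexity K(\mu):

    \Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}

One short line, and "intelligence is measured" - yet the entire burden of meaning is carried by the verbal definitions of E, K and V around the formula, which is exactly the point above.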

On the other hand, it's not practical, or it is much more confusing, to define more specific or complex algorithms verbally. They are not obvious either and require a real computation with data to see where they lead. In those cases it is necessary to write them in code.

The math formulas in these "classical" algorithmic-probability papers do not grow much in complexity and are kind of obvious in their expected outcomes, because they stay within one line or a few lines, and I can't see concepts growing on top of that.

"Where calculation begins, comprehension ceases" - Schopenhauer.

I understand that this is probably desirable to their authors, but it does not offer me much incremental insight.

4.3. Too general

This goes for that school of Algorithmic Probability, Hutter's model etc.

AGI should be general, but not too general, because otherwise it turns into generalities, or into the deep sea of practical or theoretical uncomputability.

I'm an advocate of a human-like seed-AGI which develops like a child, with milestones that it is expected to achieve developmentally.

4.4. Where is the hierarchy?

I didn't find any mention of the words "hierarchy" or "levels" in the paper, while that's crucial in building a real and scalable generally intelligent system and RL agent, as also explained below. It is also at the heart of many prediction-based or cybernetic schools, such as:

* Ray Kurzweil (I haven't read his "How to Create a Mind" book, but Eray mentioned the Hierarchical HMM as his approach)
* Jeff Hawkins (Hierarchical Temporal Memory)
* Boris Kazachenko
* The deep learning community
* Preceded by earlier cyberneticians, notably Valentin Turchin and his book "The Phenomenon of Science".
* Edit+: Neuroscience itself, of course, and the early Russian and Soviet research - Pavlov etc. Anokhin discusses feedback as early as 1935 ("санкционирующая афферентация", "sanctioning afferentation"; later "обратная афферентация", "reverse afferentation") - prior to Wiener and Cybernetics.

Is the hierarchy implied in the paper, or in the author's other ones, as the process of search/adjustment towards the highest sum of expected rewards etc.?

However, how and when exactly are the levels spawned, separated and interfaced? How is the "reward" quantified for new levels and between the levels? How is the feedback defined?

In fact, that is one of the main questions of real AGI, which would move it out of the "generalities" territory. Boris Kazachenko is trying to answer it in his Cognitive Algorithm.


4.5. The sum of rewards and bounded rewards are obvious

I think that the sum of expected rewards for a selected period ahead as a measure of "intelligent" ("rational") behavior, and the need for a bounded reward, are already not that special a thing to say.

Yes, they have to be declared, but actually that was obvious back in the early 2000s. It seems to have been known since ancient times, even from the economy and from human greed and the tendency towards more pleasure and less displeasure.

In the academic part of the behavioral/psychological domain, the need to take into account that each single reward is, or should be, bounded for generally intelligent human agents is known empirically from Simon's satisficing and from the experiment with the rat that presses the lever to stimulate its "pleasure center".

It's also known in everyday life by anyone from observing behavior in cases of addiction, either mild ones, when one gets preoccupied with an activity, or the severe cases of drug addiction.

The scientific part in RL is that it writes explicit formulas and uses mathematical terms like "local minimum/maximum" or "endless cycle": when the reward of a particular action is too big, the agent gets locked in an endless cycle, or in a local maximum/minimum.
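In those terms (my notation, not the paper's): the agent maximizes an expected discounted sum of rewards, and a bound on each single reward bounds the whole sum, so no single action can dominate forever:

    V^{\pi} = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\Big], \quad |r_{t}| \le r_{\max}, \; 0 \le \gamma < 1 \;\Rightarrow\; |V^{\pi}| \le \frac{r_{\max}}{1-\gamma}

An unbounded r_t is the formal image of the addiction loop: one runaway term outweighs every alternative trajectory.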

However, the phenomenon itself is obvious from everyday experience. I am missing "grounding" and justifications beyond the abstract formulas. Just formulas and the optimization of some magnitude is a tautology. I have similar criticism of CogAlg as well, even though it claims to have its justifications.

The need for a bounded reward is obvious even theoretically, because a local maximum reward, or a "cycle" of actions with a maximum local reward, could catastrophically limit the range of inputs in which the agent searches. That would limit the space of environments if it starts from scratch; therefore it would be "less general" and would slide into too much "exploitation over exploration".

The bounded reward can be justified empirically both by the cases of addiction mentioned above, where an out-of-control magnitude of a "reward" (behavioral drive) makes the victim a slave of a too "narrow range" of repetitive goals, and also by the relations in human society. In general, locally, the extreme reward for one agent at the expense of the pain of many others is suppressed, except for the "elite", down through the masses.

Top-down relations and properties are different from bottom-up ones and from those at the same level; they are not symmetrical. But that's a different topic.

Also, no one can be "endlessly satisfied"; there's a limit. For one person, one "element": the mouth needs only a little to stretch into a smile :), it couldn't stretch 10 times more.

I know that the "maths guys" would laugh at these justifications, but presenting something obvious in simple formulas doesn't make it more meaningful, while inducing "formulas" from experience ("operator induction", or pattern discovery, predicting and modelling the input, conversion of representations between domains etc.) is what intelligence does.

The "money" or other resources could grow with less of a limit, or seemingly "endlessly", but they are abstract: money is not mapped directly onto the agent, rather it is part of more complex systems in which the specific human agents are constituent parts. Such systems could be called "The Corporation", "The Capitalism", "The Economy" etc., but beyond a limit of "happiness", more money does not increase the general reward for the individual agents. For healthy and functioning human beings, "happiness" is "computed" from many more parameters, not just one, and especially not just "the amount of money owned".

Indeed, IMO the mere sum of (any) rewards is not "intelligent" (abstract, universal) per se, unless it is simply self-defined like that - that this reward is intelligence - for some abstract reinforcement learning agent.

Prediction serves as a general definition and I agree with it; however, it is also not enough if given alone, because, as with the addiction, it can be cheated if defined too simply, or if the agent moves into a space with locally specific features that allow it to predict too easily.

That's why the measure needs to include a *widening* of the range and the horizon of prediction, and the formation of a generalization hierarchy.

It needs to be more complex.

4.6. The hierarchy as a deadlock breaker

In order to avoid the deadlocks of falling into a maximum/minimum hole, the hierarchical system should constantly project and act over varying time-slices and with varying reward models. A unified model would then be an aggregation of those switching sub-models. See the articles and slides from my works.
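A minimal sketch of that idea (hypothetical names, my own illustration; see the articles and slides for the actual proposals): score each candidate action under several (horizon, reward-model) pairs and aggregate, so that no single reward model can lock the agent into one local maximum.

    # Toy sketch of "switching sub-models": aggregate several reward models
    # over several horizons (hypothetical names, illustration only).
    def aggregate_score(action, horizons, models):
        """Average the predicted cumulative reward over all sub-models and horizons."""
        scores = [sum(model(action, t) for t in range(h))
                  for model in models for h in horizons]
        return sum(scores) / len(scores)

    def choose(actions, horizons, models):
        return max(actions, key=lambda a: aggregate_score(a, horizons, models))

    # Two toy reward models: a short-sighted one and a curiosity-driven one.
    greedy  = lambda action, t: 1.0 if action == "exploit" else 0.2
    curious = lambda action, t: 0.0 if action == "exploit" else 0.9

    # The greedy model alone would always pick "exploit"; the aggregate breaks
    # that deadlock once longer horizons and the second model get a vote.
    print(choose(["exploit", "explore"], horizons=[1, 5, 20],
                 models=[greedy, curious]))      # -> 'explore'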

That implies that for a complex, hierarchical agent *there is no single absolute best reward path*, and a measurement of intelligence based just on the reward at the moment is right only within that window of comparison and for the selected measures. It's not "objective": it's "best" for that specific selected model of the world and of the rewards, with specific limitations, compared to specific other trajectories. In general and complex environments with multiple possible goals, there is a multitude of actions that have similar "rewards", or that keep the agent "alive" at the macro level. They are all "correct" and "intelligent". Thus intelligence needs to be defined more specifically, with more parameters than just one "reward".

I don't like the quoted definition of Hutter's: "the wide range of environments". If I'm not mistaken, Ben Goertzel had something similar in the 2000s. IMO this is mundane, especially together with simple formulas.

Mapping it just to simple formulas of probabilities (various kinds ~ various pdfs...) as a solution doesn't make it clearer. All papers of that kind look like Bayes, or almost the same: + - logP, P(a,b) ... They resemble the basics of Shannon's Information Theory, which maybe was one of my own inspirations for realizing that prediction and compression of information are the "keys to true intelligence".

A general flaw of that school is that these formulas are too general, indiscriminate, too universal - or, as coined in this paper, "ultimate". This implies that they are also inefficient to calculate.

5. Notes on specific citations

    "An adaptive system that tends to minimize average surprise (entropy) will tend to survive longer."

That seems probably true, but only for a non-evolving system. Life as a whole "survives longer" by gradually adapting, trying new things and testing them for fitness - by "evolving". At the moment of spawning new organisms through sexual reproduction, the exact combination of genes is unknown to the mother and father systems; it is a big "surprise".

6. Conclusion


This article is underdeveloped, but that's it for now.

See also:

* The course program of the world's first University course in AGI (see the links in the blog)
* Todor's Theory of Mind and Universe - his philosophy and principles, expressed in works from his teenage years
* Materials from the University course in Bulgarian and English: http://research.twenkid.com/agi/
* "Analysis of the meaning of a sentence and ..." (in Bulgarian), March 2004, @ bgit
* Translated in English: Analysis of the meaning of a sentence, based on the knowledge base of an operational thinking machine. Reflections about the meaning and artificial intelligence
http://artificial-mind.blogspot.com/2010/01/semantic-analysis-of-sentence.html




  • Part 1: Semantic analysis of a sentence. Reflections about the meaning of the meaning and the Artificial Intelligence
  • Part 2: Causes and reasons for human actions. Searching for causes. Whether higher or lower levels control. Control units. Reinforcement learning.
  • Part 3: Motivation is dependent on local and specific stimuli, not general ones. Pleasure and displeasure as goal-state indicators. Reinforcement learning.
  • Part 4: Intelligence: search for the biggest cumulative reward for a given period ahead, based on a given model of the rewards. Reinforcement learning.
  • Many other articles from this research blog - search them if you care: the AGI digest, the AGI email list, discussions on the Cognitive Algorithm site etc.


* Note, 26.3.2023: A few corrections of found errors: "Live" - "Life"; "they are reminding ..." - "they resemble"; "teenager years" - "teenage years".