Friday, December 7, 2018

Ultimate AI, Free Energy Principle and Predictive Coding vs Todor and CogAlg - Discussion in Montreal.AI forum and Artificial Mind


1. The Interview - the key to true AI by the genius neuroscientist
2. CogAlg and the Free Energy Principle
3. Discussion at Montreal.AI and the Ultimate AI
3.1. References to Bialek's and Tishby's early papers on prediction in RL
4. Ultimate Intelligence Part III ... - an informal review and a clash of schools of thought
4.1. Intro and acknowledgments; the mathematical formula fetishism
4.2. Too general
4.3. Where is the hierarchy?
4.4. The sum of rewards and bounded rewards are obvious
4.5. The hierarchy as a deadlock breaker
5. Notes on specific citations
6. Conclusion

1. The interview - the key to true AI

An interview in WIRED with Karl Friston has been getting popular recently on social media, claiming that the "genius neuroscientist might hold the key to true AI".

Initially it seemed interesting - maybe it was something new and revolutionary, since I had been quite ignorant, not knowing of him - or maybe I had forgotten about him a long time ago?

Well, I took a look at how the topic of the "free energy principle" and "predictive coding" is defined in generic sources such as Wikipedia.

The conclusion: yes, I agree, it's the right direction, another related school of thought, but I don't agree that these ideas are as grandiose or original as presented in the press*. They were quite obvious to "my school of thought" since it started around 2001-2004, when I was a 17-19 year old kid - a rebellious teenager who hadn't read or cited the contemporary literature.

Edit: the proper recent technical/neuroscientific papers seem to be at a different level, though - better than the general directions, not that general and not lacking hierarchy. Such as this one, suggested by Eray after he read this post. I haven't studied it yet and will probably comment later on this and other related materials:

Deep temporal models and active inference

* Sure, everything in the consumer-world, celebrity-driven media is exaggerated - glamorous, "the genius", extraordinary, outstanding etc. - that's no exception.

Connecting general intelligence principles with physics/Universe trends and biology is not that unheard of, either. I assume it may have been a surprise in circles of overly specialized software developers, or overly practical RL-ists/mathematicians/ML developers who didn't care about philosophy, biology, cybernetics etc.

2. CogAlg and Free Energy Principle

I asked Boris, the owner of the CogAlg project, about his opinion; he said that he has been hearing about that theory "for at least a decade" and, in short, he didn't seem impressed, because it was "nothing novel".

As for myself, I think the explicit emphasis on the idea of reducing the space of states for living organisms and intelligence is suggestive for people who encounter these ideas for the first time; however, it's somewhat obvious for hierarchical systems and even for simple "machines", since the gears, the pistons etc. serve as "sub-spaces" which limit and guide the space of possible states.

As defined in the most ancient basics of "my theory", the higher-level patterns are constructed from selected sequences/sets of elements from the lower level, which serve as "instructions" (discrete); therefore not all possible combinations are covered. Only a limited space is legal, which respectively reduces the search/combinatorial space of possibilities at the higher level - therefore it has "a reduced space". That's seen in the hierarchical structures in nature: atoms, molecules, cells, tissues etc.
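The reduction can be counted explicitly. Below is a minimal sketch of mine (the alphabet, the sequence length and the "legal" patterns are hypothetical illustration values, not from any theory or paper): a higher level that composes only the selected lower-level patterns searches a far smaller space than one composing raw sequences.

```python
from itertools import product

# Lower-level alphabet and every possible 3-symbol sequence over it.
symbols = ["a", "b", "c", "d"]
all_seqs = list(product(symbols, repeat=3))       # 4^3 = 64 sequences

# Only a few selected sequences become "patterns" (discrete
# "instructions") usable by the higher level - an illustrative subset.
legal_patterns = [("a", "b", "c"), ("b", "c", "d"), ("d", "a", "b")]

# The higher level composes pairs of *patterns*, not raw sequences,
# so its combinatorial space shrinks from |all|^2 to |legal|^2.
raw_space = len(all_seqs) ** 2          # 64^2 = 4096 combinations
reduced_space = len(legal_patterns) ** 2  # 3^2  = 9 combinations

print(raw_space, reduced_space)  # 4096 9
```

Each added level repeats the same selection, which is why hierarchies keep the search tractable as they grow.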

That "free energy principle" is yet another proof that the general directions towards AGI are getting established in different domains by different researchers.

3. Discussion at Montreal.AI and the Ultimate AI

My criticism that this was not novel, in a thread on the Montreal.AI Facebook page, ended up in a discussion with Eray Ozkural - a researcher from Friston's school of thought, a fellow AGI researcher, an author of publications at the AGI conference, and knowledgeable in the Reinforcement Learning literature - more so than me.

His term for AGI: "Ultimate AI".

It adds one more to the list of: Artificial General Intelligence (AGI), Universal AI (UAI), Strong AI, General AI, Human-level AI, Goedel Machine, ... "Versatile Limitless Explorer and Self-Improver" - VLESI (one of mine, if I remember correctly... :) ) etc.

See the original discussion by Todor and Eray:
A discussion about Free energy principle vs other theories about intelligence as prediction

He directed me to the III part of a series of his papers:

Ultimate Intelligence Part III: Measures of Intelligence, Perception and Intelligent Agents 

A nice title.

He also mentioned two pioneers of the prediction paradigm in RL of whom I wasn't aware, working prior to the "early 2000s" - the period I suggested: Bialek and Tishby.

Papers with promising titles that pop up:

* Predictability, Complexity, and Learning
* The information bottleneck method

Submitted in 2000-2001, probably stemming from the late 90s.


4. Ultimate Intelligence Part III ... - an informal review

4.1. Intro and acknowledgments

I reviewed Eray's paper from my perspective and share my comments - as a clash of my "school of thought" with his, or theirs. Mine is perhaps more philosophical.

Overall, the paper is fine and I recommend it for studying if you like those "probability-log-maths" proofs, as in the papers of Hutter & Legg, Solomonoff's algorithmic probability and the like. It also has good references, both to researchers and papers, which may give you a kickstart into the subject matter. That goes also for the list of other papers by this author - they have interesting titles, though I have checked only a few myself. Good work!

However, I have general criticism of that "school", not personally of the author.

My first impression and general criticism is the mathematical formula fetishism, which is present in all papers of that kind. Maybe it's also a LaTeX fetish, and one for those small fonts...

Summation, product, log, probability, wave functions? (the psi at the end), thus "phases", and/or just putting Greek and Latin letters for verbal/simple things: a - action, r - reward, ... A combination of them and...

There we are: everything seems solved or proved, it passes as academic and goes to conferences.

IMO it's tautological in general. The meaning denoted with these letters is defined with natural language words, and it proves itself by its own definition. It claims that "this is intelligence", computes or minimizes something etc., thus "it's solved".

IMO simple formulas, while required to represent the ideas "formally", are not much more insightful than defining them verbally, which usually is done anyway, above and below the formulas, since it's about such general matters.

On the other hand, it's not practical, or is much more confusing, to define more specific or complex algorithms verbally. They are not obvious either and require real computation with data to see where they go. In these cases it is necessary to write them in code.

The math formulas in these "classical" algorithmic probability papers do not grow much in complexity and are kind of obvious in their expected outcomes, because they are stuck at one line or a few lines, and I can't see concepts growing on top of that.

"Where calculation begins, comprehension ceases" - Schopenhauer.

I understand that this is probably desirable by their authors, but it's not quite incrementally insightful to me.

4.2. Too General

This goes for the school of Algorithmic Probability, Hutter's model etc.

AGI should be general, but not too general, because otherwise it turns into generalities or sinks into the deep sea of practical or theoretical uncomputability.

I'm an advocate of a human-like seed-AGI which develops like a child, and there are milestones that it's expected to achieve developmentally.

4.3. Where is the hierarchy?

I didn't find any mention of the word "hierarchy" or "levels" in the paper, while that's crucial in building a real, scalable generally intelligent system and RL agent, as also explained below. It is also at the heart of many prediction-based or cybernetic schools, such as:

* Ray Kurzweil (I haven't read his "How to Create a Mind" book, but Eray mentioned the Hierarchical HMM as his approach)
* Jeff Hawkins (Hierarchical Temporal Memory)
* Boris Kazachenko
* The deep learning community
* Preceded by earlier cyberneticians, notably Valentin Turchin and his book "The Phenomenon of Science"
* Edit+: Neuroscience itself, of course; the early Russian and Soviet research - Pavlov etc. Anokhin discusses feedback in 1935 ("санкционирующая афферентация" - sanctioning afferentation, later "обратная афферентация" - reverse afferentation) - prior to Wiener and Cybernetics

Is the hierarchy implied in the paper, or in other ones by the author, as the process of searching for/adjusting the highest sum of expected rewards etc.?

However, how and when exactly are the levels spawned, separated and interfaced? How is the "reward" quantified for new levels and between levels? How is the feedback defined?

In fact that is one of the main questions of real AGI, which would move it out of the "generalities" territory. Boris Kazachenko is trying to answer it in his Cognitive Algorithm.

4.4. The sum of rewards and bounded rewards are obvious

I think that the sum of expected rewards for a selected period ahead as a measure of "intelligent" ("rational") behavior, and the need for a bounded reward, are not that special a thing to say.

Yes, they have to be declared, but actually that was obvious back in the early 2000s. It seems it has been known since ancient times, even from economics and from humans' greed and tendency towards more pleasure and less displeasure.

In the academic part of the behavioral/psychological domain, the need to take into account that each single reward is, or should be, bounded for generally intelligent human agents is known empirically from Simon's satisficing and from the experiment with the rat that presses a lever to stimulate its "pleasure center".

It's known also in everyday life by anyone, from observing behavior in cases of addiction - either mild ones, when one gets preoccupied with an activity, or severe ones of drug addiction.

The scientific part in RL is that it writes explicit formulas and uses mathematical terms like "local minimum/maximum" or "endless cycle" - since when the reward of a particular action is too big, the agent gets locked in an endless cycle or a local maximum/minimum.

However, the phenomenon itself is obvious from everyday experience. I am missing "grounding" and justifications beyond the abstract formulas. Just formulas and optimization of some magnitude is a tautology. I have similar criticism of CogAlg as well, even though it claims it has its justifications.

The need for a bounded reward is obvious even theoretically, because a local maximum reward, or a "cycle" of actions with maximum local reward, could catastrophically limit the range of input in which the agent searches. That would limit the space of environments if it starts from scratch, therefore it would be "less general" and would fall into too much "exploitation over exploration".

The bounded reward can be justified empirically both by the cases of addiction, as mentioned above, where an out-of-control magnitude of a "reward" (behavior drive) makes the victim a slave to too "narrow" a range of repetitive goals; and also by the relations in a human society. In general, locally, an extreme reward for one agent at the expense of the pain of many others is suppressed - except for the "elite", down to the masses.

Top-down relations and properties are different from bottom-up and same-level ones; they are not symmetrical, but that's a different topic.

Also, no one can be "endlessly satisfied"; there's a limit. For one person, one "element", the mouth needs only a little stretch to turn into a smile :) - it couldn't stretch 10 times more.

I know that the "maths guys" would laugh at these justifications, but presenting something obvious in simple formulas doesn't make it more meaningful, while inducing "formulas" from experience ("operator induction", or pattern discovery, prediction, modelling the input; conversion of representations between domains etc.) is what intelligence does.

Money or other resources could grow with less of a limit, or seemingly "endlessly", but they are abstract; money is not mapped directly to the agent, rather it is part of more complex systems in which the specific human agents are constituent parts. Such systems could be called "The Corporation", "The Capitalism", "The Economy" etc., but beyond a limit of "happiness", more money does not increase the general reward for the individual agents. For healthy, functioning human beings, "happiness" is "computed" based on many more parameters than just "the amount of money owned".

Indeed, IMO the mere sum of (any) rewards is not "intelligent" (abstract, universal) per se, unless it's simply self-defined like that - that this reward is intelligence - for some abstract reinforcement learning agent.

Prediction serves as a general definition and I agree with it; however, it is also not enough on its own, because, as with addiction, it could be cheated if defined too simply, or if the agent goes into a space with locally specific features which allow it to predict too easily.

That's why the measure needs to include a *widening* of the range and the horizon of prediction and the formation of a generalization hierarchy.

It needs to be more complex.
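The "addiction" deadlock above can be shown in a few lines. This is my own toy sketch (the action names, rewards and satiation factor are made-up illustration values, not from the reviewed paper): a greedy agent with one unbounded reward repeats that action forever, while clipping (bounding) the reward, combined with a mild satiation term, restores exploration.

```python
# Toy illustration: unbounded vs bounded reward for a greedy agent.
# The "lever" mimics the rat's pleasure-center lever: a huge raw reward.
raw_reward = {"lever": 1000.0, "explore_a": 2.0, "explore_b": 3.0}

def run(bound=None, steps=20):
    """Greedy agent; `bound` clips each reward, satiation devalues repetition."""
    counts = {a: 0 for a in raw_reward}
    for _ in range(steps):
        def value(a):
            r = raw_reward[a]
            if bound is not None:
                r = min(r, bound)           # bounded reward
            return r - 0.5 * counts[a]      # satiation: repetition devalues
        best = max(raw_reward, key=value)   # greedy choice
        counts[best] += 1
    return counts

print(run(bound=None))   # {'lever': 20, 'explore_a': 0, 'explore_b': 0}
print(run(bound=3.0))    # bounded: the choices spread over all actions
```

With no bound, the lever's reward dwarfs the satiation term, so the agent is locked in the "endless cycle"; with the bound, all three actions get visited, i.e. more "exploration over exploitation".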

4.5. The Hierarchy as a deadlock breaker

In order to avoid the deadlocks of falling into a maximum/minimum hole, the hierarchical system should constantly project and act in varying time-slices and with varying reward models. A unified model would be an aggregation of those switching sub-models. See articles and slides from my works.

That implies that for a complex, hierarchical agent *there is no single absolute best reward path*, and a measurement of intelligence based just on the reward at the moment is right only within that window of comparison and for the selected measures. It's not "objective"; it's "best" for that specific selected model of the world and model of the rewards, with specific limitations, and compared to specific other trajectories. In general and complex environments with multiple possible goals, there is a multitude of actions that have similar "rewards", or ones which keep the agent "alive" at a macro level. They are all "correct" and "intelligent". Thus intelligence needs to be defined more specifically, with more parameters than just one "reward".

I don't like the quoted definition of Hutter's: "the wide range of environments". If I'm not mistaken, Ben Goertzel had something similar in the 2000s. IMO this is mundane, especially together with simple formulas.

Mapping it just to simple formulas of probabilities (various kinds ~ various pdfs...) as a solution doesn't make it any clearer. All papers of that kind look like Bayes, or almost the same: + - logP, P(a,b) ... They are reminiscent of the basics of Shannon's Information Theory, which maybe was one of my own inspirations for realizing that prediction and compression of information are the "keys to true intelligence".
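For reference, these are the standard Shannon quantities that such -logP and P(a,b) terms echo - surprise as negative log-probability, and entropy as the expected surprise:

```latex
% Self-information ("surprise") of an outcome x:
I(x) = -\log P(x)

% Entropy: the expected surprise over the distribution:
H(X) = -\sum_{x} P(x)\,\log P(x)

% The joint form, as in the P(a,b) terms:
H(X,Y) = -\sum_{a,b} P(a,b)\,\log P(a,b)
```

Minimizing expected surprise and compressing the input are two faces of the same coin: a better predictive model assigns higher P(x), thus shorter codes, to what actually happens.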

A general flaw of that school is that these formulas are too general, indiscriminate, too universal - or, as coined in this paper, "ultimate". That also implies that they are inefficient to compute.

5. Notes on specific citations

"An adaptive system that tends to minimize average surprise (entropy) will tend to survive longer."

That seems probably true, but only for a non-evolving system. Life as a whole "survives longer" by gradually adapting, trying new things and testing them for fitness - "evolving". At the moment of spawning new organisms through sexual reproduction, the exact combination of genes is unknown to the mother and father systems; it is a big "surprise".

6. Conclusion

This article is underdeveloped, but that's it for now.

See also:

* The course program of the world's first university course in AGI (see the links in the blog)
* Todor's Theory of Mind and Universe - his philosophy and principles, expressed in works from his teenage years
* Materials from the University course in Bulgarian and English:
  * "Анализ на смисъла на изречение и ..." ("Analysis of the meaning of a sentence and ..."), March 2004, @ bgit
  * Translated in English: Analysis of the meaning of a sentence, based on the knowledge base of an operational thinking machine. Reflections about the meaning and artificial intelligence
    * Part 1: Semantic analysis of a sentence. Reflections about the meaning of the meaning and the Artificial Intelligence
    * Part 2: Causes and reasons for human actions. Searching for causes. Whether higher or lower levels control. Control Units. Reinforcement learning.
    * Part 3: Motivation is dependent on local and specific stimuli, not general ones. Pleasure and displeasure as goal-state indicators. Reinforcement learning.
    * Part 4: Intelligence: search for the biggest cumulative reward for a given period ahead, based on a given model of the rewards. Reinforcement learning.
* Many other articles from this research blog - search for them if you care: AGI digest, AGI email list, discussion on the Cognitive Algorithm site etc.