Friday, February 26, 2021

Consciousness Prior and Causality - matches of Bengio's 2017-2018 example and ideas with Todor's "Theory of Universe and Mind" from 2003-2004

See especially my example with the thrown coin: we predict its trajectory and future with absolute precision only at the linguistic level (verb-noun, sentences). Humans believe that this proves their free will, but the prediction holds only because of the very low bandwidth/bit-rate of that description. Compare it to Bengio's example, where he throws a little piece of paper - "if I try to predict... it is very hard ... but I could predict it is going to be on the floor". See point 6 in the 2004 work below, and also section 14 in the 2003 Part 3.

My examples and definitions are broader and more philosophical. They are part of the emphasis in my Theory that the core of intelligence is prediction of the future (will, causality) and compression. They also concern "compositionality" (see Bengio's talk as well), the ranges of Resolution of Causation and Resolution of Perception in which the mind operates, and the existence of degrees: in my terminology, there are virtual universes at different levels, i.e. "levels of abstraction/generalisation". The examples in my discussion also address the notion of "free will", using information bandwidth to show that the "free" component of our "conscious" will (causation power) is ridiculously low - just a few bits per second.

What B. calls the "consciousness prior" is a higher-level top-down direction/drive that reduces the search space (yes, "attention", as B. mentions): a high degree of compression and operation at low resolution, searching for matches and lowering the resolution of perception and causation until a complete match/prediction is possible at the maximum target resolution for that virtual universe, etc. These are the cognitive aspects; they do not require the transcendental aspects of consciousness: qualia, subjective feeling, etc.
For original sources in Bulgarian and other translations see below the comparison table.

From Deep Learning of Disentangled Representations to Higher-level Cognition

52,735 views • 9.02.2018
Refers to "The Consciousness Prior", Bengio 2017:

Bengio's presentation slide @ 38:29:

* Conscious thoughts are very low dimensional objects, compared to the full state of the (unconscious) brain.
* Yet they have unexpected predictive value or usefulness

- strong constraint or prior on the underlying representations

  * Thought: composition of few selected factors/concepts (key/value) at the highest level of abstraction  of our brain
  * Richer than but closely associated with short verbal expression such as a sentence or phrase, a rule or fact (link to classical symbolic AI & knowledge representation)

Yoshua Bengio, 53 years old, 1.2018

Turing Award, MILA leader, "AI godfather"
Talk at Microsoft Research :
"From Deep Learning of Disentangled Representations to Higher-level Cognition"

Todor Arnaudov, 18-19-years old, 2002-2004

"The Sacred Computer" AGI e-zine* (the original is in Bulgarian)

Regarding unsupervised ML models for speech, not recognizing phonemes properly, because, B. argues, their information content is very low, compared to that of the raw audio:

37:06: "How is it that these models haven't been able to discover them and then see that there's like this really powerful part of the signal, which is explained by the dependencies between phonemes?

And the reason is, I think, simply that that part of the signal occupies very few bits in the total number of bits that is in the signal, right? So, the raw signal is 16 thousand real numbers per second. How many phonemes per second do you get? Well, I don't know, 10, right? Or maybe 16.

So there's a factor of a thousand in terms of how many bits of information are carried by the word level, phoneme level information versus the acoustic level information."
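
The "factor of a thousand" in the quote can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch that uses only the figures mentioned in the talk (16 thousand samples per second, 10 to 16 phonemes per second). The function name is hypothetical, and the sketch compares counts of symbols per second, not true information content, which would also depend on the bit depth of the samples and the size of the phoneme inventory.

```python
def rate_ratio(samples_per_second: float, phonemes_per_second: float) -> float:
    """Return how many raw audio samples correspond to one phoneme."""
    return samples_per_second / phonemes_per_second


SAMPLES_PER_SECOND = 16_000  # figure quoted in the talk

# The two guesses for the phoneme rate quoted in the talk.
for phonemes_per_second in (10, 16):
    ratio = rate_ratio(SAMPLES_PER_SECOND, phonemes_per_second)
    print(f"{phonemes_per_second} phonemes/s -> {ratio:.0f} samples per phoneme")
```

With either guess the ratio is on the order of a thousand, which matches the order of magnitude stated in the talk.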

On "Consciousness Priors"

"And the prior is that, there are, the assumption about the world is that, there are many important things that can be said about
the world which can be expressed in one sentence,
which can be expressed by a low dimensional statement. Which refer to just a few variables.
And often they are discrete, those
we express in language, sometimes they're not. We can like draw or somehow, use that to plan.
But they are very, very low dimensional.
And it's not obvious, a priori, that things about the world could be said that are true and low dimensional."

"If I try to predict the future here, there are many aspects of it that are hard for me to predict.

Like, where is it going to land exactly?

It's very, very hard to predict, right? It's a game. But, I could predict that it's going to be on the floor. It's one bit of information. And I can predict that with very, very high certainty and a lot of what we talk about are these kinds of statements. Like, if I drop the object, it's going to end on the floor, in this context. So, this is the assumption that we're trying to encapsulate in machine learning terms with this consciousness prior idea."
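
The contrast between the coarse statement ("on the floor") and the exact landing point can be illustrated with Shannon entropy. This is only a sketch under assumptions that are not in the talk: the 99% certainty, the 100 x 100 grid of landing cells, and the uniform distribution over those cells are all made-up illustrative values.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Coarse question: does the object end up on the floor?
# ASSUMPTION: we are 99% certain that it does.
p_floor = 0.99
coarse = entropy_bits([p_floor, 1 - p_floor])

# Fine question: in which cell of the floor does it land?
# ASSUMPTION: a 100 x 100 grid with every cell equally likely.
cells = 100 * 100
fine = entropy_bits([1 / cells] * cells)

print(f"coarse uncertainty: {coarse:.3f} bits")
print(f"fine uncertainty:   {fine:.3f} bits")
```

Under these assumptions the coarse prediction leaves well under one bit of uncertainty, while the fine-grained one leaves more than thirteen bits, and the gap widens as the grid gets finer.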

"Theory of Universe and Mind - Part 4", 2004


6. The Resolution of Causality-Control (RC) describes the capability of a Causality-Control unit to output data from its memory (its universe) to the memory of the mother universe in such a way that the resulting changes in the mother universe are as close as possible to the smallest possible changes and to the expected ones.

The Resolution of Perception (RP) shows which features of the mother universe are perceived (distinguished) by the evaluating unit, which is a subordinate universe.

When a person decides to throw a coin and executes that action, she thinks/assumes that (...) she has done "what she wanted to do".

The Resolution of Causality-Control and of Perception in that case is described by verbs, nouns, adjectives, prepositions and the other parts of speech of the language of the beings possessing general intelligence. That language, also called "natural language", describes the way in which the human mind perceives the world, and it is limited by the narrow information bandwidth available to humans.

The linguistic description gives the human mind a sense of freedom to do "whatever she wants" [namely] because of its low resolution and the low criterion for the precision of the execution of "what she wants".

For example, the Resolution of Causality-Control and Perception in the example above is verb-noun. (...)

However, the resolution in the mother universe, where the human mind is defined, is far higher, because the Universe is not built of coins and humans whose interaction could be described with an insignificantly small number of linguistic elements such as:

I throw a coin on the floor.
I throw a coin on the table.
I throw a coin behind the sofa.
I throw a coin through the window.
I throw a coin in the toilet.
I throw a coin in the corridor.

27.8.2002, a letter in "Theory of Universe and Mind - Part 2": "A human can output merely several tens of bits per second [consciously]..."

Theory of Universe and Mind, Part 3, published 8.2003, "The Sacred Computer" #25: (...)

[See also the definitions of a "Control-Causality unit" etc. - the quote would become too big.]
14. "We", whatever we are, control a very small part of ourselves. Say we order our hand to throw a coin: that consists of a sequence of simple instructions sent to the muscles of our hand. The muscles consist of a huge number of particles about which the control unit (the human; the matter that we are conscious of) has no knowledge, and "we" ["the consciousness"], as that control unit, cannot apply its power to them individually; the resolution of its power is limited.

The muscles flex and thereby pull the bones and the whole fingers. Therefore, the parts of the body that we [believe] we "control" [cause wanted changes to, with a predicted and wanted target state and precision] in fact do a big portion of their job on their own, i.e. "they know their job", and we - our consciousness - have only a superficial image/representation (представа) of that "job".

For example, the instruction with which we order the finger to flex is described by, say, a few tens of bits. The estimate, of course, depends on the way we measure it.

We could reduce the description to, e.g.: which hand (1 bit) + which finger (2.3 bits) + to flex or to extend (1 bit) + force (I don't know how many degrees) + time for activation of the force.

The conscious information could be counted in bits on the fingers of one's hands, whereas in order to flex the finger in the Universe, in the Main Memory, all the information that describes the finger and the devices connected to it in the hand, whose motion pulls the finger, has to be translocated - muscles, tendons, the blood vessels that feed it, etc. - and all of that has to happen particle by particle... I have no idea how many bits it would take to define just a finger, atom by atom...
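
The bit count of the conscious "flex a finger" instruction above can be sketched as a sum of log2 terms, one per choice. The hand, finger and flex/extend terms follow the text (1 bit, about 2.3 bits, 1 bit). The number of force levels and time steps is left open in the text, so the values 8 and 16 below are pure assumptions for illustration, and the function names are hypothetical.

```python
import math


def choice_bits(options: int) -> float:
    """Bits needed to select one of `options` equally likely alternatives."""
    return math.log2(options)


def instruction_bits(force_levels: int, time_steps: int) -> float:
    """Rough size, in bits, of the conscious 'move a finger' instruction."""
    hand = choice_bits(2)        # left or right: 1 bit
    finger = choice_bits(5)      # one of five fingers: about 2.3 bits
    direction = choice_bits(2)   # flex or extend: 1 bit
    force = choice_bits(force_levels)   # ASSUMPTION: number of force degrees
    timing = choice_bits(time_steps)    # ASSUMPTION: number of activation times
    return hand + finger + direction + force + timing


# ASSUMED values: 8 force levels and 16 distinguishable activation times.
total = instruction_bits(force_levels=8, time_steps=16)
print(f"conscious instruction: about {total:.1f} bits")
```

Whatever values are chosen for force and timing, the total grows only logarithmically with them, which is the point of the passage: the conscious description stays tiny compared with a particle-by-particle description of the same movement.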

51. The more sophisticated a device (entity) becomes, the more its capabilities for predicting the future grow and the more it evades unpredictable and random states, i.e. states characterized by a lack of desired information. The more sophisticated an entity becomes, the more it employs the past, its memories, in order to build its future behavior, because it discovers that the past has patterns and therefore the future is predictable.

Compare "closely associated with short verbal expression such as a sentence or phrase" [slide] with "For example, the Resolution of Causality-Control and Perception in the example above is verb-noun ..." etc.

Regarding throwing etc. - the full prediction covers not only where exactly the object will land, but its entire trajectory. In both cases there can be different resolutions; at best, all steps would be given at the highest possible resolution of causation and perception for the "mother Universe", i.e. that would involve the Planck-constant scale, etc.

The "Sacred Computer" works also have copies in the geocities archive oocities etc. and maybe also in