1) In the 2D image, the brown square in the middle of the top side of the cube actually has the same pixel color as the orange square on the side that appears to be in the shadowed part of the cube. Notice the picture below: there's more contrast and the square looks darker.
2) Squares A and B in the 2D picture have the same absolute shade of gray.
It looks mysterious, and there are claims that these are "prewired" bugs. My explanation is pretty straightforward: these color illusions are a feature rather than a bug. They seem like bugs only in contrived cases such as these illusions, and only before the reason is understood:
2D images, and even the "stereo" images our eyes get from the real world, are ambiguous regarding the light source and the colors, because scenes are usually only partially observable and because there is only perception of absolute color - "RGB" and luminance (rods and cones). Besides, light itself has a color, and there can be reflections and shadows which change both the luminance and the color of the affected areas. Other ambiguous components are the texture, reflectiveness and transparency of the objects, which also alter the low-level RGB properties.
"Light sources" are not perceived at the low level input - it is a construct created in order to explain the patterns of differences in luminance and color which are observed. The places with high luminance are seen as "lit", the ones with lower - "shadowed", and the gradients of luminance seem as light following shapes, dispersing, reflecting etc. What is actually "following the shapes of the objects", dispersing, reflecting etc. is the luminance of the "pixels" in the 2D image.
It's similar with transparency, reflectiveness, etc. - these are stable correlations observed in many samples, which gave birth to the prediction that, given such a correlation, the object/that part of the 2D image or reconstructed 3D scene should have a property such as "transparency".
The cognitive hierarchy gets samples of 2D images which are projections of 3D objects. Even more: stereo samples allow it to do its own kind of "triangulation" and reconstruct distances, so that we perceive depth when we look with both eyes.
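A minimal sketch of that "triangulation", assuming the textbook rectified two-camera model; the focal length, baseline and disparity values are made up for illustration:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Stereo triangulation for rectified views: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

# Hypothetical numbers: 700 px focal length, a 6.5 cm "eye" baseline,
# and a 10 px horizontal shift of a feature between the two images.
print(depth_from_disparity(700, 0.065, 10))  # about 4.55 m away
```

Nearer objects produce larger disparity, which is also why depth from stereo gets less precise with distance.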
It goes even further - if one has grown up with two healthy eyes, distance can be estimated even with one of them closed. And even more: humans see 3D space in 2D projections on a flat surface (a picture or an electronic screen) using only one eye!
That's possible because of the "aggressively" repetitive perception of correlations between the 2D projections on the retina and the perspective changes of the image after controlled and predictable changes of the viewing angle and/or object coordinates.
Correlations in how luminance and color change depending on the light source (changes in contrast, luminance and color) are also observed.
A lot of 2D clues for the third dimension of the objects, and for the position and intensity of the light, are extracted and "fixed", so that future 2D images are perceived as 3D even if they contain just converging parallel lines or other minute details characteristic of a perspective transformation.
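One such clue can be demonstrated with a pinhole-projection sketch (illustrative coordinates, focal length normalized to 1): parallel 3D lines converge in the 2D image, and the mind learns that convergence as a depth cue.

```python
def project(point3d, f=1.0):
    """Pinhole projection of a 3D point onto the image plane."""
    x, y, z = point3d
    return (f * x / z, f * y / z)

# Two parallel 3D lines (e.g. the edges of a road), sampled at growing depths.
depths = (2, 4, 8, 16)
left = [project((-1.0, -1.0, z)) for z in depths]
right = [project((1.0, -1.0, z)) for z in depths]

# The horizontal gap between them shrinks toward a vanishing point.
gaps = [r[0] - l[0] for l, r in zip(left, right)]
print(gaps)  # [1.0, 0.5, 0.25, 0.125]
```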
I discussed this a little back in 2004 in my [Teenage] Theory of Universe and Mind, Part 4:
The section starts from the last line at p.30 in this translation.
// The picture there seems not to appear in the document, it's the one below: //
Mind sees 3D space and light source(s) everywhere
The price of the simplicity of the 3D clues and light is that these repetitive correlations cause the mind to expect 3D space and light source(s) even in flat 2D images. Artists draw 2D projections of 3D space, photographs and video do the same, etc. - the clues are constantly reinforced.
It seems that even the highest level of consciousness can hardly inhibit 3D perception/integration on demand and see 2D images as 2D if any 3D clues intervene and confuse it.
[Edit/note from 23/7/2012 - Therefore 3D is more powerful; it has more impact on the mind and seems to come first in such perceptions. That's reasonable, because humans initially sense the world as 3D with two eyes, and all our physical muscular reactions, which in essence are just sequences of coordinate adjustments, happen in 3D, not in 2D - the real world is not perceived as flat. Follow the blog for a continuation of the topic.]
Why does the brown square look orange?
Because the 3D-reconstructed scene appears to show a cube which is lit from the top, while the side facing the viewer is in shadow; there are too many clues suggesting this, so the mind believes it.
However, if the square in the shadow has the same absolute 2D retinal color ("RGB") as the lit square, and it is still in shadow, then the original, unlit color of the shadowed square must have been orange, because orange at this relatively lower luminance turns into brown - the same brown as the top square.
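This "discounting" of the shadow can be sketched as inverting an assumed multiplicative attenuation; the RGB triple and the 50% shadow factor below are made-up illustrative values, not measured from the actual pictures:

```python
def discount_shadow(observed_rgb, attenuation):
    """Recover the 'unlit' color, assuming the shadow scales each channel."""
    return tuple(min(255, round(c / attenuation)) for c in observed_rgb)

# A brown-ish on-screen pixel, believed to sit in a 50% shadow:
brown = (120, 75, 30)
print(discount_shadow(brown, 0.5))  # (240, 150, 60) - an orange
```

So the very same pixel value reads as brown when taken at face value, but as orange once part of its darkness has been "explained away" by a shadow.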
The case with the gray squares and the cylinder in the other picture is similar. The connections and borders added in the right illustration destroy the clues suggesting that these particular squares are part of the 3D scene with the cylinder and the light source behind it, because they don't follow the pattern of the light. There are no clues that they are behind the scene either (being partially covered - there are no interrupted lines), so the "Z-buffer" puts them in front of the other elements. Now there is no light source and no shadow over those squares, and their absolute retinal RGB values are not adjusted differently.
Mystery unveiled... :)
- A quick hypothesis might be that young infants who don't yet see depth may not experience this optical illusion.
- However, I'm not sure about that - I suspect that babies understand that light sources change the color and luminance of the areas they impact long before they have robust 3D perception; 3D is not needed, so I guess they may have clues about light-and-shadow interaction and see the optical illusion even before they see 3D.
It would just be a 2D optical illusion, because I suspect the brain probably does a partial 2D light-color reconstruction before it goes into 3D. (It would be cool if I could conduct some kind of experiment, but that's not possible yet.)
3D reconstruction is ambiguous, and these drawings suggest that it's done locally, based on local 2D clues. Globally the picture might be absurd, but the mind fails to focus on all 3D clues at once, so it sees correct perspective in the different elements.
Why can't we correct this illusion consciously?
One reason, I'd say, is that it is not really an illusion.
Another answer is in the recent series I posted about higher-lower level feedback/reflection in cognitive hierarchies, which I think is much weaker/narrower than the feedforward flow.
The higher level understands some of the "mistakes" and "delusions" (wrong predictions) of the lower ones if it takes the details separately, out of their context, but it cannot correct them if they are too many levels below and if there are too many *correct* predictions at the lower levels. The clues suggesting that there's no illusion, and the predictions that *work*, are reinforced over too many cases to be dismissed and altered.
In the examples above, everything in the low-level input suggests that there is no illusion: all the other correlations between absolute pixel values confirm that these are pictures of 3D scenes with their light sources and shadows.
Another explanation might be that the high levels lack direct access to the lowest levels - the retinal RGB-luminance values.
To be continued...
* Thanks to a show on Discovery Channel about optical illusions that made me think about this yesterday and realize this mechanism.
** Happy New Year, wishing 2012 to be more productive in AGI/SIGI.
- Optical illusions with colors - reconstruction of the three-dimensional coordinates of the light source in a two-dimensional projection of a three-dimensional scene.
(C) Todor "Tosh" Arnaudov, 1/1/2012 (except illusion photos)