Sunday, January 3, 2021

CapsNet - "We can do it" with 3D point clouds - compare to 2012 AGI Digest Letter compilation about Vision as 3D-Reconstruction

 
An article from a few weeks ago: https://syncedreview.com/2020/12/18/we-can-do-it-geoffrey-hinton-and-ubc-ut-google-uvic-team-propose-unsupervised-capsule-architecture-for-3d-point-clouds/

"When Turing Award Honoree Dr. Geoffrey Hinton speaks, the AI community listens. Last week, Hinton tweeted, “Finding the natural parts of an object and their intrinsic coordinate frames without supervision is a crucial step in learning to parse images into part-whole hierarchies. If we start with point clouds, we can do it!

The comments came with the publication of Canonical Capsules: Unsupervised Capsules in Canonical Pose,"


Compare to:

https://artificial-mind.blogspot.com/2019/07/shape-bias-in-cnn-for-robust-results.html


Read in: Chairs, Caricatures and Object Recognition as 3D-reconstruction (2012)


https://artificial-mind.blogspot.com/2017/12/capsnet-capsules-and-CogAlg-3D-reconstruction.html


The 4th email from the "General algorithms..." thread:

Todor Arnaudov Fri, Apr 27, 2012 at 1:12 AM
To: agi@listbox.com

I don't know if anyone in this discussion has realized that "invariance" in vision is actually just:

- 3D-reconstruction of the scene, including the light source and the objects

- Also colours/shades and textures (local, smaller, higher-resolution models) are available for discrimination based on them, which may be quicker or needed for objects which are otherwise geometrically matched

[+ 16-7-2013 - conceptual “scene analysis”, “object recognition” involves some relatively arbitrary, or just flexible, selection criteria for the level of generalization for the usage of words to name the “items” in the scene. To Do: devise experiments with ambiguous objects/scenes, sequences. … see “top-down”, … emails 9, 14, 15]

If the normalized 3D-models (preferably to absolute dimensions), lights and recovered original textures/colours (taking into account light and reflection) are available, everything can be compared perfectly and doesn't require anything special, no "probabilities" or anything. The textures and light most of the time don't even alter the essential information: the 3D-geometric structure.
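The "compared perfectly" point above can be illustrated with a minimal sketch (my own illustration, not from the email or the Canonical Capsules paper): once two models live in the same normalized space, a brute-force symmetric nearest-neighbour (Chamfer) distance compares them directly, with no learned features or probabilities involved.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric nearest-neighbour (Chamfer) distance between two
    point sets of shape (N, 3) and (M, 3). Brute force, O(N*M)."""
    # Pairwise squared distances between every point of a and every point of b.
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # Each point is matched to its nearest neighbour in the other set.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

cube = np.random.rand(200, 3)          # a sampled shape, already normalized
shifted = cube + 5.0                   # the same shape before pose normalization
print(chamfer_distance(cube, cube))    # 0.0: identical normalized models match exactly
```

Without normalization the shifted copy scores a large distance; after normalization it would score zero again, which is the whole point of canonicalizing first.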

"2D" is just a crippled 3D

"Invariants" in human fully functional vision are just those 3D-models (or their components, "voxels:) built in a normalized space, the easiest approach for quick comparison is voxels, it might be something mixed with triangles, of course textures and colours also participate.

Every 3D-model has a normalized position per its basis, and also some characteristic division of major planes and positions between the major planes, and there are "intuitive" ways to set the basis --> gravity/the ground-plane foundation, which is generalized to "bottom", i.e.:

(...)
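The gravity/ground-plane basis from the email can be sketched numerically (my own illustration, not from the email): center the cloud, align it to its principal axes, and rest its "bottom" on z = 0.

```python
import numpy as np

def normalize_pose(points: np.ndarray) -> np.ndarray:
    """Move a point cloud (N, 3) into a canonical frame:
    centroid at the origin, principal axes aligned with x/y/z,
    lowest point resting on the "ground" plane z = 0.
    Scale is kept, matching the email's preference for absolute dimensions.
    (Eigenvector signs are ambiguous; a full method would fix them too.)"""
    centered = points - points.mean(axis=0)
    # Principal axes = eigenvectors of the 3x3 covariance matrix.
    _, vecs = np.linalg.eigh(np.cov(centered.T))
    aligned = centered @ vecs               # rotate into the eigenbasis
    aligned[:, 2] -= aligned[:, 2].min()    # "bottom" sits on z = 0
    return aligned

pts = np.random.rand(100, 3) * [3.0, 1.0, 0.5] + 10.0  # arbitrary pose
canon = normalize_pose(pts)                              # canonical pose
```

Translating the input cloud anywhere yields the same canonical result, which is exactly the translation invariance the email attributes to normalized 3D-models.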




Voxels and point clouds:

https://3dprinting.stackexchange.com/questions/4556/layman-term-explanation-of-the-difference-between-voxel-and-point-cloud

https://www.survtechsolutions.com/post/what-are-point-clouds

https://en.wikipedia.org/wiki/Voxel

https://en.wikipedia.org/wiki/Point_cloud
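The difference the links above explain can be shown in a few lines (a sketch of my own, not taken from any of the linked pages): a point cloud is an unordered list of coordinates, while a voxel grid is a dense regular lattice of occupied/empty cells, which is what makes voxels convenient for quick comparison.

```python
import numpy as np

def voxelize(points: np.ndarray, grid: int = 8) -> np.ndarray:
    """Turn a point cloud (N, 3) into a boolean occupancy grid of
    shape (grid, grid, grid) by bucketing points into lattice cells."""
    mins = points.min(axis=0)
    spans = points.max(axis=0) - mins      # assumes a non-degenerate cloud
    # Map each coordinate into a cell index in [0, grid - 1].
    idx = ((points - mins) / spans * (grid - 1)).round().astype(int)
    vox = np.zeros((grid, grid, grid), dtype=bool)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return vox

cloud = np.random.rand(500, 3)   # 500 unordered points
vox = voxelize(cloud)
print(vox.shape)                 # (8, 8, 8)
```

Two voxel grids can then be compared cell-by-cell in fixed time, whereas raw point clouds need a nearest-neighbour search.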