This post describes, at a high level, how Visual Learning works within the RM2 Platform. In RM2, visual object detection and learning are combined in a unified architecture to achieve real-time detection and behavior prediction in a given environment.
To detect objects accurately and learn from correct data associations, the system must first extract unit data parameters, which form the foundation for establishing relationships between unique parameters. This approach will result in high accuracy both when identifying objects and when learning object behavior.
In this post, we walk through an example of how unit parameters are extracted from a given image. We selected an image of a glass vase with flowers to demonstrate how an image is converted to tags for the information processing routines.
Extraction of Shape, Light, and Color
The imported image is processed by an extraction routine that separates it into three layers. The first layer draws out the edges down to a single-pixel width, the second layer holds the light information, and the third layer holds the color.
These layers are created by sub-routines that extract individual objects within an image using rules that define edge parameters. Using the edge of the object (shape), further information within the shape, comprising light information and associated colors, is extracted and tagged to the object sequence.
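The three-layer split can be illustrated with a minimal Python sketch. The post does not specify the actual extraction rules, so everything here is an assumption made for illustration: the function names, the brightness-jump edge rule with its threshold of 40, the use of mean channel value as "light", and the use of channel shares as "color".

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Image = List[List[Pixel]]      # rows of pixels


def light_layer(img: Image) -> List[List[float]]:
    """Per-pixel brightness, taken here as the mean of the three channels."""
    return [[sum(p) / 3 for p in row] for row in img]


def color_layer(img: Image) -> List[List[Tuple[float, float, float]]]:
    """Per-pixel channel shares, which do not depend on brightness."""
    out = []
    for row in img:
        out_row = []
        for r, g, b in row:
            total = r + g + b
            share = (r / total, g / total, b / total) if total else (0.0, 0.0, 0.0)
            out_row.append(share)
        out.append(out_row)
    return out


def edge_layer(img: Image, threshold: float = 40.0) -> List[List[bool]]:
    """Mark a pixel as an edge when brightness jumps towards its right or
    lower neighbour. This is a stand-in rule, not RM2's edge parameters."""
    light = light_layer(img)
    h, w = len(light), len(light[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = abs(light[y][x] - light[y][x + 1]) if x + 1 < w else 0.0
            down = abs(light[y][x] - light[y + 1][x]) if y + 1 < h else 0.0
            edges[y][x] = max(right, down) > threshold
    return edges


def extract_layers(img: Image) -> Dict[str, list]:
    """Return the shape (edge), light, and color layers for one image."""
    return {
        "shape": edge_layer(img),
        "light": light_layer(img),
        "color": color_layer(img),
    }
```

Calling extract_layers on a small image whose right-hand column is red and whose other pixels are black marks the column to the left of the red one as the edge, because that is where the brightness jump is seen.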
The machine calculates the depth of the object using a temporal reference (subsequent frames): the direction of the light source and the depth of the object are extracted to predict the overall size. Depth is calculated for each frame of the reference set from the predicted distance of the light source and the color gradient on the surface of the frame.
The light tags are also employed in the classification of transparent and opaque objects. In the case of mirrors, detected symmetry or fractal patterns may signify an anomaly; alternatively, the machine can be supervised by calibrating the central object complex (self) to learn such scenarios.
Further, the shape extracting sub-routines are supported by super subroutines that are responsible for extracting primary shapes from the object. The relationship between these unique shapes is tagged to the shape definitions of the object.
These unique extracts are matched on a grid to derive micro tags associated with pattern differentiation among shapes. By mapping against a grid, the pattern analyzer decodes the shape pattern into codes using the highlighted dots, which are activated by overlaying the shape.
The tag creator reads the activated dots to produce an alphanumeric strip (symbolic sequence) using the positions of the activated dots. A sample micro tag sequence for a shape may appear like the strip below:
The X positions are denoted by numbers, whereas the Y positions are denoted by letters. In the sample above, D7 represents a dot where the prefix letter gives the Y position and the suffix number gives the X position. These micro-tag sequences, each based on a particular pattern, are converted to macro-labels for the rapid detection of shape patterns.
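The letter-number encoding can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the 1-based indexing (Y = 1 maps to 'A'), the '-' separator, and the 26-row limit are assumptions that the post does not confirm.

```python
from string import ascii_uppercase
from typing import Iterable, List, Tuple


def micro_tag(dots: Iterable[Tuple[int, int]]) -> str:
    """Encode activated grid dots, given as (x, y) pairs in trace order,
    into a strip of tokens such as 'D7'.

    The Y position becomes the letter prefix and the X position becomes
    the numeric suffix.
    """
    tokens = []
    for x, y in dots:
        if not 1 <= y <= len(ascii_uppercase):
            raise ValueError(f"Y position {y} is outside the A-Z range")
        if x < 1:
            raise ValueError(f"X position {x} must be 1 or greater")
        tokens.append(f"{ascii_uppercase[y - 1]}{x}")
    return "-".join(tokens)


def parse_micro_tag(strip: str) -> List[Tuple[int, int]]:
    """Decode a strip back into (x, y) pairs."""
    dots = []
    for token in strip.split("-"):
        y = ascii_uppercase.index(token[0]) + 1
        x = int(token[1:])
        dots.append((x, y))
    return dots


print(micro_tag([(7, 4), (8, 4), (9, 5)]))  # prints "D7-D8-E9"
```

Keeping the dots in trace order matters, because rules that compare the start and end of a sequence need to know which dot came first.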
For example, if all prefixes or suffixes are sequential, the shape is tagged [SL], denoting a straight line, whereas if the start prefix and suffix are the same as the end prefix and suffix, it is tagged [CRL], denoting a circle. The span of the letter range (lowest to highest) gives the height of a curve in the line, and the span of the number range (lowest to highest) gives the width of a line (if bent).
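One possible reading of these rules is sketched below in Python. The '-' separated strip format, the function names, and the interpretation of "sequential" (each axis is either constant or moves by exactly one step per dot in a single direction) are assumptions for illustration. Shapes that match neither rule are returned without a label rather than given an invented one.

```python
from typing import List, Optional, Tuple


def parse_strip(strip: str) -> List[Tuple[int, int]]:
    """Turn a strip such as 'D5-D6' into (row, col) pairs.
    The row comes from the letter prefix (A = 1), the column from the number."""
    return [(ord(tok[0]) - ord("A") + 1, int(tok[1:])) for tok in strip.split("-")]


def _steady(values: List[int]) -> bool:
    """True if the values are constant or change by exactly 1 per step
    in one direction (the assumed meaning of 'sequential')."""
    steps = {b - a for a, b in zip(values, values[1:])}
    return steps in ({0}, {1}, {-1})


def macro_tag(strip: str) -> Tuple[Optional[str], int, int]:
    """Return (label, height, width) for a micro-tag strip.

    height is the span of the letter range, width the span of the number range.
    """
    dots = parse_strip(strip)
    rows = [r for r, _ in dots]
    cols = [c for _, c in dots]
    height = max(rows) - min(rows)
    width = max(cols) - min(cols)

    if len(dots) > 2 and dots[0] == dots[-1]:
        label = "[CRL]"   # start dot equals end dot: closed shape
    elif len(dots) > 1 and _steady(rows) and _steady(cols):
        label = "[SL]"    # straight line
    else:
        label = None      # bent line; only height and width are reported
    return label, height, width


print(macro_tag("D5-D6-D7"))        # prints ('[SL]', 0, 2)
print(macro_tag("B4-A5-B6-C5-B4"))  # prints ('[CRL]', 2, 2)
print(macro_tag("D5-C6-C7-D8"))     # prints (None, 1, 3)
```

The closed-shape check runs first so that a loop is not mistaken for a line when its steps happen to be regular.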
Using these patterns, the routine creates a macro tag assembly for the shape. This assembly is combined with the tags for light and color to form an object tag assembly (memory), which is in turn incorporated into the assembly of the frame, where the frame holds references to multiple objects. For machine vision applications that require image processing, such as object detection or face recognition, the memory of a single frame (image) can be used.
However, for real-time processing or learning from real-world images or videos, the tag assemblies of individual frames are sequenced by time-stamp to generate event-related memories, each of which is an aggregation of frame-related memories. By detecting behavior patterns among objects, the machine can learn or predict actions in real-life situations.
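The nesting of object, frame, and event memories can be pictured with a small Python data model. The class names, field names, and sample tag values below are hypothetical and chosen only to mirror the description; the post does not define RM2's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class ObjectMemory:
    """Object tag assembly: shape macro tags plus light and color tags."""
    shape_tags: List[str]
    light_tags: List[str]
    color_tags: List[str]


@dataclass
class FrameMemory:
    """Frame assembly: references to the objects seen in one frame."""
    timestamp: float
    objects: List[ObjectMemory] = field(default_factory=list)


@dataclass
class EventMemory:
    """Event memory: frame memories ordered by time-stamp."""
    frames: List[FrameMemory]


def build_event(frames: Iterable[FrameMemory]) -> EventMemory:
    """Sequence frame memories by time-stamp into one event memory."""
    return EventMemory(frames=sorted(frames, key=lambda f: f.timestamp))


# Hypothetical tag values for the vase example.
vase = ObjectMemory(shape_tags=["[CRL]"], light_tags=["transparent"], color_tags=["clear"])
later = FrameMemory(timestamp=2.0, objects=[vase])
earlier = FrameMemory(timestamp=1.0, objects=[vase])

event = build_event([later, earlier])
print([f.timestamp for f in event.frames])  # prints [1.0, 2.0]
```

A single FrameMemory is enough for still-image tasks, while an EventMemory gives the ordered sequence needed to look for behavior patterns over time.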
For observational or visual-based learning, parameters describing the geo-position of objects are critical for calculating distance and size and for predicting movement. The positions of the viewed object and the viewing object (self) are represented on a spatial grid. The integrated grid allows the positions of different objects to be analyzed with the same spatial parameters, which enables the machine to calculate size, distance, orientation, and depth.
The preset grid allows for 360° spherical mapping of the surrounding environment, enabling the machine to map coordinates for a given focus area, which may be scaled using the pan rule. The grid is instrumental in giving the machine the capacity to detect the dimensions of an object, as well as its distance, orientation, depth of external surface and gaps between objects in a single scene. The Y axis allows the machine to learn about uneven terrains, potholes, cliffs, or even low-level obstructions and the like.
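The kind of geometry such a grid makes possible can be sketched in Python. The post does not define the grid's coordinate system, so this sketch assumes the viewing object (self) sits at the origin, that each viewed object is located by an azimuth, an elevation, and a distance, and that Y is the vertical axis. The function names are hypothetical.

```python
import math
from typing import Tuple

Spherical = Tuple[float, float, float]  # (azimuth_deg, elevation_deg, distance)


def to_cartesian(azimuth_deg: float, elevation_deg: float, distance: float) -> Tuple[float, float, float]:
    """Convert a position on the spherical grid to (x, y, z), with Y vertical."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = distance * math.cos(el)
    return (horizontal * math.cos(az), distance * math.sin(el), horizontal * math.sin(az))


def gap(a: Spherical, b: Spherical) -> float:
    """Straight-line gap between two objects seen from the same viewpoint."""
    return math.dist(to_cartesian(*a), to_cartesian(*b))


def extent(angular_size_deg: float, distance: float) -> float:
    """Approximate width or height of an object from the angle it spans
    on the grid and its distance from the viewer."""
    return 2 * distance * math.tan(math.radians(angular_size_deg) / 2)


# Two objects at the same height, 90 degrees apart, at distances 3 and 4.
print(round(gap((0.0, 0.0, 3.0), (90.0, 0.0, 4.0)), 6))  # prints 5.0
```

A negative elevation for a point on the ground ahead gives a negative Y value, which is one way a drop such as a pothole or a cliff edge could show up in the grid.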
This method will give machines the ability to measure their environment accurately and proceed with responsible actions by taking into account every possible behavior of every object in a given environment, thereby reducing dangers and risks. Autonomous cars, industrial machines, customer service robots, and multi-utility robots can depend on such accuracy as they deliver risk-free, high-performance services.