Apple researchers are pushing forward with efforts to bring autonomous vehicle systems to public roads, and last week published an academic paper outlining a method of detecting objects in 3D point clouds using trainable neural networks. While still in its early stages, the technology could mature to improve accuracy in LiDAR navigation solutions.
Like other recent scholarly articles published by Apple engineers, the latest entry, "VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection" by AI researcher Yin Zhou and machine learning specialist Oncel Tuzel, was made public through the arXiv archive of scientific papers.
In the paper, Apple notes that accurate detection of objects in 3D point clouds, like those generated by LiDAR arrays, remains a sticking point for a number of burgeoning real-world applications. From autonomous cars to robotic vacuums, machines that navigate the world around them without the assistance of human operators need to detect critical objects with speed and precision.
Compared to 2D image-based detection, LiDAR is a more reliable alternative because it provides depth information that helps localize objects in space, Apple says. However, LiDAR point clouds, generated by emitting laser pulses and logging the time it takes for the light to return after bouncing off a solid surface, are sparse and have highly variable point density, which complicates reliable detection.
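For context, each point in such a cloud comes from a single time-of-flight measurement: the sensor halves the round-trip travel time of a pulse and multiplies by the speed of light. The short Python sketch below illustrates only that arithmetic; the function name and the sample return time are assumptions for illustration, not details taken from Apple's paper.

```python
# Illustrative sketch of time-of-flight ranging, the measurement behind each
# LiDAR point. The function name and sample value are assumptions, not details
# from the paper.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def range_from_return_time(round_trip_seconds: float) -> float:
    """One-way distance to the reflecting surface, in meters."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0

# A pulse that returns about 0.67 microseconds after emission reflected off a
# surface roughly 100 meters away.
print(range_from_return_time(0.67e-6))  # ~100.4
```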
Current state-of-the-art techniques for interpreting such point clouds involve manually crafting feature representations. Some methods project the point cloud into a bird's eye view, while others transform the data into a 3D voxel grid and encode each voxel with hand-designed features. According to Apple, manually crafting feature representations introduces an "information bottleneck" that prevents such systems from efficiently leveraging 3D shape information.
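To make the hand-crafted approach concrete, here is a minimal sketch of one such encoding: the raw point cloud is binned into a fixed 3D grid and each voxel is reduced to a simple point count. The voxel size, region bounds and function name are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch of a hand-crafted voxel encoding: bin points into a fixed grid
# and keep a per-voxel point count. All parameters here are illustrative
# assumptions, not values from Apple's paper.
import numpy as np

def voxel_point_counts(points, voxel_size=0.2,
                       bounds=((0, 70), (-40, 40), (-3, 1))):
    """points: (N, 3) array of x, y, z coordinates in meters."""
    mins = np.array([b[0] for b in bounds], dtype=float)
    maxs = np.array([b[1] for b in bounds], dtype=float)
    grid_shape = np.ceil((maxs - mins) / voxel_size).astype(int)

    # Keep only points inside the region of interest, then bin them.
    inside = np.all((points >= mins) & (points < maxs), axis=1)
    indices = ((points[inside] - mins) / voxel_size).astype(int)

    grid = np.zeros(grid_shape, dtype=np.int32)
    np.add.at(grid, tuple(indices.T), 1)  # count the points falling in each voxel
    return grid

# Example: 10,000 random points stand in for a LiDAR sweep.
cloud = np.random.uniform([0, -40, -3], [70, 40, 1], size=(10_000, 3))
print(voxel_point_counts(cloud).shape)  # (350, 400, 20)
```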
Instead, Zhou and Tuzel propose a trainable deep architecture for point cloud based 3D detection. The framework, called VoxelNet, uses voxel feature encoding (VFE) layers to learn complex features for characterizing 3D shapes. In particular, the technique partitions the point cloud into 3D voxels, encodes each voxel via stacked VFE layers and produces a volumetric representation.
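The VFE building block is easier to picture with a rough sketch. The PyTorch snippet below is a simplified reading of the idea rather than Apple's implementation: each point in a voxel is embedded by a small fully connected layer, a voxel-wise max pool captures the local aggregate, and that aggregate is concatenated back onto every point. The class name and layer sizes are assumptions, and details such as the masking of padded points are omitted.

```python
# Simplified, illustrative sketch of a voxel feature encoding (VFE) layer.
# Layer sizes and the class name are assumptions; consult the paper for the
# exact architecture.
import torch
import torch.nn as nn

class SimpleVFELayer(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Point-wise embedding; the output is split between per-point and pooled parts.
        self.linear = nn.Linear(in_features, out_features // 2)
        self.relu = nn.ReLU()

    def forward(self, voxel_points: torch.Tensor) -> torch.Tensor:
        # voxel_points: (num_voxels, max_points_per_voxel, in_features)
        pointwise = self.relu(self.linear(voxel_points))
        # Element-wise max over the points in each voxel: the locally aggregated feature.
        aggregated, _ = pointwise.max(dim=1, keepdim=True)
        aggregated = aggregated.expand_as(pointwise)
        # Concatenate per-point and aggregated features, as stacked VFE layers do.
        return torch.cat([pointwise, aggregated], dim=-1)

# Example: 128 non-empty voxels, up to 35 points each, 7 input features per point.
layer = SimpleVFELayer(in_features=7, out_features=32)
print(layer(torch.randn(128, 35, 7)).shape)  # torch.Size([128, 35, 32])
```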
In tests, Apple's methodology showed promise, outperforming current LiDAR-based detection algorithms and image-based approaches "by a large margin" on the KITTI 3D object detection benchmark, which the researchers used to assess the system. VoxelNet was trained to detect three object classes -- car, pedestrian and cyclist -- across a variety of tests.
Aside from theoretical research, Apple is currently evaluating a self-driving vehicle testbed on the streets of Cupertino, Calif. The company's efforts in autonomous technology began under the "Project Titan" initiative, which sought to build a branded self-driving car from the ground up. After significant investment and multiple employee reassignments, Titan hit a number of snags and was ultimately put on ice in late 2016, though remnants of the initiative, like supporting software and hardware, remain active.
A report in August claimed Apple is looking to parlay the technology into an autonomous shuttle that will ferry employees between its Silicon Valley campuses.
While Apple's research paper focuses heavily on autonomous vehicle navigation, the technology described could also be applied to augmented reality systems that use depth mapping hardware to detect real-world objects. The new iPhone X's front-facing TrueDepth camera incorporates hardware conceptually similar to a LiDAR array, including a miniaturized dot projector for accurate depth mapping operations. If TrueDepth's range were extended and the sensor mounted on the rear of a portable device, it could be paired with advanced software to power an entirely new consumer AR experience.