OpenAI takes AI art to the next level with the newly launched Point-E
OpenAI open-sourced Point-E, an ML system that creates a 3D object given a text prompt
OpenAI set a new standard in AI art with DALL-E 2. The multimodal model generates impressive, versatile, and creative motifs and can modify existing images to match a given style. A one-sentence description is enough; several sentences work even better and yield a more detailed picture.
The next breakthrough to take the AI world by storm might be 3D model generators.
Point-E is, at its core, an AI system that generates 3D models from text. According to a paper published alongside the code base, Point-E can produce a 3D model in one to two minutes on a single Nvidia V100 GPU. That said, Point-E doesn’t create 3D objects in the traditional sense.
Rather, it generates point clouds, or discrete sets of data points in space that represent a 3D shape, hence the cheeky abbreviation. (The “E” in Point-E is short for “efficiency,” because it’s ostensibly faster than previous 3D object generation approaches.) Point clouds are easier to synthesize from a computational standpoint, but they don’t capture an object’s fine-grained shape or texture, a key limitation of Point-E at present. To get around this limitation, the Point-E team trained an additional AI system to convert Point-E’s point clouds to meshes. But they note in the paper that the model can sometimes miss certain parts of objects, resulting in blocky or distorted shapes.
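Concretely, a point cloud is nothing more than a list of XYZ coordinates, optionally with per-point colors, which is why a separate reconstruction step is needed before the output is usable in, say, a game engine or a 3D printer. The minimal sketch below illustrates the idea using Open3D's classical Poisson surface reconstruction on made-up data; it is a generic stand-in for the concept, not the learned point-cloud-to-mesh model the paper describes.

```python
# A point cloud is just an (N, 3) array of XYZ positions, optionally paired
# with per-point RGB colors; there are no faces, edges, or surface
# connectivity. This sketch meshes a synthetic (hypothetical) cloud with
# Open3D's classical Poisson reconstruction, a generic stand-in for the
# concept rather than the learned conversion model OpenAI describes.
import numpy as np
import open3d as o3d

# Synthetic stand-in for a generated cloud: 4,096 points on a unit sphere.
rng = np.random.default_rng(0)
xyz = rng.normal(size=(4096, 3))
xyz /= np.linalg.norm(xyz, axis=1, keepdims=True)
rgb = (xyz + 1.0) / 2.0  # fake per-point colors in [0, 1]

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(xyz)
pcd.colors = o3d.utility.Vector3dVector(rgb)

# Surface reconstruction needs oriented normals; estimate and orient them.
pcd.estimate_normals()
pcd.orient_normals_consistent_tangent_plane(30)

# Poisson reconstruction fits a watertight triangle mesh to the points.
mesh, _densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=8
)
print(mesh)  # reports vertex and triangle counts
```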
Outside of the mesh-generating model, which stands alone, Point-E consists of two models: a text-to-image model and an image-to-3D model. The text-to-image model, similar to generative art systems like OpenAI’s own DALL-E 2 and Stable Diffusion, was trained on labeled images to understand the associations between words and visual concepts. The image-to-3D model, on the other hand, was fed a set of images paired with 3D objects so that it learned to effectively translate between the two.
When given a text prompt — for example, “a 3D printable gear, a single gear 3 inches in diameter and half-inch thick” — Point-E’s text-to-image model generates a synthetic rendered object that’s fed to the image-to-3D model, which then generates a point cloud.
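To make the pipeline concrete, here is a condensed sketch based on the text-to-point-cloud example notebook shipped with the open-sourced repository (github.com/openai/point-e). Note that it uses the release's small text-conditioned base model, which conditions on the text directly rather than on an intermediate rendered image, plus the point-cloud upsampler; module, config, and checkpoint names follow the initial release and may have drifted since.

```python
# Condensed from the text-to-point-cloud example in github.com/openai/point-e;
# names follow the initial open-source release and may have changed since.
import torch

from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.models.download import load_checkpoint

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Base model: maps a text prompt to a coarse, colored 1,024-point cloud.
base_name = 'base40M-textvec'
base_model = model_from_config(MODEL_CONFIGS[base_name], device)
base_model.eval()
base_model.load_state_dict(load_checkpoint(base_name, device))
base_diffusion = diffusion_from_config(DIFFUSION_CONFIGS[base_name])

# Upsampler: refines the coarse cloud to 4,096 points.
upsampler_model = model_from_config(MODEL_CONFIGS['upsample'], device)
upsampler_model.eval()
upsampler_model.load_state_dict(load_checkpoint('upsample', device))
upsampler_diffusion = diffusion_from_config(DIFFUSION_CONFIGS['upsample'])

sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler_model],
    diffusions=[base_diffusion, upsampler_diffusion],
    num_points=[1024, 4096 - 1024],
    aux_channels=['R', 'G', 'B'],           # per-point colors
    guidance_scale=[3.0, 0.0],
    model_kwargs_key_filter=('texts', ''),  # only the base model sees the text
)

prompt = 'a 3D printable gear, a single gear 3 inches in diameter and half-inch thick'
samples = None
for x in sampler.sample_batch_progressive(
    batch_size=1, model_kwargs=dict(texts=[prompt])
):
    samples = x  # keep the output of the final denoising step

pc = sampler.output_to_point_clouds(samples)[0]  # colored point cloud
```

The sampler chains the two diffusion models: the base model produces a coarse 1,024-point cloud from the prompt, and the upsampler fills in the remaining 3,072 points. Per the paper, the whole run takes on the order of a minute or two on a single V100.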
After training the models on a dataset of “several million” 3D objects and associated metadata, Point-E could produce colored point clouds that frequently matched text prompts, the OpenAI researchers say. It’s not perfect — Point-E’s image-to-3D model sometimes fails to understand the image from the text-to-image model, resulting in a shape that doesn’t match the text prompt.
Still, it’s orders of magnitude faster than the previous state of the art, at least according to the OpenAI team.