The DALL-E Mini software, provided by a group of open source developers, isn’t perfect, but it may be able to effectively create images that match people’s text descriptions.
As you scroll through your social media feeds these days, you’re likely to notice illustrations accompanied by captions. They are popular now.
The pictures you are seeing are probably made possible by a text-to-image program called DALL-E. Before posting an illustration, people enter words, which an artificial intelligence model converts into an image.
For example, a Twitter user posted a tweet with the text “Should I live or die, a rabbi with an avocado, a marble sculpture”. The attached picture is very elegant and shows a marble statue of a bearded man in a robe and bowler hat holding an avocado.
But most of what’s happening in this area comes from a relatively small number of people sharing their pictures and, in some cases, generating high engagement. This is because Google and OpenAI have not made the technology widely available to the public.
Many of OpenAI’s early users are friends and relatives of employees. If you want access, you need to join the waiting list and indicate whether you are a professional artist, developer, academic researcher, journalist, or online creator.
“We are working hard to accelerate access, but it can take some time to reach everyone. As of June 15, we invited 10,217 people to DALL-E,” Joanne Jang of OpenAI wrote on a help page on the company website.
One publicly available system is DALL-E Mini. It uses open source code from a loosely organized team of developers and is often overloaded with demand. Attempts to use it can bring up a dialog box that reads “Too much traffic. Please try again”.
This is a bit reminiscent of Google’s Gmail service, which attracted people with unlimited email storage space in 2004. At first it was available to early adopters by invitation only, which kept millions of people waiting. Today, Gmail is one of the most popular email services in the world.
Creating images from text is not as ubiquitous as email. But the technology is having a moment, and part of its appeal lies in its exclusivity.
Midjourney, a private research lab, requires people to fill out a form if they want to try its image generation bot from a channel on the Discord chat app. Only a select group of people use Imagen and post pictures from it.
The text-to-image services are sophisticated, identifying the most important parts of a user’s prompt and guessing the best way to illustrate those terms. Google trained its Imagen model using hundreds of its in-house AI chips on 460 million internal image and text pairs, in addition to external data.
The interfaces are simple. Usually there is a text box, a button to start the generation process, and an area to display the images. To indicate the source, Google and OpenAI add a watermark in the lower right corner of DALL-E 2 and Imagen images.
The companies and groups building the software are understandably concerned about everyone storming the gates at once. Handling web requests to run queries against these AI models can be costly. More importantly, the models are not perfect and do not always produce results that accurately represent the world.
Engineers trained their models on an extensive collection of words and photos from the web, including photos posted on Flickr.
San Francisco-based OpenAI recognizes the potential harm that can result from a model that has learned how to create images by essentially scouring the web. To address the risk, employees have removed violent content from the training data. There are also filters that prevent DALL-E 2 from producing images if a user sends a prompt that could violate the company’s policy against nudity, violence, conspiracies, or political content.
“There is an ongoing process to improve the safety of these systems,” said Prafulla Dhariwal, a research scientist at OpenAI.
Bias in the results is also important to understand, and it reflects a broader concern about AI. Texas developer Boris Dayma and others who worked on DALL-E Mini described the issue in an explanation of their software.
“Occupations that show a higher level of education (engineers, doctors, scientists, etc.) or higher physical labor (construction, etc.) are primarily represented by whites,” they write. “In contrast, nurses, secretaries, or assistants are usually women, often white.”
Google describes similar shortcomings of its Imagen model in an academic paper.
Despite the risks, OpenAI is excited about the kinds of things the technology can enable. Dhariwal said it can provide individuals with creative opportunities and support commercial applications such as interior design and dressing up websites.
Results should continue to improve over time. DALL-E 2, introduced in April, produces more realistic images than the initial version OpenAI announced last year, and the company’s text generation model, GPT, has been refined from generation to generation.
“We can expect that to happen with many of these systems,” Dhariwal said.