This Pizza Does Not Exist: StyleGAN2-Based Model Generates Photo-Realistic Pizza Images

Pizza is one of the most recognizable and popular dishes in the world. From Margherita to Hawaiian to Sicilian, pizzas come in a wide variety of crusts topped with pepperoni, onions, mushrooms, pineapple, you name it. If your love of pizza is as strong as your imagination, you may want to check out the new AI-powered Multi-Ingredient Pizza Generator (MPG), which can deliver all these mouth-watering pies and many more.

There’s no guarantee of the flavour, though, as the fancy MPG “pizzas” aren’t baked in an oven; rather, they’re produced by a conditional Generative Adversarial Network (GAN) framework developed by researchers from Rutgers University and Samsung AI Center. Designed for synthesizing multi-label images, MPG combines a new conditioning technique with the StyleGAN2 structure to force intermediate feature maps to learn scalewise label information.

MPG overview

Food images are naturally complex and can be challenging for AI to generate. Ingredients have different colours and shapes and can be prepared with various cooking methods that affect their final appearance. Moreover, food images generally show finished dishes, with ingredients combined, and models must learn how to realistically arrange ingredients in this context.

The researchers note that previous food image generation approaches have either used images as input or generated blurry images. Their proposed conditional GAN framework, however, can effectively learn to create photo-realistic pizza images from combinations of specified ingredients, and also enables variable viewpoints through “style noise” manipulation.

The approach incorporates a Scalewise Label Encoder (SLE) and a Classification Regularizer (CR) that guide the generator in the synthesis of the desired ingredients. The researchers also incorporated a matching loss in the discriminator to provide an additional signal to the generator, and built a new “Pizza10” dataset by relabelling pizzaGANdata using a subset of the ingredient labels. This re-annotation was designed to make the labels more accurate for multi-label classification and image retrieval tasks.
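To make the idea of conditioning on a combination of ingredients concrete, here is a minimal sketch of how a set of chosen toppings could be turned into a multi-hot label vector, the usual input format for multi-label models. The ingredient vocabulary and the encode_ingredients helper are hypothetical and for illustration only; they are not the actual Pizza10 label list or code from the paper.

```python
# Hypothetical ingredient vocabulary (NOT the real Pizza10 label set).
INGREDIENTS = ["pepperoni", "mushroom", "onion", "olive", "basil"]


def encode_ingredients(selected, vocabulary=INGREDIENTS):
    """Return a multi-hot vector: 1.0 where an ingredient is selected, else 0.0."""
    chosen = set(selected)
    unknown = chosen - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown ingredients: {sorted(unknown)}")
    return [1.0 if name in chosen else 0.0 for name in vocabulary]


# A pizza with onion and pepperoni switches on two positions of the vector.
label_vector = encode_ingredients(["onion", "pepperoni"])
print(label_vector)  # [1.0, 0.0, 1.0, 0.0, 0.0]
```

A vector like this, rather than a single class index, is what lets one image carry several ingredient labels at once.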

The researchers used the Fréchet Inception Distance (FID), computed on 10K images, to assess the quality of the generated images (a lower FID indicates the generated image distribution is closer to the real image distribution), and mean average precision (mAP) to evaluate the conditioning on the selected ingredients.
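For readers unfamiliar with mAP, the sketch below shows one common way to compute it for multi-label predictions: rank the images by a classifier's score for each ingredient, average the precision at every true positive, then take the mean over ingredients. This is a generic, simplified version and assumes per-image ingredient scores are already available; the function names and toy numbers are illustrative, and the paper's exact evaluation protocol may differ.

```python
def average_precision(scores, targets):
    """AP for one label: mean of precision@k measured at each positive's rank."""
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(score_rows, target_rows):
    """Mean of per-label AP; rows are images, columns are ingredient labels."""
    num_labels = len(target_rows[0])
    per_label = []
    for j in range(num_labels):
        targets = [row[j] for row in target_rows]
        if any(targets):  # skip labels with no positive examples
            scores = [row[j] for row in score_rows]
            per_label.append(average_precision(scores, targets))
    return sum(per_label) / len(per_label) if per_label else 0.0


# Toy example: 3 images, 2 ingredient labels (made-up scores).
scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
targets = [[1, 0], [0, 1], [1, 1]]
print(round(mean_average_precision(scores, targets), 4))  # 0.9167
```

In the toy example the first label has an AP of 5/6 and the second an AP of 1.0, so the mean is about 0.9167; a higher mAP suggests the generated images more reliably contain the requested ingredients.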

Compared with existing text-based image generation methods StackGAN2, CookGAN, and AttnGAN, the proposed MPG not only generated more realistic and diverse high-fidelity images but also improved conditioning on the chosen ingredients.

The researchers have uploaded a video demo showing MPG’s pizza-generation prowess compared to other models.

The team notes that — while pizzas are certainly fun — their framework can also be extended to other multi-label image generation scenarios. The paper MPG: A Multi-ingredient Pizza Image Generator with Conditional StyleGANs is on arXiv.
