
🎛️🎚️ Pixel Alchemist: Semantic image editing in realtime with a multi-parameter interface for StyleCLIP global directions

Start interactive notebook: Open In Colab

Edit StyleGAN-generated images in realtime with custom prompts and multiple parametric controls. Based on https://arxiv.org/abs/2103.17249

Demo images (see the repository): FFHQ, LSUN Churches, LSUN Car

Setup

You need a (free) ngrok authtoken for this notebook: https://dashboard.ngrok.com/get-started/your-authtoken

Open the Google Colab notebook, make sure your runtime has a GPU, run all cells, and open the web interface from the last cell. The first time you run the notebook, the GUI will ask you to register for an ngrok account to get an authtoken, which you then paste into the corresponding cell.
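The notebook handles the tunnel setup for you; if you want to see roughly what that cell does, here is a minimal sketch assuming the pyngrok package is used (the port and token below are placeholders, not the notebook's actual values):

```python
# Minimal sketch of exposing the local web interface through ngrok from Colab.
# Assumes pyngrok; replace the token placeholder and port with your own values.
from pyngrok import ngrok

ngrok.set_auth_token("<your-ngrok-authtoken>")   # from dashboard.ngrok.com
public_url = ngrok.connect(8050)                 # port the web interface listens on
print("Open the interface at:", public_url)
```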

Usage

Enter any text prompt under each slider. Each model ships with a predetermined list of prompts that work well, but feel free to enter whatever you can think of. You can dynamically add and remove sliders with the '+' and '-' buttons.

The position of each slider controls how strongly that text prompt shows up in the generated image. A negative value decreases the presence of whatever the prompt describes; a slider with a negative value and the prompt 'Trees' will therefore remove trees from the image.

Lastly, the threshold knob on top of each slider determines how much of the image is affected by a change. A low threshold value changes almost everything in the image, while a high value only touches the most relevant parts. For example, the prompt 'red eyes' on FFHQ will only change the color of the iris when a high threshold is applied, but might change the mouth, nose, and whole expression when the knob is turned fully to the right. If the resulting image turns black, the threshold is too high and has to be decreased.
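Conceptually, each slider maps onto a StyleCLIP global direction: the slider value plays the role of the edit strength α and the knob the relevance threshold β from the paper. The sketch below only illustrates that mapping and is not the notebook's actual code; the array names and shapes are assumptions based on the StyleCLIP preprocessing outputs.

```python
import numpy as np

def edit_direction(fs3, clip_delta, alpha, beta):
    """Illustrative StyleCLIP-style global direction edit (names/shapes assumed).

    fs3        : (num_style_channels, 512) relevance of each StyleGAN style
                 channel to CLIP image directions, from the preprocessing step.
    clip_delta : (512,) normalized CLIP direction for the text prompt.
    alpha      : slider value -- edit strength; negative removes the attribute.
    beta       : threshold knob -- channels with |relevance| below beta are ignored.
    Returns an offset to add to the style vector S of the generated image.
    """
    relevance = fs3 @ clip_delta                        # per-channel relevance to the prompt
    ds = np.where(np.abs(relevance) >= beta, relevance, 0.0)
    norm = np.abs(ds).max()
    if norm == 0:
        # No channel passes the threshold -- the "black image" case described above.
        return ds
    return alpha * ds / norm                            # scale the edit by the slider value
```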

Using your own model

Refer to the StyleCLIP repository to learn how to preprocess your own StyleGAN model for global directions. To use your model with the Pixel Alchemist notebook, upload the "fs3", "W", "S", and "S_mean_std" files produced by the preprocessing and the .pkl file containing your StyleGAN weights to two separate Google Drive folders. Finally, add the two folder links (formatted for gdown) to the cell responsible for downloading all models from Google Drive, as sketched below.
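The exact cell differs between notebook versions; a minimal sketch of the download step, assuming gdown's folder download and with placeholder URLs and output paths, could look like this:

```python
# Sketch of the model-download cell; the folder URLs and output paths are
# placeholders for your own Google Drive folders, not the notebook's defaults.
import gdown

PREPROCESSED_URL = "https://drive.google.com/drive/folders/<folder-id-1>"  # fs3, W, S, S_mean_std
WEIGHTS_URL      = "https://drive.google.com/drive/folders/<folder-id-2>"  # StyleGAN .pkl

gdown.download_folder(PREPROCESSED_URL, output="preprocessed/my_model", quiet=False)
gdown.download_folder(WEIGHTS_URL, output="weights/my_model", quiet=False)
```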

Roadmap

  • Initial code release
  • Info note for model bias & ethics
  • Add MIDI control
  • Add more datasets (Imagenet-512 StyleGAN-XL, Conditional WikiArt, StyleGAN-Human)
  • Edit uploaded images with inversion
  • Text-to-image feature

References

StyleCLIP: Patashnik, Or, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, and Dani Lischinski. 2021. "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery." In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2085–94.

Posters dataset: The images used for the graphic design dataset are courtesy of typo/graphic posters by André Felipe Menezes, www.typo-graphicposters.com.
