Picasso Sketches with Evo-canvas
Intro
Several token lists containing the word Picasso were used to compute a CLIP score for evolved images (see evo-canvas). Usually CLIP is used to drive an algorithm through an existing latent space of images; here the algorithm has to produce its own images, ‘drawing’ them with visual primitives such as polygons (triangular by default), lines or Voronoi diagrams. The primary interest is aesthetic and, though most evolved images are, perhaps surprisingly, visually appealing, human curation is and should be part of the process.
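As a rough illustration of the scoring step, the sketch below computes the cosine similarity between an image embedding and the embeddings of several prompts, then averages the results. The function names, the averaging over a token list and the toy vectors are all assumptions made for illustration; in practice the embeddings would come from CLIP's image and text encoders, and evo-canvas may well combine scores differently.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def token_list_score(image_vec, text_vecs):
    """Mean similarity of one image to every prompt in a token list.

    Averaging is an assumed aggregation, not necessarily what evo-canvas does.
    """
    return sum(cosine_similarity(image_vec, t) for t in text_vecs) / len(text_vecs)

# Toy 3-d vectors standing in for real CLIP embeddings (hypothetical values).
image_vec = [0.9, 0.1, 0.2]
text_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
score = token_list_score(image_vec, text_vecs)
```

A higher score means the canvas sits closer to the prompts in embedding space, which is the quantity evolution tries to increase.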
Polygons
With these canvases, evolution was allowed to explore color space and shape space by placing polygons (triangular by default) on the canvas. Each polygon was defined by a list of mutable points (n >= 3) and could be translated, rotated and scaled. The resulting canvas was then scored by CLIP (e.g. how close is this image to "a bowl of daffodils"?), and a population of scored canvases was used to produce a new generation, which was in turn mutated and scored.
Evo-canvas produced some interesting and aesthetically appealing images, in my opinion (usual artistic caveats ad infinitum). While surreal, they have a bullish, Picassoesque quality. This one seems to show a bull/minotaur rearing on its hind legs, with a pleasing, explosive, dynamic quality:
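The loop described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the evo-canvas implementation: the `Polygon` class, the mutation rates, the truncation selection with elitism and the `score` callable (a stand-in for the CLIP score) are all hypothetical.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Polygon:
    points: list  # (x, y) tuples, n >= 3, in unit-square coordinates
    color: tuple  # (r, g, b, a), each channel in [0, 1]

def random_polygon(rng, n_points=3):
    points = [(rng.random(), rng.random()) for _ in range(n_points)]
    return Polygon(points, tuple(rng.random() for _ in range(4)))

def transform(poly, dx=0.0, dy=0.0, angle=0.0, scale=1.0):
    """Rotate and scale about the centroid, then translate."""
    cx = sum(x for x, _ in poly.points) / len(poly.points)
    cy = sum(y for _, y in poly.points) / len(poly.points)
    c, s = math.cos(angle), math.sin(angle)
    moved = []
    for x, y in poly.points:
        rx, ry = (x - cx) * scale, (y - cy) * scale
        moved.append((cx + c * rx - s * ry + dx, cy + s * rx + c * ry + dy))
    return Polygon(moved, poly.color)

def mutate(canvas, rng, sigma=0.05):
    """Return a copy of the canvas with jittered shapes and colors."""
    child = []
    for poly in canvas:
        if rng.random() < 0.5:  # assumed mutation rate
            poly = transform(poly, rng.gauss(0, sigma), rng.gauss(0, sigma),
                             rng.gauss(0, sigma), 1 + rng.gauss(0, sigma))
        color = tuple(min(1.0, max(0.0, ch + rng.gauss(0, sigma)))
                      for ch in poly.color)
        child.append(Polygon(poly.points, color))
    return child

def evolve(score, rng, pop_size=20, n_polys=10, generations=30):
    """Keep the best quarter each generation and refill with mutants."""
    population = [[random_polygon(rng) for _ in range(n_polys)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 4]
        population = parents + [mutate(rng.choice(parents), rng)
                                for _ in range(pop_size - len(parents))]
    return max(population, key=score)

# Placeholder fitness: mean red channel. A real run would rasterize the
# canvas and score the image with CLIP instead.
def redness(canvas):
    return sum(p.color[0] for p in canvas) / len(canvas)

best = evolve(redness, random.Random(0), pop_size=8, n_polys=4, generations=10)
```

Because the parents are carried over unchanged, the best score in the population never decreases from one generation to the next; whether evo-canvas uses elitism of this kind is not stated in the text.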
This one is a visual cacophony of legs, horns, movement:
Bezier curves
Here evolution was allowed to draw with parametric (Bezier) curves, scored against the Picasso and bull tokens. Even with only a few lines available, the algorithm captures something. The CLIP scores show that the computer ‘sees’ something there:
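For readers unfamiliar with Bezier curves, here is a minimal sketch of how a curve's control points (the part evolution would mutate) can be turned into a drawable polyline. It uses De Casteljau's algorithm; the function names and the choice of a cubic curve with 16 segments are assumptions for illustration, not details taken from evo-canvas.

```python
def bezier_point(ctrl, t):
    """De Casteljau's algorithm: repeatedly interpolate between neighbours."""
    pts = list(ctrl)
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

def bezier_polyline(ctrl, steps=16):
    """Sample the curve at steps + 1 evenly spaced parameter values."""
    return [bezier_point(ctrl, i / steps) for i in range(steps + 1)]

# A cubic curve: four control points in unit-square coordinates.
ctrl = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
polyline = bezier_polyline(ctrl)
```

The curve starts at the first control point and ends at the last, so mutating the two inner points changes the bend while leaving the endpoints in place.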
Symmetrical Bezier curves
By driving the algorithm to a symmetrical visual space, strong bull-like images were produced. Starting from random populations, evolution is able to produce an enormous (limitless?) variety of canvases that conform to a greater or lesser degree to the cue words. This creative fecundity is a hallmark of evolutionary algorithms. Any number of icons/logos can be generated while a human eye selects those that please most.
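The text does not say how symmetry was enforced. One simple possibility, sketched here as an assumption, is to evolve only half of the curves and mirror each one about the vertical centre line before scoring; the function names and the `axis_x` parameter are hypothetical.

```python
def mirror_curve(ctrl, axis_x=0.5):
    """Reflect a curve's control points about the vertical line x = axis_x."""
    return [(2 * axis_x - x, y) for x, y in ctrl]

def symmetric_canvas(half_curves, axis_x=0.5):
    """Pair every evolved curve with its mirror image."""
    canvas = []
    for curve in half_curves:
        canvas.append(curve)
        canvas.append(mirror_curve(curve, axis_x))
    return canvas

# One evolved curve on the left half yields two curves on the canvas.
left = [(0.25, 0.0), (0.0, 0.5), (0.25, 1.0)]
canvas = symmetric_canvas([left])
```

An alternative would be to leave the genome unconstrained and add a symmetry term to the fitness; mirroring by construction simply guarantees the property and halves the search space.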
Conclusion
Using CLIP to score text-image pairs produced recognizable images with pleasing aesthetics (beauty being in the eye of this beholder). This technique differs from the more usual process of using CLIP to drive a network through a huge pre-existing latent space of images. Here evolution has to create canvases ab initio. The canvases are necessarily cruder, and the evolutionary algorithm is not exactly efficient. As a rough guide, the canvases shown were evolved within an hour or two on a 16-core AMD Threadripper PC with an NVIDIA GTX 1070 graphics card.