import torch
from einops import repeat  # img_to_patches / patches_to_img are the partitioning utilities

def collage_operator(self, z, collage_weight, collage_bias):
    """Collage Operator (decoding). Performs the steps described in Def. 3.1, Figure 2."""
    # Split the current iterate `z` into source patches according to the partitioning scheme.
    domains = img_to_patches(z)
    # Pool domains (pre-augmentation) to range patch sizes.
    pooled_domains = self.pool(domains)
    # If needed, produce additional candidate source patches as augmentations of existing domains.
    # Auxiliary learned feature maps / patches are also introduced here.
    if self.n_aug_transforms > 1:
        pooled_domains = self.generate_candidates(pooled_domains)
    # Match every candidate source patch with every range patch.
    pooled_domains = repeat(pooled_domains, 'b c d h w -> b c d r h w', r=self.num_ranges)
    # Apply the affine maps to source patches.
    range_domains = torch.einsum('bcdrhw, bcdr -> bcrhw', pooled_domains, collage_weight)
    range_domains = range_domains + collage_bias[..., None, None]
    # Reconstruct data by "composing" the output patches back together (collage!).
    z = patches_to_img(range_domains)
    return z
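Decoding amounts to iterating this operator from an arbitrary initial condition until it (approximately) reaches its fixed point. A minimal sketch, assuming a `model` exposing the `collage_operator` above (the `decode` wrapper and `n_iters` are illustrative, not part of the official API):

import torch

@torch.no_grad()
def decode(model, collage_weight, collage_bias, shape, n_iters=10):
    # Start from an arbitrary initial condition; contractiveness guarantees
    # convergence to the same fixed point regardless of initialization.
    z = torch.zeros(shape)
    for _ in range(n_iters):
        z = model.collage_operator(z, collage_weight, collage_bias)
    return z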
We choose collage operators for the following properties:
Given our choice of affine maps (other function classes could also be used), a collage can be represented with just two scalar parameters acting on each pair of input and output patches. This work formalizes collage operators as an instance of a "partitioned iterated function system"; please see our paper for further details. In practice, each "scalar" affine map is broadcast to every pixel of the patch under consideration.
Since each affine map is broadcast across all pixels, it maps a source patch to a range patch via a scalar multiple that is independent of the resolution and dimensions of the patch being transformed. This allows the collage operator to be applied in a resolution-agnostic manner. The mosaic, fractal-like details introduced when decoding at higher resolutions reflect the structure of the collage operator: when the entire image is provided as an input patch, multiple levels of "fractalness" can be observed.
A single set of collage parameters \(w\) identifies a unique data point. Contractiveness can be enforced easily by constraining the affine map between each input and output patch, e.g., by bounding the scalar weights below one in absolute value.
A Neural Collage encoder can be trained to produce collage parameters via a reconstruction objective. Classical results on contraction maps make it possible to upper bound the distance between input data and the fixed point of a collage using a single collage operator step, rather than by solving explicitly for the fixed point (see the bound below).
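This is the classical Collage Theorem: if the collage operator \(T_w\) is a contraction with factor \(c < 1\) and fixed point \(z^\star_w\), then for any data point \(x\),
\[
d\big(x, z^\star_w\big) \;\le\; \frac{1}{1 - c}\, d\big(x, T_w(x)\big),
\]
so minimizing the one-step reconstruction error \(d(x, T_w(x))\) also controls the distance to the fixed point.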
Partitioning schemes directly affect the size of the representation: larger patches yield shorter parametrizations, at the cost of accuracy. On the right, we show the decoding steps of a collage operator.
Increasing the size of output patches reduces accuracy but requires a significantly smaller number of floats to represent the image, resulting in high compression ratios. This example does not employ techniques used in our Neural Collage compression experiments, such as introducing auxiliary patches, augmentations, and bit-packing collage parameters. Thank you DALLE-2 for the image!
We train deep generative models (VDVAEs) with collage parameters as latent variables. The ELBO is computed on the images decoded through the collage operator, given a sample of collage parameters. Collage parameter representations are "rate efficient" and generally smaller than the data dimensionality, resulting in generative models that can be easier to optimize.
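As a rough sketch of this objective (the actual VDVAE integration is more involved; `encoder`, its `(mu, logvar)` output, and the `split_params` helper below are illustrative assumptions):

import torch
import torch.nn.functional as F

def elbo_step(encoder, model, x, n_iters=10):
    # Encoder outputs a distribution over collage parameters (the latents).
    mu, logvar = encoder(x)
    w = mu + (0.5 * logvar).exp() * torch.randn_like(mu)  # reparametrized sample
    weight, bias = model.split_params(w)                  # hypothetical helper
    # Decode by iterating the collage operator, then score the reconstruction.
    z = x.new_zeros(x.shape)
    for _ in range(n_iters):
        z = model.collage_operator(z, weight, bias)
    recon = F.mse_loss(z, x)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    return recon + kl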
Regardless of the image size used during training, introducing Neural Collages in a generative model allows sampling (decoding an image from a sample of collage parameters) at arbitrary resolution. Note that the details introduced by the upsampling procedure depend on the type of collage operator: for example, on the augmentation transforms used to produce auxiliary patches from input patches at each iteration.
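For instance, reusing the `decode` sketch above with a hypothetical `prior` over collage parameters, sampling at a higher resolution only changes the shape of the initial iterate:

weight, bias = prior.sample()  # hypothetical prior over collage parameters
img    = decode(model, weight, bias, shape=(1, 3, 256, 256))
img_2x = decode(model, weight, bias, shape=(1, 3, 512, 512))  # same parameters, finer fractal detail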
We train Neural Collages on a dataset of random crops of high-resolution aerial images. At test time, we use this pre-trained encoder to compress images of different resolutions by applying it independently to each crop. Encoding new images requires a single forward pass of a Neural Collage encoder, resulting in speedups of several orders of magnitude over fractal compression and COIN.
To leverage similarity across the entire dataset, we introduce auxiliary input patches into each collage iteration (the `generate_candidates` step in the code above). Better compression ratios are achieved by regularizing the collage parameters to take small values, followed by a bit-packing step.
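A minimal sketch of the bit-packing idea (the exact regularization and packing used in our experiments differ; the uniform quantization grid and `w_max` bound here are illustrative):

import numpy as np

def pack_params(w, n_bits=4, w_max=1.0):
    """Uniformly quantize collage parameters and pack two 4-bit codes per byte."""
    levels = 2 ** n_bits - 1
    # Regularized collage weights concentrate near zero, so a coarse uniform
    # grid over [-w_max, w_max] loses little accuracy.
    q = np.round((np.clip(w, -w_max, w_max) + w_max) / (2 * w_max) * levels)
    q = q.astype(np.uint8).reshape(-1)
    if q.size % 2:
        q = np.pad(q, (0, 1))  # pad to an even length
    return (q[0::2] << 4) | q[1::2]  # two codes per byte (assumes n_bits == 4)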
Neural Collages can fractalize images (see the Gradio demo at the top of this page). To obtain this effect, train a restricted Neural Collage without source domain partitioning on a reconstruction objective, i.e., with the entire image as the only input patch. Fractalizing (including training) takes less than a minute on a single GPU. Follow the notebook tutorial for more details!
Disclaimer: No author was harmed during the generation of these animations.
Several questions are left open for further investigation. We are looking into improved partitioning schemes, Neural Collages for other data modalities, training deep models directly on collage representations, and ways to control the detail introduced when upsampling.
Neural Collages are inspired by Partitioned Iterated Function Systems (PIFS), the backbone of fractal compression schemes. Our official implementation includes a GPU-accelerated version of the PIFS algorithm (without quadtrees), used as a baseline in the image compression experiment. The main paper provides a detailed account of the history of fractal compression and its main variants.
And yes, Neural Collages can be applied to point clouds (as in the first IFSs of Barnsley et al.) to produce "true" fractals!
@article{poli2022self,
  title={Self-Similarity Priors: Neural Collages as Differentiable Fractal Representations},
  author={Poli, Michael and Xu, Winnie and Massaroli, Stefano and Meng, Chenlin and Kim, Kuno and Ermon, Stefano},
  journal={arXiv preprint arXiv:2204.07673},
  year={2022}
}