Representing data as the parameters of an iterative map.

Given a set of parameters, the data is recovered as the (unique) fixed-point of the map.

Many patterns in nature exhibit self–similarity: they can be efficiently described via self–referential transformations. This property is common in natural and artificial objects, such as molecules, shorelines, galaxies and even images. This work develops a method to learn representations designed to maximally leverage self-similarity, both across regions of a single datum as well as across datasets.

To this end, we propose to represent data as the parameters of a contractive, iterative map defined on partitions of the data space: the collage operator. Instead of performing an extensive search for the set of parameters corresponding to each element, we use a neural network encoder to amortize the cost to a single forward pass. We refer to the encoder and collage operator together as a Neural Collage.

We investigate how to leverage collage representations produced by Neural Collages in various tasks, including generation and compression. Neural Collage are orders of magnitude faster than self–similarity–based image compression algorithms during encoding and offer compression rates competitive with other deep implicit methods. We showcase further applications of Neural Collages as a method to produce fractal art.

Data can be represented in a variety of ways. In the original space, via a collection of values (for example, pixels), as a set of coefficients and bases functions (for example, Fourier coefficients and complex exponentials), or implicitly via the parameters of a map (for example, pixel locations to pixel values).

Although the standard in deep learning is to stack learned transformations on data, a "good" representation for the input is often the key to improved performance and more efficient, smaller models. To give a few examples: transformers, diffusion models, implicit neural representations (SIREN, NeRF, COIN) and frequency-domain models (FNOs, T1) all rely to a different extent on preprocessing / augmenting inputs and their domains (for example, grids of pixel locations). The design space here involves parametrization of the input domains, positional encoding schemes, and choice of frequency-domain transforms.

In this work we investigate a different class of representations, where a data point is described via the parameters of a function that converges to it regardless of the initial condition. In particular, we consider contractive functions that by construction require a small number of parameters: collage operators. Decoding (parameters to data) is achieved by applying a collage operator until convergence, or solving for its fixed-point. Encoding (data to parameters) is solved via a neural network encoder trained to produce collage parameters corresponding to data.

A **Neural Collage** is comprised of both a neural network encoder as well as a collage operator as decoder.

Consider a two-dimensional vector \(x \in \mathbb{R}^2\) that happens to be of interest
for a downstream task, for example regression. Rather than representing it directly, we can
use the parameters \(w\) of a contractive map with \(x \) as its fixed-point, for example a simple affine function:

\[{
x_{k+1} = a \circ x_k + b,\quad |a_i|< 1
}\]

Given \(w\), \(x\) can be recovered by repeatedly applying the map starting from any other point as shown on the right. We are now using four real numbers instead of the two elements of \(x\): not exactly a gain!

However, choosing other classes of functions can help reduce the number of parameters required, as is the case for collage operators. Note that this is generally a lossy process, meaning that the original data is only recovered to a certain degree of accuracy.

We show how collage parameter representations obtained through Neural Collages can be used in learning tasks:

```
def collage_operator(self, z, collage_weight, collage_bias):
"""Collage Operator (decoding). Performs the steps described in Def. 3.1, Figure 2."""
# Split the current iterate `z` into source patches according to the partitioning scheme.
domains = img_to_patches(z)
# Pool domains (pre augmentation) to range patch sizes.
pooled_domains = self.pool(domains)
# If needed, produce additional candidate source patches as augmentations of existing domains.
# Auxiliary learned feature maps / patches are also introduced here.
if self.n_aug_transforms > 1:
pooled_domains = self.generate_candidates(pooled_domains)
pooled_domains = repeat(pooled_domains, 'b c d h w -> b c d r h w', r=self.num_ranges)
# Apply the affine maps to source patches
range_domains = torch.einsum('bcdrhw, bcdr -> bcrhw', pooled_domains, collage_weight)
range_domains = range_domains + collage_bias[..., None, None]
# Reconstruct data by "composing" the output patches back together (collage!).
z = patches_to_img(range_domains)
return z
```

We choose collage operators for the following properties:

Based on our selection of affine maps (note: other functions may also be selected), Collages can be represented with just two scalar parameters acting on each pair of input and output patches. This work formalizes collage operators as an instance of a "partitioned iterated function system"; please see our paper for further details. In practice, the "scalar" affine map is broadcasted to each pixel of the patch in consideration.

Since each affine map is broadcasted to all pixels, it maps each source to a range patch via a scalar multiple that is independent of the resolution or the dimensions of the patch being transformed. This allows the collage operator to be applied in a resolution-agnostic manner. The mosaic, fractal-like details introduced when decoding at higher resolutions reflect the structure of the collage operator: when the entire image is provided as an input patch, multiple levels of "fractalness" can be observed.

A single set of collage parameters \(w\) identifies a unique data point. Contractiveness can be enforced easily by constraining each affine map between input and output patches.

A Neural Collages encoder can be trained to produce collage parameters via a reconstruction objective. Classical results on contraction maps provide a method to upper bound the distance between input data and fixed-point of a collage by a single collage operator step, rather than by solving explicitly for the fixed-point.

Partitioning schemes directly affect the size of the representation: larger patches yield shorter parametrizations, but a loss of accuracy. On the right, we show the decoding steps of a collage operator.

Increasing the size of output patches reduces the accuracy and requires significantly smaller number of floats to represent the image, resulting in high compression ratios. This example does not employ techniques used in our Neural Collage compression experiments, such as introducing auxiliary patches, augmentations, and bit-packing collage parameters. Thank you DALLE-2 for the image!

We train deep generative models (VDVAEs) with collage parameters as latent variables. The ELBO is calculated on the images decoded through the collage operator, given a sample of collage parameters. Collage parameter representations are "rate efficient" and generally smaller than data dimensionality, resulting generative models that can be easier to optimize.

Regardless of the image size used during training, introducing Neural Collages in a generative model allows for sampling (decoding an image from a sample of collage parameters) at arbitrary resolution. Note that the details introduced by the upsampling procedure depend on the collage operator type: for example, augmentation transforms used to produce auxiliary patches from input patches at each iteration.

Binarized MNIST samples from a Collage VDVAE decoded at the standard resolution (28x28).

Binarized MNIST samples from a Collage VDVAE decoded at 40x the resolution (1440x1440).

We train Neural Collages on a random crops dataset of high-resolution aerial images. At test time, we use this pre-trained encoder to compress images of different resolutions through independent application on each crop. Encoding new images requires a single forward pass of a Neural Collage encoder, resulting in speedups of several orders of magnitude over fractal compression and COIN.

To leverage similarity across the entire dataset, we introduce auxiliary input patches into a collage iteration (the "generate candidates" step in code). Better compression ratios are achieved by regularizing the collage parameters to assume small values, followed by a bit-packing step.

Neural Collages can fractalize images (see Gradio demo at the top of this page). To obtain this effect, train a restricted Neural Collage without source domain partitioning on a reconstruction objective i.e., with the only input patch being the entire image. Fractalizing (including training) takes less than a minute on a single GPU. Follow the notebook tutorial for more details!

Disclaimer: No author was harmed during the generation of these animations.

There are questions left open for further investigation. We are looking into improved partitioning schemes, Neural Collages for other data modalities, training deep models directly on collage representations, and ways to control detail introduced when upsampling.

Neural Collages are inspired by Partitioned Iterated Function Systems (PIFS), the backbone of fractal compression schemes. Our official implementation includes a GPU-accelerated version of the PIFS algorithm (without quadtrees), used as a baseline in the image compression experiment. The main paper reports a detailed account of the history of fractal compression and its main variants.

And yes, Neural Collages can be applied to point clouds (as the first IFSs of Barnsley et al.) to produce "true" fractals!

```
@article{poli2022self,
title={Self-Similarity Priors: Neural Collages as Differentiable Fractal Representations},
author={Poli, Michael and Xu, Winnie and Massaroli, Stefano and Meng, Chenlin and Kim, Kuno and Ermon, Stefano},
journal={arXiv preprint arXiv:2204.07673},
year={2022}
}
```