ICML 2023 Recap

My top 50 posters from ICML 2023 in Honolulu.

Vision

Vision Transformers

  • ViT-22B models align far better with human visual perception: 87% shape bias versus 20-30% for prior models, which were much more texture-biased.

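The 87% figure comes from the standard cue-conflict protocol: images rendered with the shape of one class and the texture of another, where shape bias is the fraction of shape decisions among images classified as either cue's class. A minimal sketch (the function name and inputs are mine, for illustration):

```python
def shape_bias(preds, shape_labels, texture_labels):
    """Shape bias on cue-conflict images: of the predictions matching
    either the shape class or the texture class, the fraction matching shape."""
    shape_hits = texture_hits = 0
    for pred, shape_cls, texture_cls in zip(preds, shape_labels, texture_labels):
        if pred == shape_cls:
            shape_hits += 1
        elif pred == texture_cls:
            texture_hits += 1
    decided = shape_hits + texture_hits
    return shape_hits / decided if decided else float("nan")
```

Predictions matching neither class are ignored, so the metric isolates shape-vs-texture preference from plain accuracy.
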
  • A hierarchical ViT, i.e., one with non-uniform feature sizes through the depth of the network. It also strips unnecessary bells and whistles from prior work, learning those inductive biases instead.

  • A ViT with global attention layers interspersed among regular attention layers

2D

  • Using both text and vision to improve classification of novel classes.

  • Learning a displacement field to establish correspondences between photos and sketches

  • Interpretable subspaces in image representations extracted using CLIP

  • Measuring compositionality and invertibility for object-centric representations

  • Multi-view self-supervised learning analyzed using Mutual Information.

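A common bridge between multi-view objectives and mutual information is the InfoNCE loss: with batch size N, I(z1; z2) ≥ log N − L_InfoNCE. A minimal NumPy sketch, assuming row-wise L2-normalized embeddings (the temperature value is illustrative):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss for two views: each row of z1 should match the
    same-index row of z2 against all other rows in the batch."""
    logits = z1 @ z2.T / temperature                  # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # cross-entropy on the diagonal
```

The bound log N − loss is what makes InfoNCE a mutual-information estimator; it saturates at log N, which is one source of the subtleties such analyses examine.
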
  • Class collapse and feature suppression during contrastive learning

  • The latest on hyperbolic representations.

3D

  • Spherical CNNs (rotation equivariant) scaled to 5e6 convolutions and 1e7-1e9 feature maps

  • Object pose canonicalization measured for stability and consistency. They also train on multiple object classes.

  • Signed distance functions learnt “provably.”

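I did not catch the exact guarantees, but a property at the heart of most SDF learning work is the eikonal equation |∇f| = 1, which every true signed distance function satisfies almost everywhere. A hedged sketch of a finite-difference residual check (the setup is mine, not the poster's):

```python
import numpy as np

def eikonal_residual(f, x, h=1e-4):
    """| |grad f(x)| - 1 | via central finite differences.
    f maps an (N, d) array of points to (N,) distances."""
    d = x.shape[1]
    grads = []
    for i in range(d):
        step = np.zeros(d)
        step[i] = h
        grads.append((f(x + step) - f(x - step)) / (2 * h))
    grad_norm = np.sqrt(sum(g ** 2 for g in grads))
    return np.abs(grad_norm - 1.0)
```

In learning setups the same quantity typically appears as a training penalty (an "eikonal loss") rather than a post-hoc check.
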
Video

  • Keypoint learning in videos.

  • Efficient episodic recall (aka “video search”).

Generative Models

  • Electrostatics-based generative model with better FID numbers than diffusion

  • Animated 3D models without any additional dataset.

  • Diffusion without upsamplers, though harder to train and less efficient.

  • Consistency models: diffusion without multi-step denoising.

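The key construction is a parameterization f(x, t) = c_skip(t)·x + c_out(t)·F(x, t) whose coefficients enforce the boundary condition f(x, ε) = x, so sampling can be a single network call at t = T. A sketch with EDM-style coefficients (the constants are illustrative; check the paper for exact values):

```python
import numpy as np

SIGMA_DATA, EPS = 0.5, 0.002  # illustrative data std and minimum time

def consistency_fn(net, x, t):
    """Consistency parameterization: c_skip(EPS) = 1 and c_out(EPS) = 0,
    so f(x, EPS) = x regardless of the network's output."""
    c_skip = SIGMA_DATA ** 2 / ((t - EPS) ** 2 + SIGMA_DATA ** 2)
    c_out = SIGMA_DATA * (t - EPS) / np.sqrt(t ** 2 + SIGMA_DATA ** 2)
    return c_skip * x + c_out * net(x, t)
```

One-step generation is then x0 = consistency_fn(net, T * noise, T); extra denoising steps can optionally refine the sample.
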
  • Diffusion models evaluated on one-shot drawing task.

  • NeRF from fewer samples using geometric invariances.

World Models/RL

Transformers

  • Beautiful work showing transformers have a “lower-degree” bias, favoring polynomial terms of lower degree, which is somewhat counterintuitive given their pairwise attention mechanism.

  • Improving the focal loss by taking into account the second highest predicted logit, rather than naively maximizing entropy.

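As I understood it, the modulating factor accounts for the strongest competing class, so the loss keeps pressure on predictions whose runner-up is close, rather than down-weighting purely by (1 − p_true)^γ. A toy sketch of that idea (the paper's exact functional form may differ):

```python
import numpy as np

def dual_focal_loss(logits, target, gamma=2.0):
    """Focal-style loss modulated by the gap between the true class
    and its strongest rival, instead of by (1 - p_true) alone."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    p_true = probs[target]
    p_rival = max(p for j, p in enumerate(probs) if j != target)
    return -((1 - p_true + p_rival) ** gamma) * np.log(p_true)
```
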
  • Do early layers generalize while later layers memorize? Apparently not: memorization can be localized to a small number of neurons dispersed across layers.

  • Characterizing training trajectories of different representation learning tasks

  • Is local flatness desirable for generalization? Not necessarily. There are more promising indicators such as SGD-based disagreement on unlabelled data.

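The disagreement signal is appealingly cheap: train two copies of the model that differ only in SGD randomness (seed, data order) and count how often their predictions differ on unlabeled inputs; that rate has been shown to track test error. A minimal sketch:

```python
def disagreement_rate(model_a, model_b, unlabeled_inputs):
    """Fraction of unlabeled inputs on which two independently
    trained models predict different labels."""
    preds_a = [model_a(x) for x in unlabeled_inputs]
    preds_b = [model_b(x) for x in unlabeled_inputs]
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)
```
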
  • Category-theory view of disentanglement

  • Using category theory to show that foundation models cannot be used for everything, but CLIP-like algorithms do have “creativity”

Novel architectures

  • Super simple long convolutions

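The "long" part means a kernel as long as the sequence itself, which is only practical because convolution becomes pointwise multiplication in Fourier space: O(N log N) instead of O(N²). A NumPy sketch, zero-padding to 2N so the circular FFT convolution matches linear convolution:

```python
import numpy as np

def long_conv(u, k):
    """Causal convolution of input u with an equal-length kernel k,
    computed via FFT in O(N log N)."""
    n = len(u)
    u_f = np.fft.rfft(u, 2 * n)   # zero-pad to 2N to avoid wrap-around
    k_f = np.fft.rfft(k, 2 * n)
    return np.fft.irfft(u_f * k_f, 2 * n)[:n]
```
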
  • Differentiable “if blocks”

  • Differentiable tree operations

  • Continuous spatiotemporal transformers

Graphs

  • Compositionality via learnt pooling from a multi-view graph to a latent graph

  • Positional encodings to take advantage of edge directions

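One construction I recall for direction-aware positional encodings is the magnetic Laplacian: a Hermitian matrix whose complex phases record which way each edge points, with its leading eigenvectors used as node encodings. A hedged NumPy sketch (the potential q and the normalization are my guesses, not necessarily the poster's choices):

```python
import numpy as np

def magnetic_laplacian_pe(adj, q=0.25, k=2):
    """Positional encodings from the magnetic Laplacian of a directed
    graph. adj: dense (n, n) adjacency with adj[i, j] = 1 for edge i->j."""
    support = ((adj + adj.T) > 0).astype(float)   # undirected support
    phase = 2 * np.pi * q * (adj - adj.T)         # antisymmetric: encodes direction
    herm = support * np.exp(1j * phase)           # Hermitian "magnetic" adjacency
    lap = np.diag(support.sum(axis=1)) - herm
    _, eigvecs = np.linalg.eigh(lap)              # eigh: lap is Hermitian
    return eigvecs[:, :k]                         # k lowest-frequency eigenvectors
```
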
Adversarial attacks

  • Independent component analysis to design an attack on federated learning

Curiosities

  • Implicit neural representations (using spatial coordinates C or environmental features E or both) to predict presence of wildlife species.

  • ML on Mars for source separation to detect marsquakes!

  • Template + score/filter prompts for a dataset without access to labels.

  • A simple initialization trick for ViT-Tiny

  • How to fine-tune ML models in an “open-source” fashion: fine-tune in parallel and then merge

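The merge step can be as simple as a uniform parameter average over checkpoints fine-tuned in parallel from a shared initialization ("model soup"-style). A minimal sketch over plain parameter dicts:

```python
def merge_checkpoints(state_dicts):
    """Uniformly average parameters across fine-tuned checkpoints.
    Assumes all models share the same architecture and initialization."""
    count = len(state_dicts)
    return {name: sum(sd[name] for sd in state_dicts) / count
            for name in state_dicts[0]}
```

Averaging only makes sense when the runs stay in the same loss basin, which sharing the pretrained initialization is meant to ensure.
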
Rishabh Kabra
Research Engineer and PhD Student

The revolution will not be supervised. But it will be somewhat structured.