Publications

2024

    • Mariotti, O., Mac Aodha, O., & Bilen, H. (2024). Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

      Recent progress in self-supervised representation learning has resulted in models that are capable of extracting image features that are not only effective at encoding image-level, but also pixel-level, semantics. These features have been shown to be effective for dense visual semantic correspondence estimation, even outperforming fully-supervised methods. Nevertheless, current self-supervised approaches still fail in the presence of challenging image characteristics such as symmetries and repeated parts. To address these limitations, we propose a new approach for semantic correspondence estimation that supplements discriminative self-supervised features with 3D understanding via a weak geometric spherical prior. Compared to more involved 3D pipelines, our model only requires weak viewpoint information, and the simplicity of our spherical representation enables us to inject informative geometric priors into the model during training. We propose a new evaluation metric that better accounts for repeated part and symmetry-induced mistakes. We present results on the challenging SPair-71k dataset, where we show that our approach is capable of distinguishing between symmetric views and repeated parts across many object categories, and also demonstrate that it can generalize to unseen classes on the AwA dataset.
      @inproceedings{Mariotti24, title = {Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps}, author = {Mariotti, Octave and Mac~Aodha, Oisin and Bilen, Hakan}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2024}, xpdf = {https://arxiv.org/pdf/2312.13216.pdf} }
    • Bhunia, A. K., Li, C., & Bilen, H. (2024). Looking 3D: Anomaly Detection with 2D-3D Alignment. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

      Automatic anomaly detection based on visual cues holds practical significance in various domains, such as manufacturing and product quality assessment. This paper introduces a new conditional anomaly detection problem, which involves identifying anomalies in a query image by comparing it to a reference shape. To address this challenge, we have created a large dataset, BrokenChairs-180K, consisting of around 180K images, with diverse anomalies, geometries, and textures paired with 8,143 reference 3D shapes. To tackle this task, we have proposed a novel transformer-based approach that explicitly learns the correspondence between the query image and reference 3D shape via feature alignment and leverages a customized attention mechanism for anomaly detection. Our approach has been rigorously evaluated through comprehensive experiments, serving as a benchmark for future research in this domain.
      @inproceedings{Bhunia24, title = {Looking {3D}: Anomaly Detection with {2D}-{3D} Alignment}, author = {Bhunia, Ankan~Kumar and Li, Changjian and Bilen, Hakan}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2024} }
    • Li, W.-H., McDonagh, S., Leonardis, A., & Bilen, H. (2024). Multi-task Learning with 3D-Aware Regularization. International Conference on Learning Representations (ICLR).

      Deep neural networks have become the standard solution for designing models that can perform multiple dense computer vision tasks such as depth estimation and semantic segmentation thanks to their ability to capture complex correlations in high dimensional feature space across tasks. However, the cross-task correlations that are learned in the unstructured feature space can be extremely noisy and susceptible to overfitting, consequently hurting performance. We propose to address this problem by introducing a structured 3D-aware regularizer which interfaces multiple tasks through the projection of features extracted from an image encoder to a shared 3D feature space and decodes them into their task output space through differentiable rendering. We show that the proposed method is architecture agnostic and can be plugged into various prior multi-task backbones to improve their performance; as we evidence using standard benchmarks NYUv2 and PASCAL-Context.
      @inproceedings{Li24, title = {Multi-task Learning with 3D-Aware Regularization}, author = {Li, Wei-Hong and McDonagh, Steven and Leonardis, Ales and Bilen, Hakan}, booktitle = {International Conference on Learning Representations (ICLR)}, year = {2024}, xcode = {https://github.com/VICO-UoE/3DAwareMTL}, xpdf = {https://openreview.net/pdf?id=TwBY17Hgiy} }

2023

    • Walker, T., Mariotti, O., Vaxman, A., & Bilen, H. (2023). Explicit Neural Surfaces: Learning Continuous Geometry With Deformation Fields. Neural Information Processing Systems (NeurIPS) Workshop on Symmetry and Geometry in Neural Representations.

      We introduce Explicit Neural Surfaces (ENS), an efficient surface reconstruction method that learns an explicitly defined continuous surface from multiple views. We use a series of neural deformation fields to progressively transform a continuous input surface to a target shape. By sampling meshes as discrete surface proxies, we train the deformation fields through efficient differentiable rasterization, and attain a mesh-independent and smooth surface representation. By using Laplace-Beltrami eigenfunctions as an intrinsic positional encoding alongside standard extrinsic Fourier features, our approach can capture fine surface details. ENS trains 1 to 2 orders of magnitude faster and can extract meshes of higher quality compared to implicit representations, whilst maintaining competitive surface reconstruction performance and real-time capabilities. Finally, we apply our approach to learn a collection of objects in a single model, and achieve disentangled interpolations between different shapes, their surface details, and textures.
      @inproceedings{Walker23, title = {Explicit Neural Surfaces: Learning Continuous Geometry With Deformation Fields}, author = {Walker, Thomas and Mariotti, Octave and Vaxman, Amir and Bilen, Hakan}, booktitle = {Neural Information Processing Systems (NeurIPS) Workshop on Symmetry and Geometry in Neural Representations}, year = {2023}, xpdf = {https://www.pure.ed.ac.uk/ws/portalfiles/portal/412836276/2306.02956v3.pdf} }
    • Goel, A., Fernando, B., Keller, F., & Bilen, H. (2023). Semi-supervised multimodal coreference resolution in image narrations. Conference on Empirical Methods in Natural Language Processing (EMNLP).

      In this paper, we study multimodal coreference resolution, specifically where a longer descriptive text, i.e., a narration is paired with an image. This poses significant challenges due to fine-grained image-text alignment, inherent ambiguity present in narrative language, and unavailability of large annotated training sets. To tackle these challenges, we present a data efficient semi-supervised approach that utilizes image-narration pairs to resolve coreferences and narrative grounding in a multimodal context. Our approach incorporates losses for both labeled and unlabeled data within a cross-modal framework. Our evaluation shows that the proposed approach outperforms strong baselines both quantitatively and qualitatively, for the tasks of coreference resolution and narrative grounding.
      @inproceedings{Goel23a, title = {Semi-supervised multimodal coreference resolution in image narrations}, author = {Goel, Arushi and Fernando, Basura and Keller, Frank and Bilen, Hakan}, booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)}, year = {2023}, xcode = {https://github.com/VICO-UoE/CIN-SSL}, xpdf = {https://arxiv.org/pdf/2310.13619.pdf} }
    • Goel, A., Fernando, B., Keller, F., & Bilen, H. (2023). Who are you referring to? Coreference resolution in image narrations. IEEE/CVF Conference on Computer Vision (ICCV).

      Coreference resolution aims to identify words and phrases which refer to the same entity in a text, a core task in natural language processing. In this paper, we extend this task to resolving coreferences in long-form narrations of visual scenes. First, we introduce a new dataset with annotated coreference chains and their bounding boxes, as most existing image-text datasets only contain short sentences without coreferring expressions or labeled chains. We propose a new technique that learns to identify coreference chains using weak supervision, only from image-text pairs and a regularization using prior linguistic knowledge. Our model yields large performance gains over several strong baselines in resolving coreferences. We also show that coreference resolution helps improve the grounding of narratives in images.
      @inproceedings{Goel23, title = {Who are you referring to? Coreference resolution in image narrations}, author = {Goel, Arushi and Fernando, Basura and Keller, Frank and Bilen, Hakan}, booktitle = {IEEE/CVF Conference on Computer Vision (ICCV)}, year = {2023}, xcode = {https://github.com/VICO-UoE/CIN} }
    • Li, W.-H., Liu, X., & Bilen, H. (2023). Universal Representations: A Unified Look at Multiple Task and Domain Learning. International Journal of Computer Vision (IJCV).

      We propose a unified look at jointly learning multiple vision tasks and visual domains through universal representations, a single deep neural network. Learning multiple problems simultaneously involves minimizing a weighted sum of multiple loss functions with different magnitudes and characteristics, and thus results in an unbalanced state where one loss dominates the optimization and yields poor results compared to learning a separate model for each problem. To this end, we propose distilling knowledge of multiple task/domain-specific networks into a single deep neural network after aligning its representations with the task/domain-specific ones through small-capacity adapters. We rigorously show that universal representations achieve state-of-the-art performance in learning multiple dense prediction problems in NYU-v2 and Cityscapes, multiple image classification problems from diverse domains in the Visual Decathlon Dataset, and cross-domain few-shot learning in MetaDataset. Finally, we also conduct multiple analyses through ablation and qualitative studies.
      @article{Li23, title = {Universal Representations: A Unified Look at Multiple Task and Domain Learning}, author = {Li, Wei-Hong and Liu, Xialei and Bilen, Hakan}, journal = {International Journal of Computer Vision (IJCV)}, year = {2023}, xcode = {https://github.com/VICO-UoE/UniversalRepresentations}, xpdf = {https://arxiv.org/pdf/2204.02744.pdf} }
    • Anciukevicius, T., Xu, Z., Fisher, M., Henderson, P., Bilen, H., Mitra, N. J., & Guerrero, P. (2023). RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

      Diffusion models currently achieve state-of-the-art performance for both conditional and unconditional image generation. However, so far, image diffusion models do not support tasks required for 3D understanding, such as view-consistent 3D generation or single-view object reconstruction. In this paper, we present RenderDiffusion as the first diffusion model for 3D generation and inference that can be trained using only monocular 2D supervision. At the heart of our method is a novel image denoising architecture that generates and renders an intermediate three-dimensional representation of a scene in each denoising step. This enforces a strong inductive structure into the diffusion process that gives us a 3D consistent representation while only requiring 2D supervision. The resulting 3D representation can be rendered from any viewpoint. We evaluate RenderDiffusion on ShapeNet and Clevr datasets and show competitive performance for generation of 3D scenes and inference of 3D scenes from 2D images. Additionally, our diffusion-based approach allows us to use 2D inpainting to edit 3D scenes. We believe that our work promises to enable full 3D generation at scale when trained on massive image collections, thus circumventing the need to have large-scale 3D model collections for supervision.
      @inproceedings{Anciukevicius23, title = {{RenderDiffusion}: Image Diffusion for {3D} Reconstruction, Inpainting and Generation}, author = {Anciukevicius, Titas and Xu, Zexiang and Fisher, Matthew and Henderson, Paul and Bilen, Hakan and Mitra, Niloy J. and Guerrero, Paul}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2023}, xcode = {https://github.com/Anciukevicius/RenderDiffusion} }
    • Moltisanti, D., Keller, F., Bilen, H., & Sevilla-Lara, L. (2023). Learning Action Changes by Measuring Verb-Adverb Textual Relationships. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

      The goal of this work is to understand the way actions are performed in videos. That is, given a video, we aim to predict an adverb indicating a modification applied to the action (e.g. cut “finely”). We cast this problem as a regression task. We measure textual relationships between verbs and adverbs to generate a regression target representing the action change we aim to learn. We test our approach on a range of datasets and achieve state-of-the-art results on both adverb prediction and antonym classification. Furthermore, we outperform previous work when we lift two commonly assumed conditions: the availability of action labels during testing and the pairing of adverbs as antonyms. Existing datasets for adverb recognition are either noisy, which makes learning difficult, or contain actions whose appearance is not influenced by adverbs, which makes evaluation less reliable. To address this, we collect a new high-quality dataset: Adverbs in Recipes (AIR). We focus on instructional recipe videos, curating a set of actions that exhibit meaningful visual changes when performed differently. Videos in AIR are more tightly trimmed and were manually reviewed by multiple annotators to ensure high labelling quality. Results show that models learn better from AIR given its cleaner videos. At the same time, adverb prediction on AIR is challenging, demonstrating that there is considerable room for improvement.
      @inproceedings{Moltisanti23, title = {Learning Action Changes by Measuring Verb-Adverb Textual Relationships}, author = {Moltisanti, Davide and Keller, Frank and Bilen, Hakan and Sevilla-Lara, Laura}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2023}, xcode = {https://github.com/dmoltisanti/air-cvpr23} }
    • Koçyiğit, M. T., Hospedales, T., & Bilen, H. (2023). Accelerating Self-Supervised Learning via Efficient Training Strategies. IEEE Winter Conference on Applications of Computer Vision (WACV).

      Recently the focus of the computer vision community has shifted from expensive supervised learning towards self-supervised learning of visual representations. While the performance gap between supervised and self-supervised methods has been narrowing, the time for training self-supervised deep networks remains an order of magnitude larger than that of their supervised counterparts, which hinders progress, imposes carbon cost, and limits societal benefits to institutions with substantial resources. Motivated by these issues, this paper investigates reducing the training time of recent self-supervised methods by various model-agnostic strategies that have not been used for this problem. In particular, we study three strategies: an extendable cyclic learning rate schedule, a matching progressive augmentation magnitude and image resolution schedule, and a hard positive mining strategy based on augmentation difficulty. We show that all three methods combined lead to up to a 2.7 times speed-up in the training time of several self-supervised methods while retaining comparable performance to the standard self-supervised learning setting.
      @inproceedings{Kocyigit23, title = {Accelerating Self-Supervised Learning via Efficient Training Strategies}, author = {Koçyiğit, Mustafa~Taha and Hospedales, Timothy and Bilen, Hakan}, booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)}, year = {2023} }
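
      A minimal sketch, not taken from the paper, of the kind of training-time schedules the entry above combines: a cyclic learning rate and a progressively increasing resolution/augmentation-magnitude schedule. All function names, ranges, and hyperparameter values are illustrative assumptions.

      import math

      def cyclic_lr(step, cycle_len=1000, lr_min=1e-4, lr_max=1e-1):
          """Cosine-shaped cyclic learning rate that restarts every `cycle_len` steps."""
          t = (step % cycle_len) / cycle_len
          return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

      def progressive_schedule(epoch, total_epochs, res_min=96, res_max=224,
                               aug_min=0.2, aug_max=1.0):
          """Linearly ramp image resolution and augmentation magnitude over training."""
          p = epoch / max(total_epochs - 1, 1)
          resolution = int(res_min + p * (res_max - res_min))
          aug_strength = aug_min + p * (aug_max - aug_min)
          return resolution, aug_strength

      # Example: query the schedules at epoch 10 of 100 and at optimization step 2500.
      res, aug = progressive_schedule(epoch=10, total_epochs=100)
      lr = cyclic_lr(step=2500)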
    • Zhao, B., & Bilen, H. (2023). Dataset condensation with distribution matching. IEEE Winter Conference on Applications of Computer Vision (WACV).

      Computational cost of training state-of-the-art deep models in many learning problems is rapidly increasing due to more sophisticated models and larger datasets. A recent promising direction for reducing training cost is dataset condensation that aims to replace the original large training set with a significantly smaller learned synthetic set while preserving the original information. While training deep models on the small set of condensed images can be extremely fast, their synthesis remains computationally expensive due to the complex bi-level optimization and second-order derivative computation. In this work, we propose a simple yet effective method that synthesizes condensed images by matching feature distributions of the synthetic and original training images in many sampled embedding spaces. Our method significantly reduces the synthesis cost while achieving comparable or better performance. Thanks to its efficiency, we apply our method to more realistic and larger datasets with sophisticated neural architectures and obtain a significant performance boost. We also show promising practical benefits of our method in continual learning and neural architecture search.
      @inproceedings{Zhao23, title = {Dataset condensation with distribution matching}, author = {Zhao, Bo and Bilen, Hakan}, booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)}, year = {2023}, xcode = {https://github.com/vico-uoe/DatasetCondensation} }
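
      A minimal sketch, under simplifying assumptions, of the distribution-matching idea described in the entry above: synthetic images are optimized so that their per-class mean features match those of real images in randomly sampled embedding networks. The embedding architecture, image sizes, and learning rate below are placeholders, not the released implementation.

      import torch
      import torch.nn as nn

      def random_embedder():
          # Freshly initialized (untrained) conv embedder; a new one is sampled each step.
          return nn.Sequential(
              nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
              nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
              nn.AdaptiveAvgPool2d(1), nn.Flatten(),
          )

      # Learnable synthetic set: 10 images per class for 10 classes (CIFAR-like shapes).
      syn = torch.randn(10 * 10, 3, 32, 32, requires_grad=True)
      syn_labels = torch.arange(10).repeat_interleave(10)
      opt = torch.optim.SGD([syn], lr=1.0)

      def dm_step(real_images, real_labels):
          """One distribution-matching update of the synthetic images."""
          net = random_embedder()           # sample a random embedding space
          loss = 0.0
          for c in real_labels.unique():
              f_real = net(real_images[real_labels == c]).mean(dim=0).detach()
              f_syn = net(syn[syn_labels == c]).mean(dim=0)
              loss = loss + ((f_real - f_syn) ** 2).sum()
          opt.zero_grad()
          loss.backward()
          opt.step()
          return loss.item()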

2022

    • Zhao, B. (2022). Data-efficient neural network training with dataset condensation [PhD thesis]. University of Edinburgh.

      The state of the art in many data-driven fields including computer vision and natural language processing typically relies on training larger models on bigger data. It is reported by OpenAI that the computational cost to achieve the state of the art doubles every 3.4 months in the deep learning era. In contrast, the GPU computation power doubles every 21.4 months, which is significantly slower. Thus, advancing deep learning performance by consuming more hardware resources is not sustainable. How to reduce the training cost while preserving the generalization performance is a long-standing goal in machine learning. This thesis investigates a largely under-explored yet promising solution - dataset condensation - which aims to condense a large training set into a small set of informative synthetic samples for training deep models and achieve close performance to models trained on the original dataset. In this thesis, we investigate how to condense image datasets for classification tasks. We propose three methods for image dataset condensation. Our methods can be applied to condense other kinds of datasets for different learning tasks, such as text data, graph data and medical images, and we discuss this in Section 6.1.
      @phdthesis{Zhao22, title = {Data-efficient neural network training with dataset condensation}, author = {Zhao, Bo}, school = {University of Edinburgh}, year = {2022}, xpdf = {https://era.ed.ac.uk/handle/1842/39756} }
    • Mariotti, O., Mac Aodha, O., & Bilen, H. (2022). ViewNeRF: Unsupervised Viewpoint Estimation Using Category-Level Neural Radiance Fields. British Machine Vision Conference (BMVC).

      We introduce ViewNeRF, a Neural Radiance Field-based viewpoint estimation method that learns to predict category-level viewpoints directly from images during training. While NeRF is usually trained with ground-truth camera poses, multiple extensions have been proposed to reduce the need for this expensive supervision. Nonetheless, most of these methods still struggle in complex settings with large camera movements, and are restricted to single scenes, i.e. they cannot be trained on a collection of scenes depicting the same object category. To address this issue, our method uses an analysis by synthesis approach, combining a conditional NeRF with a viewpoint predictor and a scene encoder in order to produce self-supervised reconstructions for whole object categories. Rather than focusing on high fidelity reconstruction, we target efficient and accurate viewpoint prediction in complex scenarios, e.g. 360 degree rotation on real data. Our model shows competitive results on synthetic and real datasets, both for single scenes and multi-instance collections.
      @inproceedings{Mariotti22, title = {ViewNeRF: Unsupervised Viewpoint Estimation Using Category-Level Neural Radiance Fields}, author = {Mariotti, Octave and Mac~Aodha, Oisin and Bilen, Hakan}, booktitle = {British Machine Vision Conference (BMVC)}, year = {2022} }
    • Yang, Y., Cheng, X., Liu, C., Bilen, H., & Ji, X. (2022). Distilling Representations from GAN Generator via Squeeze and Span. Neural Information Processing Systems (NeurIPS).

      In recent years, generative adversarial networks (GANs) have been an actively studied topic and shown to successfully produce high-quality realistic images in various domains. The controllable synthesis ability of GAN generators suggests that they maintain informative, disentangled, and explainable image representations, but leveraging and transferring their representations to downstream tasks is largely unexplored. In this paper, we propose to distill knowledge from GAN generators by squeezing and spanning their representations. We squeeze the generator features into representations that are invariant to semantic-preserving transformations through a network before they are distilled into the student network. We span the distilled representation of the synthetic domain to the real domain by also using real training data to remedy the mode collapse of GANs and boost the student network performance in a real domain. Experiments justify the efficacy of our method and reveal its great significance in self-supervised representation learning. Code will be made public.
      @inproceedings{Yang22b, title = {Distilling Representations from GAN Generator via Squeeze and Span}, author = {Yang, Yu and Cheng, Xiaotian and Liu, Chang and Bilen, Hakan and Ji, Xiangyang}, booktitle = {Neural Information Processing Systems (NeurIPS)}, year = {2022} }
    • Li, W.-H. (2022). Learning universal representations across tasks and domains [PhD thesis]. University of Edinburgh.

      A longstanding goal in computer vision research is to produce broad and general-purpose systems that work well on a wide range of vision problems and are capable of learning concepts from only a few labelled samples. In contrast, existing models are limited to work only in specific tasks or domains (datasets), e.g., a semantic segmentation model for indoor images (Silberman et al., 2012). In addition, they are data inefficient and require a large labelled dataset for each task or domain. While works have been proposed for domain/task-agnostic representations through either loss balancing strategies or architecture design, optimizing such a universal representation network remains a challenging problem. This thesis focuses on addressing the challenges of learning universal representations that generalize well over multiple tasks (e.g. segmentation, depth estimation) or various visual domains (e.g. image object classification, image action classification). In addition, the thesis also shows that these representations can be learned from partial supervision and transferred and adapted to previously unseen tasks/domains in a data-efficient manner.
      @phdthesis{Li22b, title = {Learning universal representations across tasks and domains}, author = {Li, Wei-Hong}, school = {University of Edinburgh}, year = {2022}, xpdf = {https://hdl.handle.net/1842/39625} }
    • Chen, Y., Fernando, B., Bilen, H., Nießner, M., & Gavves, E. (2022). 3D Equivariant Graph Implicit Functions. European Conference on Computer Vision (ECCV).

      In recent years, neural implicit representations have made remarkable progress in modeling 3D shapes with arbitrary topology. In this work, we address two key limitations of such representations: failing to capture local 3D geometric fine details, and failing to learn from and generalize to shapes with unseen 3D transformations. To this end, we introduce a novel family of graph implicit functions with equivariant layers that facilitate modeling of fine local details and guarantee robustness to various groups of geometric transformations, through local k-NN graph embeddings with sparse point set observations at multiple resolutions. Our method improves over the existing rotation-equivariant implicit function from 0.69 to 0.89 (IoU) on the ShapeNet reconstruction task. We also show that our equivariant implicit function can be extended to other types of similarity transformations and generalizes to unseen translations and scaling.
      @inproceedings{Chen22, title = {3D Equivariant Graph Implicit Functions}, author = {Chen, Yunlu and Fernando, Basura and Bilen, Hakan and Nie{\ss}ner, Matthias and Gavves, Efstratios}, booktitle = {European Conference on Computer Vision (ECCV)}, year = {2022} }
    • Goel, A., Fernando, B., Keller, F., & Bilen, H. (2022). Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

      Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects, which is essential for full scene understanding. Existing SGG methods trained on the entire set of relations fail to acquire complex reasoning about visual and textual correlations due to various biases in training data. Learning on trivial relations that indicate generic spatial configuration like ’on’ instead of informative relations such as ’parked on’ does not enforce this complex reasoning, harming generalization. To address this problem, we propose a novel framework for SGG training that exploits relation labels based on their informativeness. Our model-agnostic training procedure imputes missing informative relations for less informative samples in the training data and trains an SGG model on the imputed labels along with existing annotations. We show that this approach can successfully be used in conjunction with state-of-the-art SGG methods and improves their performance significantly in multiple metrics on the standard Visual Genome benchmark. Furthermore, we obtain considerable improvements for unseen triplets in a more challenging zero-shot setting.
      @inproceedings{Goel22, title = {Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation}, author = {Goel, Arushi and Fernando, Basura and Keller, Frank and Bilen, Hakan}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2022} }
    • Li, W.-H., Liu, X., & Bilen, H. (2022). Learning Multiple Dense Prediction Tasks from Partially Annotated Data. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

      Despite the recent advances in multi-task learning of dense prediction problems, most methods rely on expensive labelled datasets. In this paper, we present a label-efficient approach and look at jointly learning multiple dense prediction tasks on partially annotated data, which we call multi-task partially-supervised learning. We propose a multi-task training procedure that successfully leverages task relations to supervise its multi-task learning when data is partially annotated. In particular, we learn to map each task pair to a joint pairwise task-space which enables sharing information between them in a computationally efficient way through another network conditioned on task pairs, and avoids learning trivial cross-task relations by retaining high-level information about the input image. We rigorously demonstrate that our proposed method effectively exploits the images with unlabelled tasks and outperforms existing semi-supervised learning approaches and related methods on three standard benchmarks.
      @inproceedings{Li22, title = {Learning Multiple Dense Prediction Tasks from Partially Annotated Data}, author = {Li, Wei-Hong and Liu, Xialei and Bilen, Hakan}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2022} }
    • Li, W.-H., Liu, X., & Bilen, H. (2022). Cross-domain Few-shot Learning with Task-specific Adapters. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

      In this paper, we look at the problem of cross-domain few-shot classification that aims to learn a classifier from previously unseen classes and domains with few labeled samples. We study several strategies, including various adapter topologies and operations, in terms of their performance and efficiency; these can be easily attached to existing methods with different meta-training strategies and adapt them for a given task during the meta-test phase. We show that parametric adapters attached to convolutional layers with residual connections perform best, and significantly improve the performance of the state-of-the-art models in the Meta-Dataset benchmark with minor additional cost.
      @inproceedings{Li22a, title = {Cross-domain Few-shot Learning with Task-specific Adapters}, author = {Li, Wei-Hong and Liu, Xialei and Bilen, Hakan}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2022}, xcode = {https://github.com/VICO-UoE/URL} }
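
      A minimal sketch of one adapter configuration in the spirit of the entry above, assuming a parallel residual adapter: a 1x1 convolution added to the output of a frozen convolution, so that only the lightweight adapter is updated for a new task. Class name, shapes, and initialization are illustrative, not the released code.

      import torch
      import torch.nn as nn

      class ResidualAdapterConv(nn.Module):
          """Frozen base conv plus a lightweight, task-specific 1x1 residual adapter."""
          def __init__(self, base_conv: nn.Conv2d):
              super().__init__()
              self.base = base_conv
              for p in self.base.parameters():
                  p.requires_grad = False      # pre-trained backbone stays fixed
              self.adapter = nn.Conv2d(base_conv.in_channels, base_conv.out_channels,
                                       kernel_size=1, stride=base_conv.stride, bias=False)
              nn.init.zeros_(self.adapter.weight)  # start as an identity of the base output

          def forward(self, x):
              return self.base(x) + self.adapter(x)

      # Example: wrap a 3x3 convolution taken from a pre-trained backbone.
      layer = ResidualAdapterConv(nn.Conv2d(64, 64, 3, padding=1))
      out = layer(torch.randn(2, 64, 16, 16))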
    • Wang, K., Zhao, B., Peng, X., Zhu, Z., Yang, S., Wang, S., Huang, G., Bilen, H., Wang, X., & You, Y. (2022). CAFE: Learning to Condense Dataset by Aligning Features. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

      Dataset condensation aims at reducing the network training effort through condensing a cumbersome training set into a compact synthetic one. State-of-the-art approaches largely rely on learning the synthetic data by matching the gradients between the real and synthetic data batches. Despite the intuitive motivation and promising results, such gradient-based methods, by nature, easily overfit to a biased set of samples that produce dominant gradients, and thus lack a global supervision of data distribution. In this paper, we propose a novel scheme to Condense dataset by Aligning FEatures (CAFE), which explicitly attempts to preserve the real-feature distribution as well as the discriminant power of the resulting synthetic set, lending itself to strong generalization capability to various architectures. At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales, while accounting for the classification of real samples. Our scheme is further backed up by a novel dynamic bi-level optimization, which adaptively adjusts parameter updates to prevent over-/under-fitting. We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art: on the SVHN dataset, for example, the performance gain is up to 11%. Extensive experiments and analysis verify the effectiveness and necessity of the proposed designs.
      @inproceedings{Wang22, title = {CAFE: Learning to Condense Dataset by Aligning Features}, author = {Wang, Kai and Zhao, Bo and Peng, Xiangyu and Zhu, Zheng and Yang, Shuo and Wang, Shuo and Huang, Guan and Bilen, Hakan and Wang, Xinchao and You, Yang}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2022} }
    • Deecke, L. (2022). Image Classification over Unknown and Anomalous Domains [PhD thesis]. University of Edinburgh.

      @phdthesis{Deecke22a, title = {Image Classification over Unknown and Anomalous Domains}, author = {Deecke, Lucas}, school = {University of Edinburgh}, year = {2022}, xpdf = {http://dx.doi.org/10.7488/era/2019} }
    • Deecke, L., Hospedales, T., & Bilen, H. (2022). Visual Representation Learning over Latent Domains. International Conference on Learning Representations (ICLR).

      A fundamental shortcoming of deep neural networks is their specialization to a single task and domain. While multi-domain learning enables the learning of compact models that span multiple visual domains, these rely on the presence of domain labels, in turn requiring laborious curation of datasets. This paper proposes a less explored, but highly realistic new setting called latent domain learning: learning over data from different domains, without access to domain annotations. Experiments show that this setting is particularly challenging for standard models and existing multi-domain approaches, calling for new customized solutions: a sparse adaptation strategy is formulated which adaptively accounts for latent domains in data, and significantly enhances learning in such settings. Our method can be paired seamlessly with existing models, and boosts performance in conceptually related tasks, e.g. empirical fairness problems and long-tailed recognition.
      @inproceedings{Deecke22, title = {Visual Representation Learning over Latent Domains}, author = {Deecke, Lucas and Hospedales, Timothy and Bilen, Hakan}, booktitle = {International Conference on Learning Representations (ICLR)}, year = {2022}, xcode = {https://github.com/VICO-UoE/LatentDomainLearning} }
    • Yang, Y., Cheng, X., Bilen, H., & Ji, X. (2022). Learning to Annotate Part Segmentation with Gradient Matching. International Conference on Learning Representations (ICLR).

      The success of state-of-the-art deep neural networks heavily relies on the presence of large-scale labeled datasets, which are extremely expensive and time-consuming to annotate. This paper focuses on reducing the annotation cost of part segmentation by generating high-quality images with a pre-trained GAN and labeling the generated images with an automatic annotator. In particular, we formulate the annotator learning as the following learning-to-learn problem. Given a pre-trained GAN, the annotator learns to label object parts in a set of randomly generated images such that a part segmentation model trained on these synthetic images with automatic labels obtains superior performance evaluated on a small set of manually labeled images. We show that this nested-loop optimization problem can be reduced to a simple gradient matching problem, which is further efficiently solved with an iterative algorithm. As our method suits the semi-supervised learning setting, we evaluate our method on semi-supervised part segmentation tasks. Experiments demonstrate that our method significantly outperforms other semi-supervised competitors, especially when the amount of labeled examples is limited.
      @inproceedings{Yang22a, title = {Learning to Annotate Part Segmentation with Gradient Matching}, author = {Yang, Yu and Cheng, Xiaotian and Bilen, Hakan and Ji, Xiangyang}, booktitle = {International Conference on Learning Representations (ICLR)}, year = {2022} }
    • Yang, Y., Bilen, H., Zou, Q., Cheung, W. Y., & Ji, X. (2022). Unsupervised Foreground-Background Segmentation with Equivariant Layered GANs. IEEE Winter Conference on Applications of Computer Vision (WACV).

      We propose an unsupervised foreground-background segmentation method via training a segmentation network on a synthetic pseudo-segmentation dataset generated from GANs, which are trained from a collection of images without annotations to explicitly disentangle foreground and background. To efficiently generate foreground and background layers and overlay them to compose novel images, the construction of such GANs is fulfilled by our proposed Equivariant Layered GAN, whose improvement, compared to previous layered GANs, is embodied in the following two aspects. (1) The disentanglement of foreground and background is improved by extending the previous perturbation strategy and introducing private code recovery that reconstructs the private code of foreground from the composite image. (2) The latent space of the layered GANs is regularized by minimizing our proposed equivariance loss, resulting in interpretable latent codes and better disentanglement of foreground and background. Our methods are evaluated on unsupervised object segmentation datasets including Caltech-UCSD Birds and LSUN Car, achieving state-of-the-art performance.
      @inproceedings{Yang22, title = {Unsupervised Foreground-Background Segmentation with Equivariant Layered GANs}, author = {Yang, Yu and Bilen, Hakan and Zou, Qiran and Cheung, Wing Yin and Ji, Xiangyang}, booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)}, year = {2022} }

2021

    • Li, W.-H., Liu, X., & Bilen, H. (2021). Universal Representation Learning from Multiple Domains for Few-shot Classification. International Conference on Computer Vision (ICCV).

      In this paper, we look at the problem of few-shot classification that aims to learn a classifier for previously unseen classes and domains from few labeled samples. Recent methods use adaptation networks for aligning their features to new domains or select the relevant features from multiple domain-specific feature extractors. In this work, we propose to learn a single set of universal deep representations by distilling knowledge of multiple separately trained networks after co-aligning their features with the help of adapters and centered kernel alignment. We show that the universal representations can be further refined for previously unseen domains by an efficient adaptation step in a similar spirit to distance learning methods. We rigorously evaluate our model in the recent Meta-Dataset benchmark and demonstrate that it significantly outperforms the previous methods while being more efficient.
      @inproceedings{Li21, title = {Universal Representation Learning from Multiple Domains for Few-shot Classification}, author = {Li, Wei-Hong and Liu, Xialei and Bilen, Hakan}, booktitle = {International Conference on Computer Vision (ICCV)}, year = {2021}, xcode = {https://github.com/VICO-UoE/URL} }
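
      A minimal sketch of linear centered kernel alignment (CKA), the feature-similarity measure that the entry above uses (together with small adapters) to align the universal network's features with those of the domain-specific teachers. The loss wrapper and feature shapes below are illustrative assumptions.

      import torch

      def linear_cka(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
          """Linear CKA between two feature matrices of shape [n_samples, dim_x] / [n_samples, dim_y]."""
          x = x - x.mean(dim=0, keepdim=True)   # center features
          y = y - y.mean(dim=0, keepdim=True)
          cross = y.t() @ x                      # cross-covariance (dim_y x dim_x)
          num = (cross ** 2).sum()
          den = torch.norm(x.t() @ x) * torch.norm(y.t() @ y)
          return num / (den + eps)

      def cka_alignment_loss(student_feats, teacher_feats):
          # Maximize similarity between student and (frozen) teacher features.
          return 1.0 - linear_cka(student_feats, teacher_feats.detach())

      # Example with random features: 32 samples, 512-d student vs 256-d teacher.
      loss = cka_alignment_loss(torch.randn(32, 512), torch.randn(32, 256))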
    • Mariotti, O., Mac Aodha, O., & Bilen, H. (2021). ViewNet: Unsupervised Viewpoint Estimation From Conditional Generation. International Conference on Computer Vision (ICCV).

      Understanding the 3D world without supervision is currently a major challenge in computer vision as the annotations required to supervise deep networks for tasks in this domain are expensive to obtain on a large scale. In this paper, we address the problem of unsupervised viewpoint estimation. We formulate this as a self-supervised learning task, where image reconstruction from raw images provides the supervision needed to predict camera viewpoint. Specifically, we make use of pairs of images of the same object at training time, from unknown viewpoints, to self-supervise training by combining the viewpoint information from one image with the appearance information from the other. We demonstrate that using a perspective spatial transformer allows efficient viewpoint learning, outperforming existing unsupervised approaches on synthetic data and obtaining competitive results on the challenging PASCAL3D+.
      @inproceedings{Mariotti21, title = {ViewNet: Unsupervised Viewpoint Estimation From Conditional Generation}, author = {Mariotti, Octave and Mac~Aodha, Oisin and Bilen, Hakan}, booktitle = {International Conference on Computer Vision (ICCV)}, year = {2021} }
    • Chen, Y., Fernando, B., Bilen, H., Mensink, T., & Gavves, E. (2021). Shape Transformation with Deep Implicit Functions by Matching Implicit Features. International Conference on Machine Learning (ICML).

      Recently, neural implicit functions have achieved impressive results for encoding 3D shapes. Conditioning on low-dimensional latent codes generalises a single implicit function to learn a shared representation space for a variety of shapes, with the advantage of smooth interpolation. While the benefits from the global latent space do not correspond to explicit points at the local level, we propose to track the continuous point trajectory by matching implicit features with the latent code interpolating between shapes, from which we corroborate the hierarchical functionality of the deep implicit functions, where early layers map the latent code to fitting the coarse shape structure, and deeper layers further refine the shape details. Furthermore, the structured representation space of implicit functions enables feature matching to be applied for shape deformation, with the benefit of handling topology and semantic inconsistencies, such as from an armchair to a chair with no arms, without explicit flow functions or manual annotations.
      @inproceedings{Chen21, title = {Shape Transformation with Deep Implicit Functions by Matching Implicit Features}, author = {Chen, Yunlu and Fernando, Basura and Bilen, Hakan and Mensink, Thomas and Gavves, Efstratios}, booktitle = {International Conference on Machine Learning (ICML)}, year = {2021} }
    • Deecke, L., Ruff, L., Vandermeulen, R. A., & Bilen, H. (2021). Transfer Based Semantic Anomaly Detection. International Conference on Machine Learning (ICML).

      Detecting semantic anomalies is challenging due to the countless ways in which they may appear in real-world data. While enhancing the robustness of networks may be sufficient for modeling simplistic anomalies, there is no good known way of preparing models for all potential and unseen anomalies that can potentially occur, such as the appearance of new object classes. In this paper, we show that a previously overlooked strategy for anomaly detection (AD) is to introduce an explicit inductive bias toward representations transferred over from some large and varied semantic task. We rigorously verify our hypothesis in controlled trials that utilize intervention, and show that it gives rise to surprisingly effective auxiliary objectives that outperform previous AD paradigms.
      @inproceedings{Deecke21, title = {Transfer Based Semantic Anomaly Detection}, author = {Deecke, Lucas and Ruff, Lukas and Vandermeulen, Robert~A. and Bilen, Hakan}, booktitle = {International Conference on Machine Learning (ICML)}, year = {2021}, xcode = {https://github.com/VICO-UoE/TransferAD} }
    • Zhao, B., & Bilen, H. (2021). Dataset Condensation with Differentiable Siamese Augmentation. International Conference on Machine Learning (ICML).

      In many machine learning problems, large-scale datasets have become the de-facto standard to train state-of-the-art deep networks at the price of heavy computation load. In this paper, we focus on condensing large training sets into significantly smaller synthetic sets which can be used to train deep neural networks from scratch with a minimum drop in performance. Inspired by recent training set synthesis methods, we propose Differentiable Siamese Augmentation, which enables effective use of data augmentation to synthesize more informative synthetic images and thus achieves better performance when training networks with augmentations. Experiments on multiple image classification benchmarks demonstrate that the proposed method obtains substantial gains over the state of the art, with 7% improvements on the CIFAR10 and CIFAR100 datasets. We show that, with less than 1% of the data, our method achieves 99.6%, 94.9%, 88.5%, and 71.5% relative performance on MNIST, FashionMNIST, SVHN, and CIFAR10 respectively. We also explore the use of our method in continual learning and neural architecture search, and show promising results.
      @inproceedings{Zhao21a, title = {Dataset Condensation with Differentiable Siamese Augmentation}, author = {Zhao, Bo and Bilen, Hakan}, booktitle = {International Conference on Machine Learning (ICML)}, year = {2021}, xcode = {https://github.com/vico-uoe/DatasetCondensation} }
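
      A minimal sketch of the siamese-augmentation idea described in the entry above: the same randomly sampled, differentiable transformation is applied to the real and the synthetic batch before the matching loss, so gradients still flow to the synthetic images. Only a flip and a brightness shift are shown here, and all parameters are illustrative assumptions.

      import torch

      def siamese_augment(real: torch.Tensor, syn: torch.Tensor):
          """Apply identical, differentiable augmentations to both batches."""
          # Shared horizontal-flip decision.
          if torch.rand(1).item() < 0.5:
              real = torch.flip(real, dims=[3])
              syn = torch.flip(syn, dims=[3])
          # Shared brightness jitter (differentiable w.r.t. the synthetic images).
          shift = (torch.rand(1).item() - 0.5) * 0.4
          real = real + shift
          syn = syn + shift
          return real, syn

      real = torch.randn(64, 3, 32, 32)
      syn = torch.randn(10, 3, 32, 32, requires_grad=True)
      real_aug, syn_aug = siamese_augment(real, syn)
      # real_aug and syn_aug would then be fed to the matching objective
      # (e.g. gradient matching) used to update the synthetic set.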
    • Zhao, B., Mopuri, K. R., & Bilen, H. (2021). Dataset Condensation with Gradient Matching. International Conference on Learning Representations (ICLR).

      As state-of-the-art machine learning methods in many fields rely on larger datasets, storing datasets and training models on them becomes significantly more expensive. This paper proposes a training set synthesis technique for data-efficient learning, called Dataset Condensation, that learns to condense a large dataset into a small set of informative synthetic samples for training deep neural networks from scratch. We formulate this goal as a gradient matching problem between the gradients of deep neural network weights that are trained on the original and our synthetic data. We rigorously evaluate its performance in several computer vision benchmarks and demonstrate that it significantly outperforms the state-of-the-art methods. Finally, we explore the use of our method in continual learning and neural architecture search and report promising gains when limited memory and computations are available.
      @inproceedings{Zhao21, title = {Dataset Condensation with Gradient Matching}, author = {Zhao, Bo and Mopuri, Konda~Reddy and Bilen, Hakan}, booktitle = {International Conference on Learning Representations (ICLR)}, year = {2021}, xcode = {https://github.com/vico-uoe/DatasetCondensation} }
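
      A minimal sketch of the gradient-matching objective described in the entry above, under simplifying assumptions (a single network state and a layer-wise cosine distance); the model, shapes, and optimizer settings are placeholders rather than the released implementation.

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 10))
      params = [p for p in net.parameters() if p.requires_grad]

      syn = torch.randn(100, 3, 32, 32, requires_grad=True)   # learnable synthetic images
      syn_labels = torch.arange(10).repeat_interleave(10)
      opt_syn = torch.optim.SGD([syn], lr=0.1)

      def grad_match_step(real_images, real_labels):
          """Update the synthetic images so their training gradients match the real ones."""
          # Gradients on real data (treated as fixed targets).
          g_real = torch.autograd.grad(F.cross_entropy(net(real_images), real_labels), params)
          g_real = [g.detach() for g in g_real]
          # Gradients on synthetic data, kept differentiable w.r.t. the images.
          g_syn = torch.autograd.grad(F.cross_entropy(net(syn), syn_labels), params,
                                      create_graph=True)
          # Layer-wise cosine distance between the two sets of gradients.
          loss = sum(1 - F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
                     for a, b in zip(g_syn, g_real))
          opt_syn.zero_grad()
          loss.backward()
          opt_syn.step()
          return loss.item()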

2020

    • Mariotti, O., & Bilen, H. (2020). Semi-supervised Viewpoint Estimation with Geometry-aware Conditional Generation. European Conference on Computer Vision (ECCV) Workshop.

      There is a growing interest in developing computer vision methods that can learn from limited supervision. In this paper, we consider the problem of learning to predict camera viewpoints, where obtaining ground-truth annotations is expensive and requires special equipment, from a limited number of labeled images. We propose a semi-supervised viewpoint estimation method that can learn to infer viewpoint information from unlabeled image pairs, where two images differ by a viewpoint change. In particular, our method learns to synthesize the second image by combining the appearance from the first one and the viewpoint from the second one. We demonstrate that our method significantly improves over supervised techniques, especially in the low-label regime, and outperforms the state-of-the-art semi-supervised methods.
      @inproceedings{Mariotti20, title = {Semi-supervised Viewpoint Estimation with Geometry-aware Conditional Generation}, author = {Mariotti, Octave and Bilen, Hakan}, booktitle = {European Conference on Computer Vision (ECCV) Workshop}, year = {2020}, xcode = {https://github.com/VICO-UoE/SemiSupViewNet} }
    • Li, W.-H., & Bilen, H. (2020). Knowledge Distillation for Multi-task Learning. European Conference on Computer Vision (ECCV) Workshop.

      Multi-task learning (MTL) aims to learn a single model that performs multiple tasks, achieving good performance on all tasks at a lower computational cost. Learning such a model requires jointly optimizing the losses of a set of tasks with different difficulty levels, magnitudes, and characteristics (e.g. cross-entropy, Euclidean loss), leading to an imbalance problem in multi-task learning. To address the imbalance problem, we propose a knowledge distillation based method in this work. We first learn a task-specific model for each task. We then learn the multi-task model by minimizing the task-specific losses and by producing the same features as the task-specific models. As the task-specific networks encode different features, we introduce small task-specific adaptors to project multi-task features to the task-specific features. In this way, the adaptors align the task-specific feature and the multi-task feature, which enables a balanced parameter sharing across tasks. Extensive experimental results demonstrate that our method can optimize a multi-task learning model in a more balanced way and achieve better overall performance.
      @inproceedings{Li20, title = {Knowledge Distillation for Multi-task Learning}, author = {Li, Wei-Hong and Bilen, Hakan}, booktitle = {European Conference on Computer Vision (ECCV) Workshop}, year = {2020}, xcode = {https://github.com/VICO-UoE/KD4MTL} }
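
      A minimal sketch of the distillation objective described in the entry above, assuming frozen single-task teachers and a small linear adaptor per task that maps the shared multi-task features to each teacher's feature space; module names and dimensions are illustrative assumptions.

      import torch
      import torch.nn as nn

      class KD4MTL(nn.Module):
          def __init__(self, shared_encoder, heads, teachers, feat_dim):
              super().__init__()
              self.encoder = shared_encoder                 # shared multi-task backbone
              self.heads = nn.ModuleDict(heads)             # one prediction head per task
              self.teachers = teachers                      # frozen task-specific encoders (dict)
              # One small adaptor per task, aligning shared features to teacher features.
              self.adaptors = nn.ModuleDict({t: nn.Linear(feat_dim, feat_dim) for t in heads})

          def losses(self, x, targets, task_losses):
              f = self.encoder(x)
              total = 0.0
              for t, head in self.heads.items():
                  task_loss = task_losses[t](head(f), targets[t])
                  with torch.no_grad():
                      f_teacher = self.teachers[t](x)       # teacher feature, no gradient
                  distill = ((self.adaptors[t](f) - f_teacher) ** 2).mean()
                  total = total + task_loss + distill
              return total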
    • Goel, A., Fernando, B., Nguyen, T.-S., & Bilen, H. (2020). Injecting Prior Knowledge into Image Captioning. European Conference on Computer Vision (ECCV) Workshop.

      Automatically generating natural language descriptions from an image is a challenging problem in artificial intelligence that requires a good understanding of the visual and textual signals and the correlations between them. The state-of-the-art methods in image captioning struggle to approach human-level performance, especially when data is limited. In this paper, we propose to improve the performance of state-of-the-art image captioning models by incorporating two sources of prior knowledge: (i) a conditional latent topic attention, that uses a set of latent variables (topics) as an anchor to generate highly probable words and, (ii) a regularization technique that exploits the inductive biases in the syntactic and semantic structure of captions and improves the generalization of image captioning models. Our experiments validate that our method produces more human-interpretable captions and also leads to significant improvements on the MSCOCO dataset in both the full and low data regimes.
      @inproceedings{Goel20, title = {Injecting Prior Knowledge into Image Captioning}, author = {Goel, Arushi and Fernando, Basura and Nguyen, Thanh-Son and Bilen, Hakan}, booktitle = {European Conference on Computer Vision (ECCV) Workshop}, year = {2020} }
    • Jakab, T., Gupta, A., Bilen, H., & Vedaldi, A. (2020). Self-supervised Learning of Interpretable Keypoints from Unlabelled Videos. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

      We propose a new method for recognizing the pose of objects from a single image that for learning uses only unlabelled videos and a weak empirical prior on the object poses. Video frames differ primarily in the pose of the objects they contain, so our method distils the pose information by analyzing the differences between frames. The distillation uses a new dual representation of the geometry of objects as a set of 2D keypoints, and as a pictorial representation, i.e. a skeleton image. This has three benefits: (1) it provides a tight ‘geometric bottleneck’ which disentangles pose from appearance, (2) it can leverage powerful image-to-image translation networks to map between photometry and geometry, and (3) it allows to incorporate empirical pose priors in the learning process. The pose priors are obtained from unpaired data, such as from a different dataset or modality such as mocap, such that no annotated image is ever used in learning the pose recognition network. In standard benchmarks for pose recognition for humans and faces, our method achieves state-of-the-art performance among methods that do not require any labelled images for training.
      @inproceedings{Jakab20, title = {Self-supervised Learning of Interpretable Keypoints from Unlabelled Videos}, author = {Jakab, T. and Gupta, A. and Bilen, H and Vedaldi, A.}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2020}, xcode = {https://github.com/tomasjakab/keypointgan} }
    • Fernando, B., Tan, C., & Bilen, H. (2020). Weakly Supervised Gaussian Networks for Action Detection. IEEE Winter Conference on Applications of Computer Vision (WACV).

      Detecting the temporal extents of human actions in videos is a challenging computer vision problem that requires detailed manual supervision including frame-level labels. This expensive annotation process limits deploying action detectors to a small number of categories. We propose a novel method, called WSGN, that learns to detect actions from weak supervision, using only video-level labels. WSGN learns to exploit both video-specific and dataset-wide statistics to predict the relevance of each frame to an action category. This strategy leads to significant gains in action detection for two standard benchmarks, THUMOS14 and Charades. Our method obtains excellent results compared to state-of-the-art methods that use similar features and loss functions on the THUMOS14 dataset. Similarly, our weakly supervised method is only 0.3% mAP behind a state-of-the-art supervised method on the challenging Charades dataset for action localization.
      @inproceedings{Fernando20, title = {Weakly Supervised Gaussian Networks for Action Detection}, author = {Fernando, B. and Tan, C. and Bilen, H.}, booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)}, year = {2020} }

2019

    • Thewlis, J., Albanie, S., Bilen, H., & Vedaldi, A. (2019). Unsupervised Learning of Landmarks by Descriptor Vector Exchange. International Conference on Computer Vision (ICCV).

      @inproceedings{Thewlis19, title = {Unsupervised Learning of Landmarks by Descriptor Vector Exchange}, author = {Thewlis, J. and Albanie, S. and Bilen, H. and Vedaldi, A.}, booktitle = {International Conference on Computer Vision (ICCV)}, year = {2019}, xcode = {http://www.robots.ox.ac.uk/\~{}vgg/research/DVE/} }
    • Deecke, L., Murray, I., & Bilen, H. (2019). Mode Normalization. International Conference on Learning Representations (ICLR).

      Normalization methods are a central building block in the deep learning toolbox. They accelerate and stabilize training, while decreasing the dependence on manually tuned learning rate schedules. When learning from multi-modal distributions, the effectiveness of batch normalization (BN), arguably the most prominent normalization method, is reduced. As a remedy, we propose a more flexible approach: by extending the normalization to more than a single mean and variance, we detect modes of data on-the-fly, jointly normalizing samples that share common features. We demonstrate that our method outperforms BN and other widely used normalization techniques in several experiments, including single and multi-task datasets.
      @inproceedings{Deecke19, title = {Mode Normalization}, author = {Deecke, L. and Murray, I. and Bilen, H.}, booktitle = {International Conference on Learning Representations (ICLR)}, year = {2019}, xcode = {https://github.com/ldeecke/mn-torch} }
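
      A minimal sketch of the mode-normalization idea described in the entry above: a small gating function softly assigns each sample to one of K modes, and each mode contributes its own mean and variance to the normalization. This simplified version computes statistics per batch only (no running averages), and the gating design and sizes are illustrative assumptions.

      import torch
      import torch.nn as nn

      class ModeNorm2d(nn.Module):
          def __init__(self, num_features, num_modes=2, eps=1e-5):
              super().__init__()
              self.k = num_modes
              self.eps = eps
              self.gate = nn.Linear(num_features, num_modes)       # produces soft mode assignments
              self.weight = nn.Parameter(torch.ones(1, num_features, 1, 1))
              self.bias = nn.Parameter(torch.zeros(1, num_features, 1, 1))

          def forward(self, x):                                     # x: [N, C, H, W]
              g = torch.softmax(self.gate(x.mean(dim=(2, 3))), dim=1)   # [N, K] gates
              out = 0.0
              for k in range(self.k):
                  w = g[:, k].view(-1, 1, 1, 1)                     # per-sample weight for mode k
                  denom = w.sum() * x.shape[2] * x.shape[3] + self.eps
                  mu = (w * x).sum(dim=(0, 2, 3), keepdim=True) / denom
                  var = (w * (x - mu) ** 2).sum(dim=(0, 2, 3), keepdim=True) / denom
                  out = out + w * (x - mu) / torch.sqrt(var + self.eps)
              return self.weight * out + self.bias

      # Example: normalize a batch of 8 feature maps with 64 channels and 2 modes.
      mn = ModeNorm2d(64, num_modes=2)
      y = mn(torch.randn(8, 64, 16, 16))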