RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network

This is the official code for RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network. The code currently supports training and evaluation on FashionIQ only. Implementations of the other baselines are released alongside it.


Updates

  • (2021.10.26) Updated model checkpoints, training configs and tensorboard logs.
  • (2021.09.10) The official code is released.

Requirements

Prepare your environment with virtualenv.

python3 -m virtualenv --python=python3 venv # create virtualenv.
. venv/bin/activate # activate environment.
pip3 install -r requirements.txt # install required packages.

Download Data

We provide a script for downloading the FashionIQ images. Note that the script cannot guarantee that every image is downloaded, because some of the source URLs are broken.

sh script/download_fiq.sh
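Because some downloads may fail, it can help to check which images are actually on disk before training. The snippet below is a minimal sketch, not part of this repository: `find_missing_images` is a hypothetical helper, and it assumes each image is saved as `<image_id>.<extension>` inside a single directory that you pass in, which may not match the layout produced by the download script.

```python
from pathlib import Path


def find_missing_images(image_dir, expected_ids, extensions=(".jpg", ".jpeg", ".png")):
    """Return the sorted ids from expected_ids that have no usable image file.

    Hypothetical helper: assumes files are named <image_id>.<extension>
    and live directly inside image_dir. Zero-byte files are treated as
    failed downloads.
    """
    image_dir = Path(image_dir)
    if not image_dir.is_dir():
        # Nothing was downloaded, so every id is missing.
        return sorted(set(expected_ids))

    present = {
        path.stem
        for path in image_dir.iterdir()
        if path.is_file()
        and path.suffix.lower() in extensions
        and path.stat().st_size > 0
    }
    return sorted(set(expected_ids) - present)
```

Calling it with the directory you downloaded into and the image ids referenced by your annotation files returns the ids you may want to filter out or re-download.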

Model Zoo

We provide pretrained checkpoints for RTIC / RTIC-GCN trained on FashionIQ.

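| Model | Recall ((R@10 + R@50) / 2) | Checkpoint | Config | Training Log |
| --- | --- | --- | --- | --- |
| RTIC | 39.22 | ckpt | config | tensorboard_log |
| RTIC-GCN (scratch) | 39.55 | ckpt | config | tensorboard_log |
| RTIC-GCN (finetune) | 40.64 | ckpt | config | tensorboard_log |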

Benchmark Score on FashionIQ Dataset

| Method | (R@10 + R@50) / 2 | Paper |
| --- | --- | --- |
| JVSM | 19.26 | pdf |
| TRACE w/ BERT | 34.38 | pdf |
| VAL w/ GloVe | 35.38 | pdf |
| CIRPLANT w/ OSCAR | 30.20 | pdf |
| MAAF | 36.60 | pdf |
| CurlingNet | 38.45 | pdf |
| CoSMo | 39.45 | pdf |
| RTIC w/ GloVe | 39.22 | - |
| RTIC-GCN w/ GloVe (scratch) | 39.55 | - |
| RTIC-GCN w/ GloVe (fine-tune) | 40.64 | - |
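For readers unfamiliar with the metric, the sketch below shows one way to compute recall at k and the (R@10 + R@50) / 2 average from a query-by-candidate score matrix. It is an illustrative pure-Python version, not the evaluation code of this repository; the function names and the input layout (one row of similarity scores per query, plus the index of the correct candidate) are assumptions.

```python
def recall_at_k(scores, targets, k):
    """Percentage of queries whose target is among the k best-scored candidates.

    scores:  list of rows, scores[i][j] = similarity of query i to candidate j
    targets: list of ints, targets[i] = index of the correct candidate for query i
    """
    hits = 0
    for row, target in zip(scores, targets):
        # Rank candidate indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(targets)


def average_recall(scores, targets, ks=(10, 50)):
    """Mean of recall@k over the given cutoffs; (R@10 + R@50) / 2 by default."""
    return sum(recall_at_k(scores, targets, k) for k in ks) / len(ks)
```

With three queries where only one target is ranked first but all three fall within the top two, `recall_at_k(..., 1)` is one third of 100 and `recall_at_k(..., 2)` is 100, so `average_recall(..., ks=(1, 2))` is their mean.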

Quick Start

We provide sample training scripts for several configurations. The default configuration is stored in cfg/default.yaml and corresponds to the "unified environment" in our paper. To try the "optimal environment", use the +optimize=<something> option.

(1) RTIC (unified env)

EXPR_NAME=testrun python main.py \
    config.EXPR_NAME=${EXPR_NAME}

(2) RTIC (optimal env)

EXPR_NAME=testrun python main.py \
    +optimize=rtic \
    config.EXPR_NAME=${EXPR_NAME}

(3) RTIC-GCN (optimal env, scratch)

EXPR_NAME=testrun_gcn LOAD_FROM=testrun python main.py \
    +optimize=rtic_gcn_scratch \
    +gcn=enabled \
    config.LOAD_FROM=${LOAD_FROM} \
    config.EXPR_NAME=${EXPR_NAME}

(4) RTIC-GCN (optimal env, finetune)

EXPR_NAME=testrun_gcn LOAD_FROM=testrun python main.py \
    +optimize=rtic_gcn_finetune \
    +gcn=enabled \
    config.LOAD_FROM=${LOAD_FROM} \
    config.EXPR_NAME=${EXPR_NAME}

(5) Other Baselines

You can train any of the other baselines by simply changing config.TRAIN.MODEL.composer_model.name.

(w/o GCN)
EXPR_NAME=testrun python main.py \
    config.TRAIN.MODEL.composer_model.name=<any-composer-method-you-want-to-try> \
    config.EXPR_NAME=${EXPR_NAME}
(w/ GCN)
EXPR_NAME=testrun_gcn LOAD_FROM=testrun python main.py \
    +gcn=enabled \
    config.TRAIN.MODEL.composer_model.name=<any-composer-method-you-want-to-try> \
    config.LOAD_FROM=${LOAD_FROM} \
    config.EXPR_NAME=${EXPR_NAME}
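
The commands above differ only in which overrides are passed to main.py, so they can also be assembled programmatically, for example when launching several runs from a script. The following is a minimal sketch under that assumption: `build_train_command` is a hypothetical helper, not part of this repository, and it only reuses the override keys shown in this section. Passing the values directly as arguments also sidesteps a shell detail worth knowing: in POSIX shells, a `VAR=value` prefix on a command is not visible to `${VAR}` expansions on that same command line.

```python
import shlex


def build_train_command(expr_name, optimize=None, gcn=False, load_from=None, composer=None):
    """Build the argv list for one training run (hypothetical helper).

    Only the override keys that appear in this README are used.
    """
    args = ["python", "main.py"]
    if optimize:
        args.append(f"+optimize={optimize}")
    if gcn:
        args.append("+gcn=enabled")
    if composer:
        args.append(f"config.TRAIN.MODEL.composer_model.name={composer}")
    if load_from:
        args.append(f"config.LOAD_FROM={load_from}")
    args.append(f"config.EXPR_NAME={expr_name}")
    return args


if __name__ == "__main__":
    # Example: setting (4), RTIC-GCN fine-tuned from an earlier "testrun" experiment.
    cmd = build_train_command(
        "testrun_gcn", optimize="rtic_gcn_finetune", gcn=True, load_from="testrun"
    )
    print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the repository root to start the run.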

Citation

If you find this work useful for your research, please cite our paper:

@article{shin2021rtic,
  title={RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network},
  author={Shin, Minchul and Cho, Yoonjae and Ko, Byungsoo and Gu, Geonmo},
  journal={arXiv preprint arXiv:2104.03015},
  year={2021}
}

License

MIT License