$ conda create -n ffcv python=3.9 cupy pkg-config libjpeg-turbo opencv pytorch torchvision cudatoolkit=11.6 numba -c conda-forge -c pytorch && conda activate ffcv && conda update ffmpeg && pip install ffcv
import torch as ch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

train_ds = datasets.ImageFolder('/path/to/data',
    transform=transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.ToTensor(),
        # MEAN and STDEV are per-channel statistics defined elsewhere
        transforms.Normalize(MEAN, STDEV)
    ]))

train_loader = DataLoader(train_ds,
                          shuffle=True,
                          batch_size=512,
                          num_workers=8)

for ims, labs in train_loader:
    ims = ims.half() \
             .cuda(non_blocking=True) \
             .to(memory_format=ch.channels_last)
    # Model training...
import torch as ch
import torchvision as tv
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import \
    RandomResizedCropRGBImageDecoder
from ffcv.transforms import *

train_loader = Loader('/path/to/data.beton', batch_size=512,
                      num_workers=8, order=OrderOption.RANDOM,
                      pipelines={'image': [
                          RandomResizedCropRGBImageDecoder((224, 224)),
                          ToTensor(),
                          # Move to GPU asynchronously as uint8:
                          ToDevice(ch.device('cuda:0')),
                          # Automatically channels-last:
                          ToTorchImage(),
                          Convert(ch.float16),
                          # Standard torchvision transforms still work!
                          tv.transforms.Normalize(MEAN, STDEV)
                      ]})

# Prefetching, caching, move to GPU: all handled!
for ims, labs in train_loader:
    ...  # Model training (FAST!)
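The .beton file referenced above is FFCV's dataset format. As a minimal sketch (field options such as max_resolution are illustrative), an existing indexed dataset, for example the ImageFolder from the first snippet, can be converted once with FFCV's DatasetWriter:

from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, IntField

# One field per element of the (image, label) tuples yielded by train_ds.
writer = DatasetWriter('/path/to/data.beton', {
    'image': RGBImageField(max_resolution=256),
    'label': IntField()
}, num_workers=8)
writer.from_indexed_dataset(train_ds)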
FFCV doesn't require you to change any training code: make training faster just by replacing the data loading and augmentation pipeline.
Thanks to fully asynchronous thread-based data loading, you can now interleave training multiple models on the same GPU efficiently, without any data overhead.
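As an illustrative sketch of what that can look like (model_a, model_b, and train_step are placeholder names, and pipelines refers to a dict like the one above), two loaders can feed two models in the same process:

from ffcv.loader import Loader, OrderOption

# Each loader decodes on its own background threads, so neither model
# stalls the GPU waiting for the other's data.
loader_a = Loader('/path/to/a.beton', batch_size=256, num_workers=4,
                  order=OrderOption.RANDOM, pipelines=pipelines)
loader_b = Loader('/path/to/b.beton', batch_size=256, num_workers=4,
                  order=OrderOption.RANDOM, pipelines=pipelines)

for (ims_a, labs_a), (ims_b, labs_b) in zip(loader_a, loader_b):
    train_step(model_a, ims_a, labs_a)  # placeholder training steps
    train_step(model_b, ims_b, labs_b)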
FFCV allows you to shift compute load between GPU, CPU, disk, and memory to eliminate bottlenecks under almost any resource constraint.
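For example (a sketch; parameter values are illustrative), the Loader's os_cache flag and ordering option let you choose between serving the dataset from memory and streaming it from disk:

from ffcv.loader import Loader, OrderOption

# Dataset fits in RAM: let the OS page cache hold it and shuffle freely.
in_memory = Loader('/path/to/data.beton', batch_size=512, num_workers=8,
                   os_cache=True, order=OrderOption.RANDOM,
                   pipelines=pipelines)

# Dataset larger than RAM: read from disk with a quasi-random order
# designed to keep reads mostly sequential.
from_disk = Loader('/path/to/data.beton', batch_size=512, num_workers=8,
                   os_cache=False, order=OrderOption.QUASI_RANDOM,
                   pipelines=pipelines)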
This isn't just about fast data loading: FFCV automatically fuses and compiles the data processing pipeline into machine code. Users can build their own compiled data transformations through a simple Python API, or just continue using standard PyTorch data transformations.
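As a hedged sketch of that API (the Operation subclass interface below follows FFCV's documentation, but the InvertPixels transform itself is a made-up example), here is a custom operation that FFCV compiles and fuses into the pipeline:

from ffcv.pipeline.operation import Operation
from ffcv.pipeline.allocation_query import AllocationQuery
from ffcv.pipeline.compiler import Compiler

class InvertPixels(Operation):
    # Return the function FFCV compiles (via Numba) and fuses into
    # the rest of the pipeline.
    def generate_code(self):
        parallel_range = Compiler.get_iterator()
        def invert(images, dst):
            # images: batch of uint8 arrays; dst: preallocated output.
            for i in parallel_range(images.shape[0]):
                dst[i] = 255 - images[i]
            return dst
        invert.is_parallel = True
        return invert

    # Shape and dtype are unchanged; request a same-shaped output buffer.
    def declare_state_and_memory(self, previous_state):
        return previous_state, AllocationQuery(previous_state.shape,
                                               previous_state.dtype)

An instance of InvertPixels can then be dropped into the 'image' pipeline above, right alongside the built-in transforms.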
Everything about FFCV is optimized: it carefully handles caching, preloading, threading, scheduling, compilation, and more so that you don't have to. The numbers speak for themselves.
FFCV comes with continually updated documentation that includes a variety of example use cases. The project's maintainers can also be reached through an FFCV Slack workspace.