Train machine learning models fast.

                    
$ conda create -n ffcv python=3.9 cupy pkg-config libjpeg-turbo opencv pytorch torchvision cudatoolkit=11.6 numba -c conda-forge -c pytorch && conda activate ffcv && conda update ffmpeg && pip install ffcv
                    
                

Keep your training code intact

Drop-in replacement for existing loaders
            
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

train_ds = datasets.ImageFolder('/pth/to/data',
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.Normalize(MEAN, STDEV)
    ]))

train_loader = DataLoader(train_ds,
                          shuffle=True,
                          batch_size=512,
                          num_workers=8)

for ims, labs in train_loader:
    ims = (ims.half()
              .cuda(non_blocking=True)
              .to(memory_format=torch.channels_last))
    # Model training...

import torch as ch
import torchvision as tv
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import \
    RandomResizedCropRGBImageDecoder
from ffcv.transforms import *

train_loader = Loader('/pth/to/data.beton', batch_size=512,
    num_workers=8, order=OrderOption.RANDOM,
    pipelines={'image': [
            RandomResizedCropRGBImageDecoder((224, 224)),
            ToTensor(),
            # Move to GPU asynchronously as uint8:
            ToDevice(ch.device('cuda:0')),
            # Automatically channels-last:
            ToTorchImage(),
            Convert(ch.float16),
            # Standard torchvision transforms still work!
            tv.transforms.Normalize(MEAN, STDEV)
        ]})

# Prefetching, caching, move to GPU, all handled!
for ims, labs in train_loader:
    ...  # Model training (FAST!)

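The loader above reads from a .beton file, FFCV's dataset format. A minimal sketch of the one-time conversion step using FFCV's DatasetWriter, assuming an existing ImageFolder dataset (the paths, max_resolution, and worker count are illustrative):

from torchvision import datasets
from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, IntField

# Convert an indexed dataset (here: an ImageFolder) to .beton once, up front
my_dataset = datasets.ImageFolder('/pth/to/data')
writer = DatasetWriter('/pth/to/data.beton', {
    # Images are stored as RGB, optionally resized at write time
    'image': RGBImageField(max_resolution=256),
    'label': IntField()
}, num_workers=8)
writer.from_indexed_dataset(my_dataset)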
Train ImageNet in minutes (not days)

FFCV cuts training times and comes with simple optimized code for standard datasets

Optimized for speed and usability

Drop-in speed

FFCV doesn't require you to change any training code: make training faster just by replacing the data loading and augmentation pipeline.

More models per GPU

Thanks to fully asynchronous thread-based data loading, you can now interleave training multiple models on the same GPU efficiently, without any data overhead.
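A rough sketch of what this can look like, assuming both loaders read the same .beton through the OS page cache (make_model and train_step are placeholders, not FFCV APIs):

from ffcv.loader import Loader, OrderOption

# Two independent loaders over the same .beton file; with os_cache=True both
# read it through the OS page cache rather than keeping private copies
loader_a = Loader('/pth/to/data.beton', batch_size=256, num_workers=4,
                  order=OrderOption.RANDOM, os_cache=True)
loader_b = Loader('/pth/to/data.beton', batch_size=256, num_workers=4,
                  order=OrderOption.RANDOM, os_cache=True)

model_a, model_b = make_model().cuda(), make_model().cuda()  # placeholders

# Interleave optimizer steps for both models on a single GPU
for (ims_a, labs_a), (ims_b, labs_b) in zip(loader_a, loader_b):
    train_step(model_a, ims_a, labs_a)  # placeholder training step
    train_step(model_b, ims_b, labs_b)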

Remove bottlenecks

FFCV allows you to shift compute load between GPU, CPU, disk, and memory to eliminate bottlenecks under almost any resource constraint.
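For example, the same loader can be tuned for different resource budgets; the option values below are an illustrative sketch, not prescriptive settings:

from ffcv.loader import Loader, OrderOption

# Dataset fits in RAM: let the OS page cache hold it, shuffle fully at random
in_memory_loader = Loader('/pth/to/data.beton', batch_size=512, num_workers=8,
                          order=OrderOption.RANDOM, os_cache=True)

# Dataset larger than RAM: stream from disk; quasi-random ordering keeps
# reads mostly sequential while still shuffling across epochs
streaming_loader = Loader('/pth/to/data.beton', batch_size=512, num_workers=8,
                          order=OrderOption.QUASI_RANDOM, os_cache=False)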

Custom (fast) pipelines

This isn't just about fast data loading: FFCV automatically fuses and compiles the data processing pipeline into machine code. Users can build their own compiled data transformations through a simple Python API, or just continue using standard PyTorch data transformations.
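A minimal sketch of such a custom transform, assuming FFCV's Operation interface (the 20% corruption rate and the pipeline in the usage comment are illustrative):

from typing import Callable
import numpy as np
from ffcv.pipeline.operation import Operation

class CorruptLabels(Operation):
    """Example transform: randomly zero out ~20% of the labels, in place."""

    def generate_code(self) -> Callable:
        def corrupt(labels, dst):
            # FFCV compiles this function to machine code before training
            for i in range(labels.shape[0]):
                if np.random.rand() < 0.2:
                    labels[i] = 0
            return labels
        return corrupt

    def declare_state_and_memory(self, previous_state):
        # Output shape/dtype are unchanged and no scratch memory is needed
        return previous_state, None

# Then use it like any built-in transform, e.g. in a 'label' pipeline:
# pipelines={'label': [IntDecoder(), CorruptLabels(), ToTensor(), Squeeze()]}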

Hyper-optimized

Everything about FFCV is optimized: it carefully handles the caching, preloading, threading, scheduling, compilation, etc. so that you don't have to. The numbers speak for themselves.

Docs and support

FFCV comes with continually updated documentation that includes a variety of example use cases. The project's maintainers can also be reached through an FFCV Slack workspace.