Train machine learning models fast.

                    
$ conda create -n ffcv python=3.9 cupy pkg-config compilers libjpeg-turbo opencv pytorch torchvision cudatoolkit=11.3 numba -c pytorch -c conda-forge && conda activate ffcv && pip install ffcv

Keep your training code intact

Drop-in replacement for existing loaders
            
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

train_ds = datasets.ImageFolder('/pth/to/data',
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.Normalize(MEAN, STDEV)
]))

train_loader = DataLoader(train_ds, 
                          shuffle=True, 
                          batch_size=512, 
                          num_workers=8)

                          
for ims, labs in train_loader:
    ims = (ims.half()
              .cuda(non_blocking=True)
              .to(memory_format=torch.channels_last))
    # Model training...
            
        
            
import torch as ch
import torchvision as tv

from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import \
    RandomResizedCropRGBImageDecoder
from ffcv.transforms import ToTensor, ToDevice, \
    ToTorchImage, Convert

train_loader = Loader('/pth/to/data.beton', batch_size=512, 
    num_workers=8, order=OrderOption.RANDOM,
    pipelines={'image': [
            RandomResizedCropRGBImageDecoder((224, 224)),
            ToTensor(), 
            # Move to GPU asynchronously as uint8:
            ToDevice(ch.device('cuda:0')), 
            # Automatically channels-last:
            ToTorchImage(), 
            Convert(ch.float16), 
            # Standard torchvision transforms still work!
            tv.transforms.Normalize(MEAN, STDEV)
        ]})

# Prefetching, caching, move to GPU, all handled!
for ims, labs in train_loader:
    pass  # Model training (FAST!)
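The .beton file used above is FFCV's packed dataset format. A minimal sketch of producing one from an existing indexed dataset, assuming the dataset returns (image, label) pairs; the output path and field settings below are illustrative placeholders:

from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, IntField

# Any dataset with __len__ and __getitem__ returning (image, label) works here.
writer = DatasetWriter('/pth/to/data.beton', {
    # max_resolution is an illustrative choice; it caps the stored image size
    'image': RGBImageField(max_resolution=256),
    'label': IntField()
}, num_workers=8)
writer.from_indexed_dataset(train_ds)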
            
            

Train ImageNet in minutes (not days)

FFCV cuts training times and comes with simple, optimized code for standard datasets.

Optimized for speed and usability

Drop-in speed

FFCV doesn't require you to change any training code: make training faster simply by replacing the data loading and augmentation pipeline.

More models per GPU

Thanks to fully asynchronous, thread-based data loading, you can interleave the training of multiple models on the same GPU efficiently, without any data-loading overhead.

Remove bottlenecks

FFCV allows you to shift compute load between GPU, CPU, disk, and memory to eliminate bottlenecks under almost any resource constraint.
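For example, the Loader exposes caching and ordering options that trade RAM for disk reads. A sketch of two common configurations, reusing the placeholder .beton path from above:

from ffcv.loader import Loader, OrderOption

# Dataset fits in memory: let the OS page cache hold the whole file.
in_memory_loader = Loader('/pth/to/data.beton', batch_size=512,
                          num_workers=8, os_cache=True,
                          order=OrderOption.RANDOM)

# Dataset larger than memory: stream from disk, using quasi-random
# ordering to keep reads mostly sequential.
streaming_loader = Loader('/pth/to/data.beton', batch_size=512,
                          num_workers=8, os_cache=False,
                          order=OrderOption.QUASI_RANDOM)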

Custom (fast) pipelines

This isn't just about fast data loading: FFCV automatically fuses and compiles the data processing pipeline into machine code. Users can build their own compiled data transformations through a simple Python API, or just continue using standard PyTorch data transformations.
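As a sketch of the custom-transform API: a transform subclasses ffcv.pipeline.operation.Operation and returns a plain function from generate_code(), which FFCV compiles with Numba. The label-noise logic below is purely illustrative, not part of FFCV:

from typing import Callable, Optional, Tuple
import numpy as np

from ffcv.pipeline.operation import Operation
from ffcv.pipeline.state import State
from ffcv.pipeline.allocation_query import AllocationQuery
from ffcv.pipeline.compiler import Compiler

class RandomLabelNoise(Operation):
    # Illustrative example: randomly corrupt a small fraction of labels.
    def generate_code(self) -> Callable:
        parallel_range = Compiler.get_iterator()
        def corrupt(labels, _):
            for i in parallel_range(labels.shape[0]):
                if np.random.rand() < 0.05:      # illustrative probability
                    labels[i] = np.random.randint(0, 10)
            return labels
        corrupt.is_parallel = True               # allow multi-threaded execution
        return corrupt

    def declare_state_and_memory(self, previous_state: State) -> Tuple[State, Optional[AllocationQuery]]:
        # Output shape/dtype unchanged; no scratch memory requested.
        return previous_state, None

Such a transform then slots into a pipeline alongside the built-in ones, e.g. pipelines={'label': [IntDecoder(), RandomLabelNoise(), ToTensor()]}, with IntDecoder imported from ffcv.fields.decoders.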

Hyper-optimized

Everything about FFCV is optimized: it carefully handles the caching, preloading, threading, scheduling, compilation, etc. so that you don't have to. The numbers speak for themselves.

Docs and support

FFCV comes with continually updated documentation that includes a variety of example use cases. The project's maintainers can also be reached through an FFCV Slack workspace.