
Neural Network Architecture Selection Cheat Sheet

One-page quick reference for practitioners. Print it, bookmark it, use it when making architecture decisions.

Neural Networks Quick Reference · January 02, 2026 · 3 min read · perfecXion Team

The Core Principle


Architecture = Inductive Bias. Match your architecture's assumptions to your data's structure.

| Data Structure | Architecture Assumes | Use | Watch For |
|---|---|---|---|
| Grid/Spatial (images) | Nearby elements correlate | CNN | Misses global context |
| Sequential (text, time) | Order matters | Transformer/RNN | Cost/latency explosion |
| Relational (networks) | Explicit relationships | GNN | Graph construction errors |
| Tabular (spreadsheets) | Minimal structure | Trees → MLP | High data requirements |
| Multiple types | Separate encoders needed | Multimodal | Complexity without gain |
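
To see what "inductive bias" buys you, compare parameter counts for the same input handled two ways. A minimal PyTorch sketch (layer sizes are arbitrary, chosen only for illustration): a convolution assumes nearby pixels correlate and shares weights, while a dense layer assumes nothing and pays for it in parameters.

```python
import torch.nn as nn

# Same 3x32x32 input (a CIFAR-sized image), two ways to produce 64 outputs.

# CNN layer: assumes local correlation, shares one 3x3 kernel per channel pair.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)

# MLP layer: assumes no structure, connects every pixel to every unit.
dense = nn.Linear(in_features=3 * 32 * 32, out_features=64)

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"Conv2d params: {count(conv):,}")   # 1,792
print(f"Linear params: {count(dense):,}")  # 196,672
```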

Quick Decision by Modality


Vision (grid/spatial): start with a CNN; watch for missed global context.

Text (sequential, order matters): Transformer or RNN; watch for cost/latency explosion.

Time Series (sequential): Transformer or RNN; same cost/latency caution.

Graph (explicit relationships): GNN; watch for graph construction errors.

Multimodal (multiple input types): separate encoders per modality; add a modality only when it carries real signal.
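
The same decisions as a toy lookup in Python. Every name here is invented for this sketch, and real selection should also weigh the data-quantity rules below.

```python
# Toy decision helper mirroring the cheat sheet's mapping.
# Purely illustrative; real selection should also weigh data volume,
# latency budget, and whether pretrained weights exist.
FIRST_CHOICE = {
    "vision":      ("CNN", "misses global context"),
    "text":        ("Transformer/RNN", "cost/latency explosion"),
    "time_series": ("Transformer/RNN", "cost/latency explosion"),
    "graph":       ("GNN", "graph construction errors"),
    "tabular":     ("gradient-boosted trees, then MLP", "high data requirements"),
    "multimodal":  ("separate encoders per modality", "complexity without gain"),
}

def suggest(modality: str) -> str:
    arch, caveat = FIRST_CHOICE[modality]
    return f"{modality}: start with {arch} (watch for: {caveat})"

print(suggest("tabular"))
```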

Data Quantity Rules of Thumb

| Samples | Approach |
|---|---|
| < 1,000 | Classical ML. Heavy transfer learning if neural. |
| 1,000–10,000 | Transfer learning essential. Fine-tune pretrained. |
| 10,000–100,000 | Most architectures viable with pretrained start. |
| 100,000+ | Training from scratch becomes reasonable. |
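
In the 1,000–10,000-sample regime the table calls transfer learning essential. A minimal fine-tuning sketch, assuming PyTorch/torchvision and an image task where a ResNet-18 backbone fits; the class count and learning rate are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # placeholder: your task's label count

# Start from pretrained ImageNet weights instead of training from scratch.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the backbone: with only thousands of samples, train the head alone.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head with one sized for this task.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Only the new head's parameters receive gradient updates.
optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Training step (dataloader omitted):
#   logits = model(images); loss = criterion(logits, labels)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```

With more data (the 10,000–100,000 row), unfreezing deeper layers usually pays off.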

The Five Principles

  1. Simple First — Try logistic regression or gradient boosting before neural networks (a baseline sketch follows this list).
  2. Transfer Learning Default — Never train from scratch if pretrained weights exist.
  3. Data Over Architecture — The best architecture can't fix bad data. Spend 80% on data quality.
  4. Match Inductive Bias — Choose architectures whose assumptions match your data's true structure.
  5. Production Reality — Consider latency, memory, and monitoring from the start.
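
A minimal "simple first" baseline, assuming scikit-learn; the synthetic dataset is a stand-in for your real features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data; substitute your real feature matrix and labels.
X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

# Cheap baselines first. A neural net has to beat these to earn its complexity.
for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1_000)),
    ("gradient boosting", GradientBoostingClassifier(random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} accuracy (5-fold CV)")
```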

Before You Start: Model Brief

Answer these before choosing; a sketch for recording the answers in code follows the checklist:


Input: Fixed or variable size? Local or global signal?

Output: Label, sequence, mask, ranking, or generation?

Constraints: Latency requirement? Memory budget? Throughput needs?

Risk: Explainability required? False negative vs false positive tolerance?
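
One way to keep the brief honest is to write it down in code. A hypothetical sketch: the class and its fields are invented here, not a library API.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class ModelBrief:
    """Hypothetical record of the answers above; adapt fields to your project."""
    # Input
    input_size: Literal["fixed", "variable"]
    signal: Literal["local", "global", "both"]
    # Output
    output_type: Literal["label", "sequence", "mask", "ranking", "generation"]
    # Constraints
    latency_ms_p99: float
    memory_budget_mb: float
    throughput_rps: float
    # Risk
    needs_explainability: bool
    worse_error: Literal["false_negative", "false_positive"]

brief = ModelBrief(
    input_size="fixed", signal="local", output_type="label",
    latency_ms_p99=50.0, memory_budget_mb=512.0, throughput_rps=100.0,
    needs_explainability=True, worse_error="false_negative",
)
```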

Common Mistakes

| Mistake | Reality |
|---|---|
| "Transformers are always best" | Wasteful for tabular, small vision, and edge deployment |
| Ignoring classical ML for tabular | XGBoost/LightGBM often win |
| Training from scratch | Fine-tuning needs ~100x less data |
| "Deeper = better" | Diminishing returns, overfitting risk |
| Adding modalities "because it might help" | Complexity without signal = noise |

Production Checklist

Before you ship, verify the constraints from your Model Brief: latency, memory, throughput, and monitoring.

Optimization Options

| Technique | Result |
|---|---|
| Quantization (float32 → int8) | 4x smaller, 2–4x faster |
| Pruning | Remove near-zero weights |
| Distillation | Small student mimics large teacher |
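
Two of these as hedged PyTorch sketches: post-training dynamic quantization of Linear layers to int8, and one common form of the distillation loss (temperature-softened KL plus the usual hard-label term). The model and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder model; substitute your trained float32 network.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Quantization: convert Linear weights float32 -> int8 after training.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Distillation: small student mimics the large teacher's softened outputs.
def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2          # rescale to match the hard-label loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

Dynamic quantization targets weight-heavy layers like nn.Linear; for convolutional models, static quantization or quantization-aware training is the usual route.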

Full Guide

For comprehensive coverage with worked examples and deep-dives into each architecture family:

perfecXion.ai