For startups, there are plenty of business opportunities in building small, task-specific AI models that perform much better on their target task when trained on datasets you have exclusive access to, using an efficient model architecture. If you have such an opportunity, you might train your own model from scratch (e.g., computer vision, recommendation engines, scientific AI) - btw, that's what we do at NeuraMancer.ai:
- Just use PyTorch to implement it. It's the industry and research standard
- The entire ecosystem (data loaders, optimizers, distributed training) is built around it - see the minimal training loop after this list
- Use custom Triton kernels and/or torch.compile() for serious speedups (sketched after the list)
- Train in FP8 (native on Hopper) or even FP4 (native on Blackwell) - see the precision sketch below
- Training is a different workload than LLM inference: it is compute-bound, so raw TFLOPS (Tensor Cores) matter more than memory bandwidth -> rent a training cluster if you lack the capital (multi-GPU sketch below)
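
To make the PyTorch point concrete, here is a minimal, generic training loop. The model, dataset, and hyperparameters are placeholders you would swap for your own:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data - swap in your own architecture and dataset.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()
dataset = TensorDataset(torch.randn(10_000, 128), torch.randint(0, 10, (10_000,)))
loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=4)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    for x, y in loader:
        x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
        optimizer.zero_grad(set_to_none=True)
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```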
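
torch.compile() wraps the same model with one line; on GPU its Inductor backend generates fused Triton kernels under the hood, and you can still hand-write Triton kernels for the hottest ops. A minimal sketch (the model and shapes are placeholders):

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()

# Inductor traces the model and emits fused, autotuned GPU kernels.
compiled = torch.compile(model, mode="max-autotune")

x = torch.randn(256, 128, device="cuda")
out = compiled(x)  # first call compiles (slow); later calls reuse the cached kernels
```

The training loop itself does not change - you just call the compiled module instead of the original one.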
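
Low-precision training needs both hardware and library support. BF16 works out of the box via torch.autocast; FP8 on Hopper typically goes through NVIDIA's Transformer Engine (the commented-out calls below follow its documented API, but treat them as a sketch and check the version you install). FP4 training on Blackwell is newer still and not shown.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()
x = torch.randn(256, 128, device="cuda")
y = torch.randint(0, 10, (256,), device="cuda")
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# BF16 mixed precision: supported on Ampere and newer, no extra dependencies.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()

# FP8 on Hopper (sketch, requires NVIDIA Transformer Engine and its layer types):
# import transformer_engine.pytorch as te
# from transformer_engine.common.recipe import DelayedScaling, Format
# fp8_layer = te.Linear(128, 256)
# with te.fp8_autocast(enabled=True, fp8_recipe=DelayedScaling(fp8_format=Format.HYBRID)):
#     out = fp8_layer(x)
```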
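
And when you do rent a cluster, PyTorch's built-in DistributedDataParallel scales the same loop across GPUs and nodes. A minimal sketch, launched with torchrun (which sets the environment variables it reads):

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with e.g.: torchrun --nnodes=2 --nproc_per_node=8 train.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()
ddp_model = DDP(model, device_ids=[local_rank])  # gradients are all-reduced across ranks

x = torch.randn(256, 128, device="cuda")
y = torch.randint(0, 10, (256,), device="cuda")
loss = nn.functional.cross_entropy(ddp_model(x), y)
loss.backward()  # backward pass triggers the gradient all-reduce

dist.destroy_process_group()
```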