Distributed Training on AI Tech Blog

https://jesamkim.github.io/ai-tech-blog/tags/distributed-training/
Recent content in Distributed Training on AI Tech Blog
Last updated: Wed, 15 Apr 2026 13:00:00 +0900

Understanding Distributed Training Part 4 - Tensor/Hybrid Parallelism and MoE
https://jesamkim.github.io/ai-tech-blog/posts/2026-04-16-distributed-training-part4-tensor-hybrid-moe/
Wed, 15 Apr 2026 13:00:00 +0900
Covers the Row/Column Split principle behind Tensor Parallelism, Megatron-LM's alternating scheme, strategies for combining 2D/3D Hybrid Parallelism, and MoE with Expert Parallelism. Provides a comprehensive comparison of the four major parallelization techniques and a decision guide.

Understanding Distributed Training Part 3 - Pipeline Parallelism: From GPipe to Zero Bubble
https://jesamkim.github.io/ai-tech-blog/posts/2026-04-16-pipeline-parallelism-evolution/
Wed, 15 Apr 2026 12:00:00 +0900
Traces the evolution of Pipeline Parallelism. Starting from the low GPU utilization of the Naive Pipeline, it analyzes how the bubble has been reduced through GPipe's micro-batches, 1F1B's interleaved execution, and ZBH's separation of Backprop.

Understanding Distributed Training Part 2 - Data Parallelism: Splitting Data to Reduce Memory
https://jesamkim.github.io/ai-tech-blog/posts/2026-04-16-data-parallelism-deep-dive/
Wed, 15 Apr 2026 11:00:00 +0900
Analyzes how the Parameter Server architecture works, the four training steps, the mathematical equivalence with Centralized Training, memory usage, and the fundamental limits of DP. Uses a ResNet-18 ImageNet example to calculate the actual memory savings.

Understanding Distributed Training Part 1 - GPU Memory Analysis: Parameter vs Activation
https://jesamkim.github.io/ai-tech-blog/posts/2026-04-16-distributed-training-memory-analysis/
Wed, 15 Apr 2026 10:00:00 +0900
Analyzes how GPU memory is consumed at each stage of the neural network training loop. Covers per-optimizer memory formulas from SGD to Adam, the proportional relationship between activation memory and batch size, and strategies for handling OOM.
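
As a rough illustration of the memory accounting that the Part 1 summary mentions, the sketch below estimates training-state memory for SGD, SGD with momentum, and Adam, and shows activation memory growing linearly with batch size. It is not taken from the post itself. The function names, the fp32-everywhere assumption, and the per-sample activation count are all hypothetical, chosen only to make the arithmetic concrete.

```python
# Minimal sketch, assuming every tensor is stored in fp32 (4 bytes per value).
# Mixed-precision setups keep different copies and would change these numbers.
BYTES_FP32 = 4

# Extra per-parameter buffers each optimizer keeps besides weights and gradients.
OPTIMIZER_STATE_COPIES = {
    "sgd": 0,           # plain SGD keeps no extra state
    "sgd_momentum": 1,  # one momentum buffer per parameter
    "adam": 2,          # first- and second-moment estimates per parameter
}


def training_state_bytes(num_params: int, optimizer: str) -> int:
    """Bytes for weights + gradients + optimizer state (hypothetical helper)."""
    copies = 1 + 1 + OPTIMIZER_STATE_COPIES[optimizer]
    return num_params * copies * BYTES_FP32


def activation_bytes(values_per_sample: int, batch_size: int) -> int:
    """Bytes for stored activations; scales linearly with batch size."""
    return values_per_sample * batch_size * BYTES_FP32


if __name__ == "__main__":
    n = 1_000_000  # hypothetical parameter count
    for name in OPTIMIZER_STATE_COPIES:
        print(name, training_state_bytes(n, name), "bytes")
    # Doubling the batch size doubles activation memory.
    print(activation_bytes(50_000, 32), activation_bytes(50_000, 64))
```

Under these assumptions, the parameter-related memory is fixed by the model and the optimizer, while the activation term is the part that shrinks when the batch size is reduced.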