This paper introduces an attention-based encoder–decoder NMT architecture that learns soft alignments between source and target words while translating, eliminating the fixed-length-vector bottleneck of earlier seq2seq models. The approach substantially improves BLEU, especially on long sentences, and matches phrase-based SMT on English-to-French without additional hand-engineered features. The attention mechanism it proposes became the foundation for virtually all subsequent NMT systems and inspired attention-centric models such as the Transformer, reshaping machine translation and sequence modeling across NLP.
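
To make the soft-alignment idea concrete, below is a minimal NumPy sketch of additive (Bahdanau-style) attention: for one decoder state, it scores every encoder annotation, normalizes the scores into alignment weights, and returns their weighted sum as the context vector. The shapes, parameter names (W_a, U_a, v_a), and the toy usage are illustrative assumptions for this summary, not the authors' implementation.

```python
# Sketch of additive attention: score each source annotation against the
# current decoder state, softmax the scores, and form a context vector.
import numpy as np

def softmax(x):
    x = x - x.max()          # shift for numerical stability
    e = np.exp(x)
    return e / e.sum()

def additive_attention(s, H, W_a, U_a, v_a):
    """Return (context vector, soft alignment weights).

    s   : (d_dec,)        current decoder hidden state
    H   : (T_src, d_enc)  encoder hidden states (annotations)
    W_a : (d_att, d_dec), U_a : (d_att, d_enc), v_a : (d_att,)
    (parameter names are assumptions following common textbook notation)
    """
    # Energy e_j = v_a^T tanh(W_a s + U_a h_j), one score per source position.
    energies = np.tanh(H @ U_a.T + W_a @ s) @ v_a   # (T_src,)
    alpha = softmax(energies)                        # soft alignment weights
    context = alpha @ H                              # weighted sum of annotations
    return context, alpha

# Toy usage with random parameters (illustrative only).
rng = np.random.default_rng(0)
T_src, d_enc, d_dec, d_att = 5, 8, 8, 6
H = rng.standard_normal((T_src, d_enc))
s = rng.standard_normal(d_dec)
W_a = rng.standard_normal((d_att, d_dec))
U_a = rng.standard_normal((d_att, d_enc))
v_a = rng.standard_normal(d_att)
context, alpha = additive_attention(s, H, W_a, U_a, v_a)
print(alpha.round(3), alpha.sum())   # alignment weights sum to 1
```

Because the context vector is recomputed for every target position, the decoder can attend to different source words at each step instead of relying on a single fixed-length summary of the source sentence, which is precisely why the gains are largest on long inputs.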