Treats hybrid layer selection as a budget-constrained subset optimization and introduces FlashMorph: a pipeline that equips each transformer layer with a linear-attention branch, jointly optimizes layerwise gates on synthetic long-context retrieval data, then discretizes, distills, and finetunes—achieving strong long-context recall using only 20M selection tokens.