We are talking about a more than 10x reduction in GPU time for token inference, and for training too
In other words, it’s cheaper and faster
Alignment is frankly IMO purely a dataset design and training issue, and has nothing to do with the model itself