I don't think many people have ever written this, except as an exercise.
I would locate the first and last words, pick the shorter of them, swap the chars in place up to the shorter one's length, and then I'd have to shift the whole buffer (memmove) by 1 char to free the space for the next char of the longer one to be copied, repeating that for each leftover char. Of course, a better way is to have a small fixed-size buffer (say, 8 chars) so that at least some words can be moved with a single shift.
This produces slow and ugly code. In reality, I'd be sorely tempted to allocate another buffer and finish the whole thing with a single memcpy. Or am I missing an obvious and good algorithm here?
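
For comparison, here is a rough sketch of the "allocate another buffer" route. I'm assuming the task is reversing the order of space-separated words in a NUL-terminated string of single-byte chars; the name reverse_words and the -1 error return are my own choices, not anything from the original problem. It walks the string from the back, copies each word (or run of spaces) into the scratch buffer, and then finishes with one memcpy back.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reverse the order of space-separated words in s,
 * using a scratch buffer instead of shifting in place.
 * Assumes single-byte characters and ' ' as the only separator.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int reverse_words(char *s)
{
    size_t len = strlen(s);
    char *tmp = malloc(len + 1);
    if (tmp == NULL)
        return -1;

    size_t out = 0;    /* next write position in tmp */
    size_t end = len;  /* one past the end of the current run in s */

    while (end > 0) {
        size_t start = end;

        /* Walk back over one run: either all spaces or all non-spaces. */
        if (s[end - 1] == ' ') {
            while (start > 0 && s[start - 1] == ' ')
                start--;
        } else {
            while (start > 0 && s[start - 1] != ' ')
                start--;
        }

        /* Append the run to the scratch buffer in its original char order. */
        memcpy(tmp + out, s + start, end - start);
        out += end - start;
        end = start;
    }

    assert(out == len);    /* every char was copied exactly once */
    memcpy(s, tmp, len);   /* the single copy back; the NUL stays in place */
    free(tmp);
    return 0;
}
```

Called on a writable buffer holding "hello big world", this should leave "world big hello" in it. The cost is one extra allocation of the string's length, which is the trade the in-place version is trying to avoid.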