I do not really see where the big difference is. We already get boxes around new modes, and any correctly trained pilot knows instantly what those boxes imply. The effort should go into training rather than into even more visual cues in a system that already relies too heavily on one of the three sensory channels available to human beings (visual, audio and tactile).
As mentioned above, the example with the deactivated autothrust is rather interesting: with the speed decaying, the eyes are drawn to the new boxes while nothing prompts a look at the decaying speed itself. That does rather the opposite of what the cue should do, if the aim is to direct the scan in the first place.