The requirements re failure to flare are in CS-AWO and CS-25 (.1329?), but these are not presented as a simple probability.
Basically, any failure of the overall system should not result in a fatality with a probability greater than something like 10^-9.
These are only numbers, theory; in practice the technical reliability of the automatic system is very good, and much better than the human's for the majority of landing tasks in the same conditions.
Flare is usually a sub-mode of the flight guidance system. In modern high-integrity systems a lack of mode change or a sensor failure can be detected and used to disengage the autopilot or revert to a lower level of operation; the latter is somewhat irrelevant at low altitude: fail-operational reverts to fail-passive, but the aircraft still lands.
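As a rough illustration of the monitoring idea above, here is a minimal sketch in Python. Everything in it is hypothetical: the names, the 50 ft threshold, and the mapping of each fault to an action are my assumptions for illustration, not how any certificated system is implemented.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue autoland"
    REVERT_FAIL_PASSIVE = "revert to fail-passive"
    DISENGAGE = "disengage autopilot"


# Hypothetical height by which the flare sub-mode should be active.
# Real values are type-specific; 50 ft is only a placeholder.
FLARE_EXPECTED_BY_FT = 50.0


@dataclass
class AutolandState:
    rad_alt_ft: float      # radio altitude in feet
    rad_alt_valid: bool    # sensor validity flag (assumed to exist)
    flare_active: bool     # True once the flare sub-mode has engaged
    redundancy_lost: bool  # True if a redundant channel has dropped out


def monitor(state: AutolandState) -> Action:
    # Sensor failure: the height reference cannot be trusted.
    if not state.rad_alt_valid:
        return Action.DISENGAGE
    # Lack of mode change: below the expected height and still no flare.
    if state.rad_alt_ft <= FLARE_EXPECTED_BY_FT and not state.flare_active:
        return Action.DISENGAGE
    # Loss of redundancy: lower level of operation, but the landing continues.
    if state.redundancy_lost:
        return Action.REVERT_FAIL_PASSIVE
    return Action.CONTINUE
```

The point of the sketch is only that the checks are simple comparisons the system can make continuously, far faster than a crew call.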
In addition, many systems use a trim-up function which biases the aircraft to pitch nose-up if the autopilot disengages at low altitude. There are many reasons for such a system: control torque, minimising GA height loss, and reducing accident severity, i.e. the probability of a fatality.
The need for a 'no flare' call may have been carried over from older systems: simplex monitored or some duplex autopilots with a 'bolt-on' flare mode. Modern systems should not need a call. How would a failure be detected, and how could the pilot intervene in time? Older systems suffered more 'failures' from unwarranted human intervention than from actual system faults.
Those operators who still require a call might consider the type of system they are using, their expectations of the crew's ability to monitor, and how the crew might react. If procedures require a call for a mode change, what happens when the mode change doesn't occur? Is the mode change proportional to rad alt? If so, why not call altitude; auto call-outs do that anyway.
And those who use HUD might wish to make a similar evaluation, and be prepared to argue the differences between autoflight failure to flare and human performance in the same conditions; why would one be better than the other?
The industry should focus more on failures like the AMS 737 accident.
Accident classifications range from CFIT to human error, but it was a component of the auto system which had the 'failure'.
Although this is not a feature of modern designs, except for older 737s certificated under 'grandfather rights', the industry still has to be vigilant for those problems not yet identified, operationally or in certification (CRJ accident Sweden, enhanced vision incident Australia).