mm43, it's more likely they'd be using noise-reduction techniques than trying to mix the signal down to baseband or 2f as you described. I'd try auto-correlation techniques on the signal and a delayed version of the signal, perhaps two separate pings.
Within a short time the box would be at thermal equilibrium with its resting point, so from ping to ping its frequency would be quite stable. But the noise environment would be ever-changing. So auto-correlation could use two versions of the signal to pull the ping up out of the noise: the stable tone lines up between the two copies while the noise does not. I'd also look for short-term correlations in the noise that could be used for nulling portions of it. Other techniques, such as a really narrowband FFT and the like, may also help bring the ping up out of the noise environment.
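Here's a rough sketch of the two-ping idea in Python with NumPy, just to show the mechanics. Every number in it (sample rate, a 37.5 kHz tone, amplitude, noise level, window length) is an illustrative assumption of mine, not a detail of the real pinger or of whatever the searchers actually run, and the function names are made up for the example. It correlates two separately recorded pings in the frequency domain: the spectrum of the cross-correlation is the product of the two ping spectra, so a tone that sits at the same frequency in both stands out, while noise peaks rarely land in the same bin twice.

```python
import numpy as np

def make_noisy_ping(rng, f0, fs, n, amp, noise_std):
    """Hypothetical test signal: a weak tone with random phase buried in white noise."""
    t = np.arange(n) / fs
    phase = rng.uniform(0.0, 2.0 * np.pi)  # unknown arrival time -> unknown phase
    tone = amp * np.sin(2.0 * np.pi * f0 * t + phase)
    return tone + rng.normal(0.0, noise_std, n)

def estimate_ping_frequency(ping_a, ping_b, fs):
    """Find the frequency common to two pings via their cross-spectrum."""
    n = len(ping_a)
    window = np.hanning(n)  # taper to limit spectral leakage
    spec_a = np.fft.rfft(ping_a * window)
    spec_b = np.fft.rfft(ping_b * window)
    # Cross-spectrum = FFT of the cross-correlation of the two pings.
    cross = spec_a * np.conj(spec_b)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Skip the DC bin, then take the strongest shared component.
    peak_bin = np.argmax(np.abs(cross[1:])) + 1
    return freqs[peak_bin]

# Assumed, illustrative parameters (not from any real recording).
FS = 100_000        # sample rate in Hz
F0 = 37_500         # tone frequency in Hz
N = 8192            # samples per ping window (about 12 Hz per FFT bin)

rng = np.random.default_rng(0)
ping_1 = make_noisy_ping(rng, F0, FS, N, amp=0.2, noise_std=1.0)
ping_2 = make_noisy_ping(rng, F0, FS, N, amp=0.2, noise_std=1.0)
print("estimated ping frequency (Hz):", estimate_ping_frequency(ping_1, ping_2, FS))
```

In the time domain the tone here is about 17 dB below the noise, so you would not see it on a scope trace. The long window is what does most of the work (that's the narrowband FFT part), and multiplying the two spectra is what rejects noise that changes from one ping to the next. Real ocean noise is not white and real pings are short bursts, so treat this as a toy.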
{^_^}