You may be surprised. The JAR-Tel project was designed to test NOTECHS using 85 pilots from different operators in different countries around Europe. They were shown a series of videos of simulator scenarios and asked to rate the crews depicted. There was a very high level of inter-rater agreement, not just on the final pass/fail outcome, but also on the areas of concern.
I have used these same videos on CRMI(Line) workshops, and I can say the same: people generally recognize dangerous behaviour when they can see it from a detached position.
Obviously these videos are scripted, and so slightly artificial, but real simulator recordings could equally be used if you had a database of sufficiently interesting ones to choose from.