CRM is generally assessed against carefully defined criteria. NOTECHS is one example, developed within Europe to assess non-technical skills as objectively as possible. The University of Texas developed its own set of criteria, and various airlines use variations of these techniques.
In a nutshell, non-technical skills are divided into categories, e.g. leadership and management, situational awareness, decision-making and cooperation. These are then subdivided into elements, each of which has examples of desirable and undesirable behaviours. These are known as "behavioural markers". For example, poor practice: "Does not intervene in case of deviations"; good practice: "Intervenes if task completion deviates from standards". There is then a need to set the pass/fail standard: do you let someone pass whose behaviour has not resulted in an unsafe situation on this occasion, but potentially could? Or do you need evidence of a technical failure as the outcome of the CRM problem in order to fail a line check? Different operators will have different views on this.
Hope this explains the methodology a bit - it's a big subject!