Blog — The History of Intelligence Testing

Intelligence: Probably the most controversial of psychological constructs. Many wonderful and more awful things have been done in the name of intelligence. So, why do we still hang on to it? Well, it happens to be a very useful construct, and the measures thereof are highly entrenched in neuropsychology, prediction of job performance and other psychological applications. However, after nearly a century of intelligence testing, the assessments we use have hardly changed, and our conceptions of intelligence bear a remarkable resemblance to the early theories (albeit with more politically correct language). That’s not to say that the early theorists were wrong. There is a significant body of literature to demonstrate the utility of these theories across the board. But with the debates, debacles, and diatribes regarding intelligence testing over the years, we would expect some radical change to have come about to counter the traditional stance on intelligence and its measurement. In order to be able to move forward, it is important to look back to get a clearer picture of the route that intelligence testing has taken to arrive at its current form.

Most historical texts on intelligence testing use Francis Galton as a starting point. Inspired by his cousin Charles Darwin’s perspectives on evolution and adaptation, Galton (1865) wrote his first famous article on the hereditary nature of genius and talent. Having no method to actually assess genius, he relied on biographical information about distinguished individuals and their connections to equally prestigious family members to make his case for intelligence as a heritable construct. This paper spurred action in two general directions: firstly, the measurement of genius and talent; and secondly, the development of the eugenics movement. (While a full discussion of eugenics is beyond the scope of this article, it is noteworthy that intelligence was conceptualised largely with regard to social behaviour and social standing at this time.) Galton established a laboratory in England in 1882 for the measurement of individual differences. A few years earlier, in 1879, Wilhelm Wundt had established his laboratory for experimental psychology in Leipzig, Germany. James McKeen Cattell, an American student of Wundt’s, established the first testing centre in the United States. He also coined the term “mental test” to describe measures of sensory perception and reaction time, as these constructs were used as the first measures of cognitive ability. Many of the assessments used in these centres were included in later intelligence tests, and some still stand as measures of psychomotor ability today.

Charles Spearman worked towards developing a theory of intelligence that would inform the methods of its assessment. He published his seminal work on general intelligence in 1904, in which he set out his two-factor theory based on observed correlations between sensory discrimination and academic performance. Spearman proposed that performance on ability tasks was composed of two elements: general intellectual ability (g) and components of ability specific to the task (s). Spearman (1904) felt strongly that conducting assessments without an underlying theory was a fruitless enterprise. He stated: “The results of all good experimental work will live, but as yet most of them are like hieroglyphics awaiting their deciphering Rosetta Stone” (p. 204).

In contrast to Spearman, Alfred Binet (1905) scorned the development of tests based on theory, describing them as follows: “The use of tests is today very common, and there are even contemporary authors who have made a specialty of organizing new tests according to theoretical views, but who have made no effort to patiently try them out in the schools. Theirs is an amusing occupation, comparable to a person's making a colonizing expedition into Algeria, advancing always only upon the map, without taking off his dressing gown. We place but slight confidence in the tests invented by these authors and we have borrowed nothing from them” (p. 197). In 1904, Binet and Theodore Simon were commissioned by the Parisian Ministry of Public Instruction to develop an educational assessment that could determine whether a child would benefit from mainstream schooling. The purpose of the original assessment was to determine whether a child was “normal or retarded” (Binet, 1905, p. 191). The Binet-Simon Scale was developed on the basis of these experimental investigations and not theory. However, by 1908, the Binet-Simon Scale had received severe criticism for its reliance on verbal content and its one-to-one administration.

Henry H. Goddard, who translated the Binet scales into English, was one such critic. Goddard is best known for his work at the Vineland Training School for Feeble-Minded Girls and Boys. He also introduced the classification system for categories of increasing impairment according to IQ score (notably: moron, imbecile, and idiot). The criticism of the Binet scales prompted William Pyle to develop the first group-administered test of ability in 1913. That same year, Seguin, Goddard, and Sylvester addressed the literacy problem by developing the first widely used non-verbal assessment of intelligence, the Seguin Form Board. In 1916, Lewis Terman of Stanford University adapted and standardised the Binet-Simon Scale into the Stanford-Binet Scale for use in the United States.

During World War I, the Army Alpha and Beta Tests were developed for the selection of recruits for the US Army by Robert Yerkes (president of the APA) and Arthur Otis (famous for his group tests). These tests were the first widely distributed assessments that could be administered to groups and that made use of non-verbal items to measure intelligence.

John C. Raven furthered non-verbal assessment with the development of Raven’s Progressive Matrices (RPM) in 1936. Raven worked on the genetic and environmental origins of mental defect with Lionel Penrose. He was also a student of Spearman’s, and he set out to develop a measure of cognitive ability that was theoretically based and less cumbersome than those being used at the time. Raven (1939) based the RPM on the measurement of eductive ability (the ability to make meaning from abstract concepts), and it is generally paired with the Mill Hill Vocabulary Scale, which measures reproductive ability (the ability to repeat information and learned skills). The RPM is essentially the first assessment to have been developed using the principles of item response theory: the order of item difficulties must be the same for individuals at all levels of ability.

David Wechsler published the Wechsler-Bellevue Intelligence Scales (WB) in 1939. He was the chief psychologist at the Bellevue Psychiatric Hospital, from which the WB takes its name. The WB was the first intelligence assessment to use the deviation IQ scale (mean of 100, SD of 15) instead of mental age calculations. It was also the first to award credit for all items in each scale, not just for passing a specific number of tasks, and the first intelligence test to have separate verbal and performance scales. Many of the subtests were borrowed from existing tests; for a full description of where these come from, see Boake (2002). The scales are updated every 10-15 years to ensure that they remain relevant and to counter the Flynn effect. The Wechsler Intelligence Scales have subsequently dominated the individual assessment of intelligence across the world. They now include versions for adults and children, as well as for memory and achievement, and most of the research in the area of intelligence has been done using the Wechsler scales. The RPM has had a similar impact in the group assessment of cognitive ability.

Some minor changes in the assessment of cognitive ability came about in the years after the RPM and Wechsler scales were published. The assessment of separate abilities such as verbal, numerical, and abstract reasoning emerged from the educational field as a result of Raymond Cattell and John Horn’s work on fluid and crystallised intelligence in the 1960s. Theories on multiple intelligences (Gardner, 1985) have also resulted in a plethora of assessments that tap into the specific abilities portion of intelligence measurement. Nowadays, we also have measures of cognitive or intellectual styles, levels of complexity, emotional intelligence, and many other types of intelligence. However, none of these has completely revolutionised our perspective on the actual definition of intelligence.
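
For readers who want to see the difference between the deviation IQ scale and the mental age calculations mentioned above, here is a minimal sketch. It assumes the earlier approach was the familiar ratio of mental age to chronological age multiplied by 100, and that a deviation IQ is a standard score rescaled to a mean of 100 and an SD of 15. The function names and the norm-group figures are hypothetical and purely for illustration; they are not taken from any test manual.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ (assumed form of the mental age approach): 100 * MA / CA."""
    return 100.0 * mental_age / chronological_age


def deviation_iq(raw_score, norm_mean, norm_sd):
    """Deviation IQ: position relative to a norm group, rescaled to mean 100, SD 15."""
    z = (raw_score - norm_mean) / norm_sd  # standard score within the norm group
    return 100.0 + 15.0 * z


# Hypothetical child aged 8 who performs like a typical 10-year-old.
print(ratio_iq(mental_age=10, chronological_age=8))

# Hypothetical adult scoring one SD above a norm-group mean of 50 (SD 10).
print(deviation_iq(raw_score=60, norm_mean=50, norm_sd=10))
```

The ratio approach only makes sense while mental age keeps rising with chronological age, which is one reason a score defined relative to same-age peers is easier to interpret for adults.
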
It will be interesting to see whether new technologies bring new ways of thinking about this prickly construct, and bring new methods of tapping into the human psyche.
