Dealing with digital data
Two letters from our May edition respond to the Cambridge Analytica scandal; plus online extras.
10 April 2018
The recent controversy around Cambridge Analytica, set up by an assistant professor from Cambridge University, illustrates that the psychological community needs to put dialogue, discussion and ultimately revision of professional frameworks centre stage. As the controversy around this particular case of mass data harvesting continues, not least due to the reported use of the data during the US presidential election and the Brexit referendum, we feel it appropriate to raise the wider issue of whether psychological research is prepared for the world of digital data.
This case raises fundamental concerns, including: (a) who actually owns digital data; (b) the boundaries between collecting research data and potential commercial exploitation; (c) issues of informed consent for existing data; (d) the capability of entities usually involved in the assessment of relevant research, such as institutional ethics committees; and (e) the adequacy of professional guidelines. A recent chapter on the topic (co-written by one of us) advocates strongly that researchers and institutional ethics committees need to develop specialist skills to deal with digital data (Whiting & Pritchard, 2017). Importantly, this involves recognising that the internet is not a single unitary location and requires a highly reflexive approach to deal with resulting ethical complexities.
The current British Psychological Society guidelines on internet-mediated research address some important issues. These include the increasing blurring between what is considered a private or a public domain, as well as the issue of ownership (Is the data owner the individual? The web service provider? Or both?). In our view, there are issues which require yet more consideration, including whether data is considered primary (collected for the purpose of the research) or secondary (pre-existing). There is a wealth of pre-existing data available which can theoretically be exploited for research purposes; however, it may not be feasible or indeed possible to gain participants’ informed consent.
It is an overarching principle in all relevant Society codes of conduct that the benefits of conducting any research should outweigh any risks. Yet with digital data sources it may be difficult to make such judgement calls, not least given the inherent element of unpredictability of using ‘big data’ sources, as in the Cambridge Analytica example, that are likely to be of interest not only to researchers but also to commercial entities wishing to exploit any findings. Moreover, engaging with data on any digital platforms requires specialist skills and training, as they are ‘as complex and diverse as the human environments from which they emerge’ (Kosinski et al., 2015).
The General Data Protection Regulation, which comes into effect this month, will bring yet further changes to the law on data protection and privacy. Organisations, including research institutions, will be obliged to account for what they do with personal data, and why and how they do it. Other societies and professional bodies, including the Academy of Management, are engaging very proactively with the ramifications of a digital economy and digital data. Isn’t it time for a Society forum and/or a special issue of The Psychologist to engage our community in this important topic? As the Nuffield Foundation announces the £5 million Ada Lovelace Institute to examine the breadth of social and ethical issues arising from digital data use, it’s time we got engaged to ensure that the psychological community embraces a proactive stance.
Head of Department, Organizational Psychology, Birkbeck University of London
Lecturer, Department of Organizational Psychology, Birkbeck University of London
Kosinski, M., Matz, S., Gosling, S.D. & Stillwell, D. (2015). Facebook as a research tool for the social sciences. American Psychologist, 70(6), 543–556.
Whiting, R. & Pritchard, K. (2017). Digital ethics. In C. Cassell, A.L. Cunliffe & G. Grandy (Eds.) The Sage handbook of qualitative business and management research methods. Vol. 1 (pp.562–579). London: Sage.
Crossing the Rubicon
Responding to the recent Cambridge Analytica scandal, Jon Sutton raises a key question – one not yet asked in the myriad media articles I have read – about the role that psychologists play in the development and use of influence techniques. In 2008, three years after completing my doctorate, I had the opportunity to accompany SCL (CA’s parent company) to the Far East to train a military organisation in the science of strategic communication. I was hired as a research methods expert, and was one of at least 10 Oxford- or London-trained ‘PhDs’ in the social and behavioural sciences. That course lasted for six months, and by the end of it the methodology we were teaching – the Behavioural Dynamics Institute methodology for strategic communication and behaviour change – had undergone significant development and refinement at the hands of the many scientists who were grappling with it on a daily basis.
Once the course had wrapped up, I was shipped back to London and asked to get to work on a research project about the social dynamics of insurgency in the FATA regions of Pakistan. I had joined SCL on a six-month contract and had every intention of getting back to academia, but I’ll admit, the allure of this intriguing new world found me saying ‘just another six months’. During that time, I began to ask Nigel Oakes, founder of SCL, lots of questions about the methodology and techniques we were teaching. I wanted to know where it had all come from and how it had been validated. The discoveries I made still interest me to this day.
Oakes had set up the Behavioural Dynamics Working Group at UCL in 1989 and worked with two prominent psychologists to develop the basic building blocks of the methodology. The concepts were drawn from across the psychological sciences. Throughout the 1990s and early 2000s the methodology was continually improved, expanded and tested. Very many scientists gave input to this development, including the late Professor Phil Taylor, head of the Institute of Communication Studies at Leeds University and author of Munitions of the Mind: A History of Propaganda. Nigel Oakes was fascinated by propaganda and once told me that he had the largest private library of propaganda texts in the UK.
The recent collaboration with scientists Aleksandr Kogan and Chris Wylie is just another step along SCL’s journey of perfecting the scientific application of influence and persuasion. Sutton’s article discusses the term ‘psychographics’. This is the term SCL used for applying the behavioural dynamics methods to psychologically profiling an audience to work out which buttons to press to change – or not change – their behaviour. To professional psychologists, the use of personality data may seem rather rudimentary, but that was an opportunistic move led by Kogan, and then Wylie, because that’s the data they had access to.
But the methods developed by SCL go far deeper than that, and next time it will be something else. Wylie arrestingly described the convergence of an established military methodology with social media and big data as like ‘Nixon on steroids’. It’s a good analogy, because in sport the undetected use of drugs gives a powerful advantage. SCL would have got away with it but for some dogged journalists. We should take note that catching people using steroids doesn’t stop the use of the drugs. Steroids are so well established in sport that it is impossible to go back. My concern is that these influence techniques will continue to be developed – not just by SCL, but also by many other organisations – and that we are crossing the Rubicon into an enhanced form of psychological ‘warfare’ from which there is no return.
Bringing Big Data down to size
The Cambridge Analytica furore has flooded the media. Mr Zuckerberg has appeared before Congress as unflappable as a swan, in a tight spot, in a tight suit. Global assumptions have been made. The most salient of these is that the data harvested by Cambridge Analytica are perniciously accurate and more useful than a Gigli saw in terms of seeing inside our heads. It seems that CA’s algorithms have accurately calculated our buying patterns, our voting patterns and could be used for more intimate, more incendiary calculations such as determining our choice of careers, our vices, even our partners.
I am a psychologist. I should sit here and say ‘Rubbish; of course this can’t happen.’
However, better to risk ostracism than ostrichism. Of course it can. As a therapist, I see that unwelcome data often causes clients to carry worries like shopping bags: heavy with uncertainty, mistrust and need. Cambridge Analytica, or anybody else with the combination of acumen, access to sufficient data points and the knowledge to know what to do with them, can certainly use big data for good or ill. Data is as amoral as it is Delphic.
There are a couple of assumptions to make here, both generous ones. Let us for now trust Cambridge Analytica and their open-cast data miners, scooping through our naively unprotected online lives. Let us assume that their statistical manipulations are as apposite and whip-sharp as claimed by CA boss Alexander Nix. Certainly, CA has chosen a reasonably respected psychometric instrument against which to assess their Facebook haul. Costa and McCrae’s Big 5 has a nebulous 1960s genesis, with a more fully formed version emerging over 20 years later. The tool is well established and widely understood in psychological circles. Of course, it is a pot thrown in the dark by a diligent apprentice: far from perfect but good enough to hold water. Yet the accuracy claimed by the media for the statistics is startling. According to the media coverage, in analysing just 10 Facebook ‘likes’ the CA Big 5-based algorithm is better at assessing you than your work colleagues are. Give it 150 ‘likes’ and it knows more about you than your parents do. And this is ‘proven’.
Yet you can prove nothing: neither a negative nor an argument. You can only show something ‘beyond reasonable doubt’. What lies beyond reasonable doubt, such as a watchful deity above a flat earth, tends to vary over time. And the media remains confusticated by ‘proof’ and ‘evidence’. Cambridge Analytica consequently had coach, horses and the open road to York in terms of being able to take data and misrepresent them.
The evidence of the power of CA’s algorithms is insufficient to cover a balding statistical pate. A good personality psychometric accounts for about 16 per cent of what makes you, me, or anybody tick. That means that 84 per cent of your psychological make-up – or what is called the variance – whirls past the analytical maw like lucky plankton.
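For readers wanting the arithmetic behind that 16 per cent: the variance a measure accounts for is the square of its correlation with the outcome. A minimal sketch, using illustrative correlations of roughly 0.4 for a good trait measure, 0.3 for a type indicator and 0.16 for an unstructured interview (figures chosen to reproduce the percentages quoted in this letter, not taken from any cited study):

```python
# Variance explained is the squared correlation (r squared), as a percentage.
# The correlations below are illustrative assumptions, not published values.
def variance_explained(r: float) -> float:
    """Percentage of outcome variance accounted for by a correlation r."""
    return r * r * 100

print(variance_explained(0.4))   # good trait psychometric: ~16 per cent
print(variance_explained(0.3))   # type indicator: ~9 per cent
print(variance_explained(0.16))  # unstructured interview: ~2.5 per cent
```

Squaring is why modest-sounding drops in correlation translate into dramatic drops in explanatory power.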
The most accurate psychometrics, like the Big 5, are called trait indicators. Trait indicators show how we are different from other people. How we are, for example, more or less conscientious, or more or less extravert in comparison to others.
A little more wrangling with the numbers can produce type indicators. These show how similar we are to others. We may belong to a group of Shapers, or Explorer Promoters, or we might have a personality likened to a colour. Type indicators are, frankly, less effective tools. Nevertheless, they do give you things to talk about, especially if you lead a sheltered life and believe in horoscopes. The most popular psychometric tool in the Western world is the Myers Briggs Type Indicator. In the view of the British Psychological Society it is a tool suitable for development, but its statistical validity is too low for it to be used in recruitment.
Cambridge Analytica uses a Big 5 template that has been reduced to a series of Types for ease of reference. So, we see people painted as Adventurers, Protectors or Executives, labels that are easier to explain than, say, different levels of neuroticism. So, you can forget the 16 per cent variance. With a Type indicator this figure will be much lower.
This brings Big Data down to size. Given what we know, how likely is it that the algorithm knows more about you than your parents? If we assume that the Big 5 Type indicator accounts for about 9 per cent of what makes us tick we may be somewhere in the right numerical area. So now how valuable are our data?
Well, the Brexit vote was won by a 52:48 per cent vote share. Donald Trump won one of the closest Presidential races in history. Understanding some fraction more of your market than you do now – and a fraction more than your opponents – could hold more value than the Amazon, or Amazon.
The average job interview accounts for only 2.5 per cent of your variance (approximately). This arguably makes it a little less valid, in terms of a selection method, than sticking a pin in a list of job applicants.
Even so, how much would you pay for somebody to sort through a whole field of applicants? How valuable would a tool be that could analyse many times the amount of data you can elicit by asking a trickle of scrubbed and mendacious hopefuls why they left their last job or where they see themselves in five years’ time? Further, wouldn’t it be good if, after analysis, you were presented with the best answers and a weighted list of the most suitable candidates?
Now imagine quadrupling the predictive power of your interviews across that same field of candidates: roughly the jump from an interview’s 2.5 per cent of variance to a type indicator’s 9 or 10. How much is that product worth to you now? Here, very roughly speaking, is where we stand with these analytics.
There is danger and there is opportunity behind the CA hyperbole. These psychological wranglings require psychologists, and yet where are they? Psychological tools require training, understanding and interpretation. In 19th-century California, stalls that sold balm were usually close to those that sold snake oil. The implications of misinterpretation are considerable. That psychology has not been out front and centre during this firestorm shows a lack of understanding by the media; worse, it shows a reluctance and reticence verging on the spectral from my profession, me included. Without a serious collegiate prod and then a conversation with my sister, I might still be wringing my hands like a remorseful ghost.
But I am here now. Anybody else?
Craig Knight PhD, CPsychol HRF (Exon)