Does Your Organisation Need A Data Whistleblower?

Does your organisation need data ‘whistleblowers’? You may think that there’s enough noise about data as it is. Sometimes, though, data quality in an organisation is bad, and nobody takes ownership of it. In this post, I propose that organisations sometimes need a data ‘whistleblower’ to expose the issues, so that improvements can permeate through the whole organisation.

What is a whistleblower? Well, a data whistleblower is basically an individual who raises issues, sometimes subversively.

A data ‘whistleblower’ is a ‘go to’ person, to whom employees can raise issues around data quality. It’s basically a role, which denotes someone whom anyone can go to, in order to express a confidential concern about the quality of a piece of data. There could be more than one whistleblower; for example, you could have one for each department.

Alternatively, team members could take turns to be the data whistleblower. This might help to promote the adoption of data as a corporate-wide asset, which it is in everyone’s interest to protect and maintain assiduously.

There are different ways in which this role could function. As a technical function, it could be as simple as setting up a wiki or a post-it noticeboard where people can go and record data quality issues anonymously or publicly. Alternatively, a more formal approach could be taken, whereby the data ‘whistleblower’ will take the collated data issues to a monthly business intelligence meeting. Results and feedback regarding data quality issues could be given via a SharePoint portal, or a monthly email newsletter.
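As a rough illustration of the simplest version of this idea, here is a minimal sketch of an issue log that accepts anonymous or named reports and produces a count per dataset for a monthly meeting. The class name, field names and sample issues are all hypothetical; a wiki page or a noticeboard would do the same job.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataQualityIssue:
    """One reported issue; all field names here are illustrative assumptions."""
    dataset: str
    description: str
    raised_on: date
    raised_by: Optional[str] = None  # None means the issue was raised anonymously


def monthly_summary(issues, year, month):
    """Count the issues raised in a given month, grouped by dataset."""
    return Counter(
        issue.dataset
        for issue in issues
        if issue.raised_on.year == year and issue.raised_on.month == month
    )


# Invented sample issues, purely for illustration.
issues = [
    DataQualityIssue("customer_master", "Duplicate customer records", date(2019, 3, 4)),
    DataQualityIssue("customer_master", "Missing postcodes", date(2019, 3, 18), raised_by="Finance"),
    DataQualityIssue("sales_orders", "Order dates in the future", date(2019, 4, 2)),
]

summary = monthly_summary(issues, 2019, 3)
```

The summary is what the whistleblower could take to the monthly meeting; the optional `raised_by` field keeps anonymous reporting as the default.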

 

Why might you need a data whistleblower? In my experience, I’ve seen cases where team members don’t feel that they can raise data quality issues with source data owners, because it is simply too political and contentious, and they don’t want to be in the firing line.  This could be an indicator of the ‘Anger’ stages in Data Quality: please read Jim Harris’ blog for more details on the Data Quality stages.  
I’ve also seen cases of the Bystander Effect in data quality. The ‘Bystander Effect’ is where people don’t intervene to help when they see a problem, perhaps because they think that the problem is so well-known, or highly-visible, that someone somewhere must be doing something about it. In other words, they might see a problem and do nothing about it, because of the diffusion of responsibility throughout the organisation.
What would a ‘whistleblower’ role mean to a business? It would allow users to become more involved in the data quality issues in an organisation, thereby allowing the ‘business’ in business intelligence to have a greater say in shaping their own data. It also means that data quality, which should be a corporate habit rather than a one-off project, can be made a part of the business culture.
Data is a corporate asset that belongs to everybody, so everyone can help to look after it without risking their own comfort in the workplace; yes, data quality can be that contentious! The idea of a data ‘whistleblower’ is to try and find a way through it. If you have any other ideas, I’d love to hear about them!


Data-Driven isn’t enough. We need Human-Centred AI too.

In this three-part blog series, Elizabeth and Jen will be focusing on the ethics of AI and data. So often, we hear how we have to be data-driven, but it is not enough. We need to be human-centred. In this three-part series, we will look at important topics such as DeepFakes, Bias and AI, why these phenomena happen, and what we can do about them.

At the beginning of 2019, Alexandria Ocasio-Cortez made headlines by stating that algorithms can perpetuate racism. She stated that:

“[algorithms] always have these racial inequities that get translated, because algorithms are still made by human beings, and those algorithms are still pegged to basic human assumptions. They’re just automated. And automated assumptions — if you don’t fix the bias, then you’re just automating the bias.” (1)

The ways that algorithms can discriminate against certain demographics are felt in many tangible ways by a lot of people. Algorithms can determine how likely you are to get a job interview, what consequences you face if you commit a crime, whether or not you will get a mortgage, and how likely you are to “randomly” get stopped by the police. Skewed data, prejudiced programmers, and false logic can mean that the results aren’t as indisputable as we may assume at first.

Algorithms are based on mathematical equations — and the answers you can glean from a numeric equation are objectively true. Many people used this fact as a retort against Ocasio-Cortez, accusing her of trying to find prejudices where there simply weren’t any. These people are wrong. Not only can algorithms perpetuate racial biases, but they are prone to perpetuating any and all societal biases.

A prominent example of race and gender bias can be found in facial recognition software, which is becoming an increasingly popular tool in law enforcement. Joy Buolamwini at the Massachusetts Institute of Technology found that three of the most recent gender-recognition AIs could identify a person’s gender from a photo with 99 percent accuracy, but only if the person in the photo was a white man (2). This puts women and people of colour at risk of false identification; in fact, accuracy dropped all the way down to 35 percent for women of colour.

Inequalities reflected in our technology begin with us.

Similarly, but perhaps even more worryingly, a report by ProPublica (3) found that AI used to anticipate future criminal behaviour was heavily skewed against black people. It falsely flagged almost twice as many black defendants as white defendants as potential reoffenders, and was much more likely to mislabel white defendants who would go on to reoffend as being low-risk. And it was, as you might predict, almost always wrong in its predictions of violent crime: only 20 percent of the people it predicted would commit violent crimes actually did so.
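To make the idea of ‘falsely flagged’ concrete, here is a minimal sketch of how a false positive rate can be compared across groups. The records below are invented for illustration and are not ProPublica’s data; the function name and the group labels "A" and "B" are also assumptions.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """For each group, return the share of people who did NOT reoffend
    but were still flagged as high-risk."""
    flagged = defaultdict(int)
    did_not_reoffend = defaultdict(int)
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            did_not_reoffend[group] += 1
            if predicted_high_risk:
                flagged[group] += 1
    return {group: flagged[group] / total for group, total in did_not_reoffend.items()}


# Invented records: (group, predicted_high_risk, reoffended)
records = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", False, False), ("B", False, False), ("B", False, False), ("B", False, True),
]

rates = false_positive_rate_by_group(records)
# In this made-up sample, group A is falsely flagged twice as often as group B.
```

A gap like this between groups is the kind of signal that should prompt a closer look at the training data and the model, even when overall accuracy looks acceptable.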

This large, potentially dangerous oversight is likely to be down to a lack of diversity in the data used to train the algorithms. If the data fed into the AI contains more white men than any other demographic, then the AI will learn to identify those people with much higher accuracy. It makes sense: AI can only learn from the data it is given.

References

1- Kosoff, M. (2019). Alexandria Ocasio-Cortez Says Algorithms Can Be Racist. Here’s Why She’s Right. https://www.livescience.com/64621-how-algorithms-can-be-racist.html

2- New Scientist (2018). Face-recognition software is perfect – if you’re a white man. https://www.newscientist.com/article/2161028-face-recognition-software-is-perfect-if-youre-a-white-man/

3- ProPublica (2016). Machine Bias. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Cloud Solutions: Consumer vs Business Solutions?

Confused about data storage? We can’t always assume that it is just going to work. Storage pricing alone can be complex, and you need to consider factors such as data storage per month and data operations such as updates, data egress and data transfers.
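To show how these factors combine, here is a minimal sketch of a monthly cost estimate. The function name, the pricing structure and every price below are hypothetical assumptions for illustration; real vendors price differently, so check your provider’s current price list.

```python
def monthly_storage_cost(stored_gb, operations, egress_gb,
                         price_per_gb, price_per_10k_operations, price_per_egress_gb):
    """Estimate one month's bill from storage, operations and egress.

    All prices are supplied by the caller; none are real vendor prices.
    """
    storage = stored_gb * price_per_gb
    ops = (operations / 10_000) * price_per_10k_operations
    egress = egress_gb * price_per_egress_gb
    return round(storage + ops + egress, 2)


# Hypothetical usage and prices, purely for illustration.
cost = monthly_storage_cost(
    stored_gb=500,
    operations=200_000,
    egress_gb=50,
    price_per_gb=0.02,
    price_per_10k_operations=0.05,
    price_per_egress_gb=0.08,
)
```

Even in this toy example, egress and operations make up a third of the bill, which is why looking at the storage price per gigabyte alone can be misleading.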

At Data Relish, we care about our customers’ data. We also care about our own data, and we have been looking carefully at the options available to us, and to our customers. 

Also, consumer-grade cloud storage solutions are subject to changes in pricing or storage limits, so we need to think carefully about reviewing our choices.

 

We also need to think about storage for data that does not fall under compliance rules. If this is the case, you’d probably be quite happy with one of the storage solutions available from companies outside Microsoft, Google or Amazon. 

To read more, head over to the Cloudberry site for a more detailed white paper on the topic. As we review our options on an ongoing basis, we look to use the best tools that can help us to flip between vendors where required. 

At Data Relish, we’ve been happy with the tools from Cloudberry and it was worth the small investment to save time – and data, of course!


How do we become inclusive leaders? Make your choices beautiful.

Everyone has a superpower. Considering our unconscious bias means we can take the time to work out what that superpower is.

Make your choices beautiful for others.

We have the chance to do great things with our superpowers every day. We can make our choices beautiful by being inclusive.

When we choose to be inclusive, our choices make us. We start to become better versions of ourselves.

Everyone has a superpower and the beauty of diversity and inclusion is that you get to find out what it is.

Make your choices beautiful for yourself.

Even Captain America had help sometimes. Asking for help or advice can seem really hard. However, leadership can be lonely sometimes, so seek out other superheroes whom you can ask for advice, thoughts, and ideas. In the words of Batman, “A hero can be anyone. Even a man doing something as simple as putting a coat around a little boy’s shoulders to let him know the world hadn’t ended.”

We are the average of the people that we spend our time with, so choose wisely. There is value in being an unexpected friend for other people. We are here for one another, too. 


Jen Stirrup named Top Analytics Influencer by Fit Small Business


(New York City, October 15th, 2018) Jen Stirrup has been named a Top Analytics Influencer by Fit Small Business. Stirrup provides outstanding big data and analytics advice and suggestions to her thousands of Twitter and LinkedIn connections.

Fit Small Business, a publication for small business owners, awards its Top Influencer award to those who have shown excellence in their field. The site carefully considers an influencer’s expertise, credentials, social media presence, and success before choosing the best.

The Fit Small Business Top Analytics Influencers of 2018 list features 14 influencers, and the list is updated annually.

Fit Small Business connects small business owners with its experts who conduct extensive research and tests of various products and services, and interview industry specialists to provide the best recommendations and solutions. They keep their focus on a variety of topics that are of interest to any small business owner, including accounting, sales, marketing, and human resources, and the content attracts more than two million visitors to their site per month.

Jen Stirrup is the founder of Data Relish Ltd, which offers Business Intelligence and Data Science consultancy in the United Kingdom.

Fit Small Business does the research so small business owners don’t have to.

http://www.fitsmallbusiness.com


Microsoft Ignite interview with Executive Team on Artificial Intelligence and Cloud

Jen Stirrup, Data Whisperer at Data Relish, took the opportunity to interview Rohan Kumar, Corporate Vice President, Azure Data at Microsoft, and Eric Boyd, Corporate Vice President, AI at Microsoft.

Rohan and Eric talk about the announcements that excited them both, and there was also a good discussion on the role of Open Source at Microsoft, and what role it plays in Microsoft’s Data and Artificial Intelligence story.

There was a great discussion on Eric and Rohan’s thoughts on its role in making insights, Artificial Intelligence and insight-driven analysis real for organizations. Every organization on the planet has got data, and Microsoft are carving a path for the organizations that want to make use of it.


Preventing Crash for Cash with Analytics

At #AnalyticsX, the SAS Analytics Experience event in Milan, I was fortunate enough to have the opportunity to learn about fraud prevention in the insurance industry. SAS has a nearly 40-year history of working in analytics and insurance. I caught up with Ben Fletcher, Director at the Insurance Fraud Bureau (IFB), to understand better how the non-profit is using data, analytics and SAS technologies to combat insurance fraud at all levels, from mischievous fraud to highly organized fraud on a large scale, including organized ‘crash for cash’.

The IFB is a not-for-profit company established in 2006 to lead the insurance industry’s collective fight against fraud. The IFB acts as a central hub for sharing insurance fraud data and intelligence, with the objective of detecting and disrupting organized fraud networks. While the fraud may often appear to be a simple concept, such as causing deliberate crashes to make claims and pocket the pay-out, these scams often involve many people in highly organized gangs, to help make the claim seem genuine or not obviously fraudulent. Not only that, but they might be doing this on a large scale, making them big contributors to the overall level of fraud.

Whether we like it or not, we are all impacted by insurance fraud. It raises the insurance premiums that we all have to pay. The gangs committing much of this fraud are likely to be involved in other criminal activity, such as drug dealing or human trafficking. So, aside from higher insurance premiums, this is an issue which impacts UK society in other ways and represents an illustration of how data can be used for social good.

Through the use of data, it’s possible to understand the scale of organized crime and to communicate that message to the rest of the industry.

The first step is understanding the scale of the issue, so it can be tackled; until the industry can see and recognize the issue properly, there isn’t a clear understanding of the need to tackle it. Further, the audience for the system is complex, with various bodies looking at the data through different lenses, such as lawmakers, insurance specialists and the police. When people are making fraudulent claims, the fraudster makes the claim appear genuine. The challenge is that you want to identify these cases without wrongly questioning what turns out to be a genuine claim, otherwise referred to as a ‘false positive’.

The SAS platform was built to help promote collaboration between insurance companies, which are essentially struggling with the same issue: how to prevent fraud from occurring. Insurance is a risk-averse industry and this was an incredibly bold step for the industry to take. Companies would not want to share data directly with other insurance companies for competitive reasons but are more likely to do so if it’s through a third-party industry body (in this case the IFB).

Given the criminal activity involved, how could the insurance industry recognize, understand and tackle the issue? 

The recently launched SAS digital platform allows the industry to better understand and recognise when fraud is taking place by drawing on lots of data points to highlight common themes and anomalies. Until this point, there was aggregated data, but it was not enough to meet the challenges faced in the industry. To find the trends and patterns, the system needed to cater for the whole process: from granular data right through to summarized data, analytical models with advanced statistical analyses and data visualization.
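To give a flavour of why granular, pooled data matters, here is a minimal sketch of one very simple ‘common theme’ check: finding a contact detail that turns up on claims at several different insurers. This is not how the SAS platform or the IFB works; the function name, field names and claims are invented for illustration.

```python
from collections import defaultdict


def shared_phone_numbers(claims, min_insurers=2):
    """Return phone numbers that appear on claims at several different insurers."""
    insurers_by_phone = defaultdict(set)
    for claim in claims:
        insurers_by_phone[claim["phone"]].add(claim["insurer"])
    return {
        phone: sorted(insurers)
        for phone, insurers in insurers_by_phone.items()
        if len(insurers) >= min_insurers
    }


# Invented claims, purely for illustration.
claims = [
    {"claim_id": "C1", "insurer": "Insurer A", "phone": "555-0101"},
    {"claim_id": "C2", "insurer": "Insurer B", "phone": "555-0101"},
    {"claim_id": "C3", "insurer": "Insurer C", "phone": "555-0101"},
    {"claim_id": "C4", "insurer": "Insurer A", "phone": "555-0199"},
]

suspicious = shared_phone_numbers(claims)
```

No single insurer could spot this pattern from its own data, and aggregated totals would hide it too. A match like this is only a prompt for a human to review the claims, since a shared detail can have an innocent explanation.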

There was a lot of historical data and risk analysis data involved previously, and the IFB needed a system that could cope with the cross-platform intricacies of granular data, as well as offer opportunities for further growth.

Each contributing organization had to ensure that their data was clean. To be accepted into the system, the data had to meet stringent guidelines and standards before it could be used. Data preparation was a key part of the process.

All in all, this sounds like a fearsome project and anyone who has tried to work with data will be able to envisage the potential issues. Within companies, there is often a great deal of data that is siloed in verticals, e.g. the marketing data versus the sales data. Getting a single company to think about their data in a horizontal fashion is incredibly challenging and organizations struggle with it. The challenge facing the IFB was exponentially larger because of the number of insurance companies involved. There are many complexities involved in ensuring that data is good enough and clean enough to be used within a single organization. Here, it’s multiplied because each organization’s conception of clean data will vary, and perhaps differ between departments of the same organization.

What has helped the project get to this point? Fletcher noted that a key part was the business expertise of the SAS team, who had a lot of experience in the insurance and fraud industry. He also noted that the SAS team understood the value of instilling robust practices and processes to support the business objectives. There was a strong conviction that the business users need to be put front and centre of the system in order to ensure that the correct insights and conclusions are drawn from the data. Throughout the project, there was an emphasis on obtaining good user requirements and on the business practicalities that emerged from discussion.

The overall solution needed to be comprehensive and complete. Ultimately, the IFB wanted a business partner to work with them on a partnership that had longevity, and they wanted a partner that would be invested in their success as an organization.

Finding Simplicity in Complexity

As Daniel Goleman commented in his book Emotional Intelligence, when we go through high stress situations, we eventually stop thinking clearly. We are unable to prioritize, and the stress means that we can start seeing things as much more complicated than they really are. When people are analysing data in complex circumstances, this, in itself, can cause stress. So things need to have a level of simplicity.

For the project, the audience comprised a wide variety of interested parties such as lawmakers, insurance specialists, fraud specialists and police. As with many domains, the gap between user and developer can be quite wide. In this case, Fletcher noted that the SAS team helped to shrink the gap, essentially being the glue to hold things together from a technical and a business standpoint.

In terms of the output, the teams say they love the visual analytics and the simplicity of the solution. It’s helping them to be more productive because they are working with meaningful data and can get to the heart of what they need to know.

What’s next?

Data can be used to improve the customer experience, and one area of interest is increasing automation throughout the whole insurance process. While that step is further away, it’s clear that this sort of initiative can only help the insurance industry become more analytical and insights-driven.

Rethinking Definitions of AI

At the recent AnalyticsX with SAS Software, I had the pleasure of interviewing Oliver Schabenberger, Executive Vice President, Chief Operating Officer and Chief Technology Officer at SAS. Schabenberger had some very insightful thoughts on Artificial Intelligence, and how customers approach this topic. I want to thank Mr Schabenberger for his valuable time; it was my privilege to meet him and spend time with him to discuss these topics.

Herbert Simon in Red, by Richard Rappaport. Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=4138586

Schabenberger commented that the original ambition of artificial intelligence was regarded as easily solvable. He pointed back to Herbert Simon’s assertion that ‘The machine will be capable of doing any work that a man (or a woman!) could do’. Now, we smile at the optimism, but, at the time, the industry entered the first winter of AI because expectations were not met.

The AI industry started again with the work of people like Rosenblatt (1958) on perceptrons, and then the adoption of neural nets after the work of Rumelhart and McClelland (1986) which, despite a lot of effort, brought about more disappointment and wrought another winter for AI.

Nowadays, however, things have changed drastically and AI is having another boom. As Schabenberger noted, at one point, working on AI would get you ridiculed at a cocktail party. Now, you don’t get invited to the party unless you are working in AI. AI is back, and suddenly the market has a slew of AI experts.

So what changed? Why has AI come out of hibernation and back into the sunlight?

Today we have a different source of knowledge throughout the world: Data. Data-driven is an adjective that we hear everywhere, and businesses are rethinking their data, which has led to rethinking Artificial Intelligence.

Today, we are building better algorithms than ever. At the heart of today’s Artificial Intelligence revolution is data science. We create models using a combination of data science, artificial intelligence and analytics. The models impact many arenas, including healthcare, insurance, education, medicine and finance.

Artificial intelligence is no longer about rules or backpropagation. It’s all about processing data and imbuing a system with human-like intelligence, to varying degrees. After some time, the system develops its own logic and there are plenty of examples where programs have learned to program themselves through being programmed implicitly.

Fundamentally, the artificial intelligence boom is an analytics boom.

As it stands now, Artificial Intelligence is enabled by massive data volumes, cloud technology and digital transformation, empowered by advances in computing.

AI impacts people who manage organizations. Previously, managers would ask, ‘Tell me how you did that.’ Now, this question has been replaced by ‘Show me the data.’

This data-driven approach to artificial intelligence has caused a very powerful transformation in the industry, but people are still confused by what Artificial Intelligence really means. Schabenberger draws a neat and disciplined distinction between Narrow and General Artificial Intelligence.

Rethinking Narrow and General Artificial Intelligence

Schabenberger’s insight is that people’s expectations of Artificial Intelligence can be distinguished by the tasks it is expected to do, as well as its autonomy. Does it really think and work in a real environment, or is it dedicated to a narrow task, which it does well?

Artificial General Intelligence


The goal of AGI is to create a general thinking machine: a machine that has a thinking capacity. In Artificial General Intelligence, the definition includes self-sufficiency and broad human-like general intelligence, where the intelligence can be extrapolated from one situation to another and the system can learn over time. Perhaps, if it is to be human-like, it can forget over time, too, and choose areas to focus on?


Schabenberger calls this approach Artificial General Intelligence (AGI). In this interpretation of Artificial Intelligence, the goal is to have the machines behave, think and be like humans. It is an attempt to realise human intelligence in hardware and software. It is distinct from the narrow form of Artificial Intelligence because it focuses on a breadth of activities, rather than a very focused attempt to use human-like intelligence in one domain with a high degree of expertise.

Narrow Artificial Intelligence

In contrast to this perspective, there’s another approach to artificial intelligence, which Schabenberger calls Artificial Narrow Intelligence. Unlike AGI, these artificial intelligence systems do not think. These systems are applying and executing algorithms, not thinking. They may also learn from evidence, but the data and the modelling fundamentally come from humans in some way.

For example, let’s take Siri, Alexa and Cortana. These are narrow forms of artificial intelligence which solve very specific tasks. They are purpose-built systems, aimed at a specific objective, and they execute algorithms that are programmed implicitly.

What does the distinction mean for industry?

The precise distinction will allow businesses to formalize and understand what they mean when they talk about AI. With AI, everyone has an opinion, and people can mean different things by the term. By focusing on outcomes, and on an understanding of AI that is Narrow rather than the general, sweeping definition of Artificial Intelligence, businesses can generate ideas that are actionable and achievable.

At present, data specialists and developers are the ones developing these systems in silos, with no ethics discussion and no values imbued. However, these decisions cannot be made in a vacuum. We are all impacted by data, and we will all be impacted by Artificial Intelligence. We cannot assume the ethical superiority of the uninvolved, pointing fingers when things go wrong. If we are putting God in the machine, let’s make it one that we can, and want to, live with.

And what do all these Artificial General Systems have in common? They do not exist, and we have absolutely no clue how to build them, according to Schabenberger. We should have a conversation about what would happen if we could get there. How does that impact ethics? How does it impact jobs? Our planet?

To summarise, Schabenberger’s distinction is very clear for businesses to understand, and to direct their discussions on artificial intelligence. Bringing this clarity is essential when people use the same term to mean different things, and the neat distinction will facilitate successful discussions and outcomes for business.

References

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408. http://dx.doi.org/10.1037/h0042519

Rumelhart, D. E., & McClelland, J. L. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge: MIT Press. ISBN 978-0-262-63110-5.