Does Your Organisation Need A Data Whistleblower?

Does your organisation need data ‘whistleblowers’? You may think that there’s enough noise about data as it is. Yet data quality in an organisation is sometimes poor, and nobody takes any ownership. One way for improvements to permeate through the whole organisation is to appoint a data ‘whistleblower’ to expose the issues.

What is a whistleblower? A data whistleblower is essentially an individual who raises data issues, sometimes subversively.

A data ‘whistleblower’ is a ‘go to’ person to whom employees can raise issues around data quality. It is a role denoting someone whom anyone can approach in order to express a confidential concern about the quality of a piece of data. There could be more than one whistleblower; for example, you could have one for each department.

Alternatively, team members could take turns to be the data whistleblower. This might help to promote adoption of data as a corporate-wide asset, which it is in everyone’s interest to protect and maintain assiduously.

There are different ways in which this role could function. As a technical function, it could be as simple as setting up a wiki or a post-it noticeboard where people can go and record data quality issues anonymously or publicly. Alternatively, a more formal approach could be taken, whereby the data ‘whistleblower’ will take the collated data issues to a monthly business intelligence meeting. Results and feedback regarding data quality issues could be given via a SharePoint portal, or a monthly email newsletter.


Why might you need a data whistleblower? In my experience, I’ve seen cases where team members don’t feel that they can raise data quality issues with source data owners, because it is simply too political and contentious, and they don’t want to be in the firing line. This could be an indicator of the ‘Anger’ stage in Data Quality: please read Jim Harris’ blog for more details on the Data Quality stages.
I’ve also seen cases of the Bystander Effect in data quality. The ‘Bystander Effect’ is where people don’t intervene to help when they see a problem, perhaps because they think that the problem is so well-known, or so highly visible, that someone somewhere must be doing something about it. In other words, they might see a problem and do nothing about it, because of the diffusion of responsibility throughout the organisation.
What would a ‘whistleblower’ role mean to a business? It would allow users to become more involved in the data quality issues in an organisation, thereby allowing the ‘business’ in business intelligence to have a greater say in shaping their own data. It also means that data quality, which should be a corporate habit rather than a one-off project, can be made a part of the business culture.
Data is a corporate asset that belongs to everybody, so everyone can help to look after it without risking their own comfort in the workplace; yes, data quality can be that contentious! The idea of a data ‘whistleblower’ is to try and find a way through it. If you have any other ideas, I’d love to hear about them!


Guest Post: Power BI in Library and Information Science

For the last few days, I have been attending the EBLIP10 (Evidence-Based Library and Information Practice) conference at the University of Strathclyde. As a recent Information and Library Science grad, I put myself down as a student volunteer so I could sit in on some of the sessions for free (and eat the fancy lunches – they had vegan tacos, so you could imagine the position I was in).


As you can maybe imagine, a large number of the talks were library-centred: targeted at librarians who want to implement a more evidence-based practice in their workplace, or researchers looking to improve their methods. I was fortunate enough to attend some fascinating lectures on topics I might otherwise never have thought much about, and to get to know some lovely and interesting information professionals from all over the world, from Portugal to Canada to Mexico.


One of the speakers, Louise Graham from the Scottish Library and Information Council, spoke on the importance of evidence of impact in public libraries. She described the process she and her team had gone through in implementing new technology throughout Stirling Council Libraries to quantify evidence of how patrons benefit from using their local public library, and spoke on the complexities and benefits of user-friendly data collection.


Credit to myrfa at Pixabay

Stirling Council Libraries now have a collection of touch-screen devices in their libraries that allow patrons to outline their feelings about and experiences of the library, largely via simple visuals. If a patron wanted to convey to the staff that their favourite thing about their library was the space to sit down and read in a quiet environment, they would tap on a picture of a person reading a book on a sofa. The platform was designed to be as accessible as possible to everyone who wanted to use it, and it resulted in hundreds of submissions within the first few weeks, and a wealth of evidence for the librarians to have to hand if they ever needed to defend their library.

Defending Libraries with Data

However, one thing I learned from attending EBLIP10 was that there are so many different ways to collect and display evidential data. As many of the speakers at the conference were researchers, PhD students and lecturers, a lot of this data was displayed via published papers in academic journals. But this medium isn’t usable for everyone – and certainly not those of us who are not in academia.

Since I’ve started working recently as a Data Translator for Data Relish, and am currently working on training programs for Power BI, it got me thinking – are there ways that we can use Power BI to effectively and accessibly display evidence of our work?

When it came to making evidence easily-accessible, particularly for our colleagues and customers who may not have been involved in the rest of the process, my first thought was Power BI Service and the Power BI mobile app.

Operating the full desktop version of Power BI may not be the most efficient way to quickly demonstrate that your work has yielded tangible results – when you’re looking for an in-depth analysis, it’s the ideal option for sure, but sometimes what you need is a clear, visually-appealing display of your data that a customer or less-experienced colleague can pull up quickly and understand at a glance.


This is the beauty of the Power BI Dashboard, in all its forms; it’s evidence of your results that anyone can understand – and enjoy looking at – regardless of whether or not they’re an experienced data scientist. And if you want clear evidence of how your work is going well, or helping others, you can’t make it so dense and complex that only a data scientist will be able to parse it. Your stakeholders are more likely to want to see your evidence represented in clear visuals.


While their presentations were clear and effective, many of the speakers at the conference presented the results of their studies as tables and long lists of data. This was fine for those of us who were at the conference in person – we could see the slides up close, so it was easy enough to get an idea of what the data represented. But the small text that came with fitting long lists on one PowerPoint slide caused a few problems.


Part of my job as a volunteer was to manage the virtual conferencing. Virtual attendees had paid £65 to watch the presentations via webcam, and it was my job to mute and unmute them, convey their questions to the speaker, and straighten out any issues as best I could. The biggest problem by far was that the virtual attendees were unable to read the small text; many of them completely missed out on any visual representation of the speakers’ findings. The webcam wouldn’t pick up the text – at a conference on evidence-based practice, they were missing out on the actual evidence! It also left me so frazzled that I had to go and stress-eat a bunch of tacos behind the reception desk!

Summary

There’s no doubt in my mind that many of these presentations could have been improved if the speakers had had a tool like Power BI at their disposal. Had they been able to turn their long lists of data into representative visuals, everyone would have been able to understand them at a glance, rather than squint at their screens trying to read some under-pixelated numbers.


Power BI could be a strong tool for research, particularly research where concrete evidence is paramount – and particularly research whose findings will be helpful and interesting to those outside your immediate professional circle. Some of the available visualizations look lovely, and can make even the driest data look more engaging.


Power BI is a perfect way to tailor the presentation of your findings to your audience, and there seem to be few corners of the professional world where it wouldn’t come in handy.

Cloud Solutions: Consumer vs Business?

Confused about data storage? We can’t always assume that it is just going to work. Storage pricing alone can be complex, and you need to consider factors such as data storage per month and data operations such as updates, data egress and data transfers.
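To make those cost factors concrete, here is a minimal sketch of how a monthly storage bill might break down. The rates and the `monthly_storage_cost` helper are illustrative assumptions, not any vendor’s actual pricing model.

```python
# A hedged, illustrative breakdown of a monthly cloud-storage bill into
# the factors mentioned above. All rates are made-up placeholders.

def monthly_storage_cost(gb_stored, write_ops, read_ops, gb_egress,
                         rate_per_gb=0.02,         # storage per GB/month (assumed)
                         rate_per_10k_writes=0.05,  # updates/writes per 10k ops (assumed)
                         rate_per_10k_reads=0.004,  # reads per 10k ops (assumed)
                         rate_per_gb_egress=0.09):  # data leaving the cloud (assumed)
    """Sum the cost components that typically appear on a storage bill."""
    storage = gb_stored * rate_per_gb
    operations = (write_ops / 10_000) * rate_per_10k_writes \
               + (read_ops / 10_000) * rate_per_10k_reads
    egress = gb_egress * rate_per_gb_egress
    return round(storage + operations + egress, 2)

# Example: 500 GB stored, 100k writes, 1M reads, 50 GB transferred out.
print(monthly_storage_cost(500, 100_000, 1_000_000, 50))
```

Note how egress and operations can rival the headline per-GB storage rate; that is why comparing vendors on storage price alone can mislead.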

At Data Relish, we care about our customers’ data. We also care about our own data, and we have been looking carefully at the options available to us, and to our customers. 

Also, consumer-grade cloud storage solutions are subject to changes in pricing or storage limits, so we need to think carefully about reviewing our choices.


We also need to think about storage for data that does not fall under compliance rules. If this is the case, you’d probably be quite happy with one of the storage solutions available from companies outside Microsoft, Google or Amazon. 

To read more, head over to the Cloudberry site for a more detailed white paper on the topic. As we review our options on an ongoing basis, we look to use the best tools that can help us to flip between vendors where required. 

At Data Relish, we’ve been happy with the tools from Cloudberry and it was worth the small investment to save time – and data, of course!


Microsoft Ignite interview with Executive Team on Artificial Intelligence and Cloud

Jen Stirrup, Data Whisperer at Data Relish, took the opportunity to interview Rohan Kumar, Corporate Vice President, Azure Data at Microsoft, and Eric Boyd, Corporate Vice President, AI at Microsoft.

Rohan and Eric talk about the announcements that excited them both, and there was also a good discussion of Open Source at Microsoft, and the role it plays in Microsoft’s Data and Artificial Intelligence story.

There was a great discussion of Eric and Rohan’s thoughts on Microsoft’s role in making Artificial Intelligence and insight-driven analysis real for organizations. Every organization on the planet has data, and Microsoft are carving a path for the organizations that want to make use of it.


Preventing Crash for Cash with Analytics

At #AnalyticsX, the SAS Analytics Experience event in Milan, I was fortunate enough to have the opportunity to learn about fraud prevention in the insurance industry. SAS has a nearly 40-year history of working in analytics and insurance, and I caught up with Ben Fletcher, Director at the Insurance Fraud Bureau (IFB), to understand better how the non-profit is using data, analytics and SAS technologies to combat insurance fraud at all levels – from mischievous fraud to highly organized fraud on a large scale, including organized ‘crash for cash’.

The IFB is a not-for-profit company established in 2006 to lead the insurance industry’s collective fight against fraud. The IFB act as a central hub for sharing insurance fraud data and intelligence, with the objective of detecting and disrupting organized fraud networks. While the fraud may often appear to be a simple concept – e.g. causing deliberate crashes to make claims and pocket the pay-out – these scams often involve many people in highly organized gangs, who help to make the claim seem genuine, or at least not obviously fraudulent. Not only that, but they may be operating on a large scale, making them big contributors to the overall level of fraud.

Whether we like it or not, we are all impacted by insurance fraud. It raises the insurance premiums that we all have to pay. The gangs committing much of this fraud are likely to be involved in other criminal activity, such as drug dealing or human trafficking. So, aside from higher insurance premiums, this is an issue which impacts UK society in other ways and represents an illustration of how data can be used for social good.

Through the use of data, it’s possible to understand the scale of organized crime and to communicate that message to the rest of the industry.

The first step is understanding the scale of the issue so it can be tackled; until the industry can see and recognize the issue properly, there isn’t a clear understanding of the need to tackle it. Further, the audience for the system is complex, with various bodies looking at the data through different lenses, such as lawmakers, insurance specialists and the police. When people make fraudulent claims, the fraudster makes the claim appear genuine. The challenge is that you want to identify these cases without wrongly questioning what turns out to be a genuine claim – otherwise referred to as a ‘false positive’.
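That false-positive trade-off can be illustrated with a small, entirely hypothetical sketch: a fraud-risk score per claim, and a threshold that decides which claims get questioned. Raising the threshold questions fewer genuine claims, but lets more fraud slip through. The scores and the `confusion_counts` helper below are invented for illustration; they are not the IFB’s actual system.

```python
# Hypothetical claims, as (fraud_risk_score, actually_fraudulent) pairs.
claims = [
    (0.95, True), (0.80, True), (0.75, False), (0.60, True),
    (0.55, False), (0.40, False), (0.30, False), (0.10, False),
]

def confusion_counts(claims, threshold):
    """Count how a given score threshold classifies the claims."""
    tp = sum(1 for s, fraud in claims if s >= threshold and fraud)      # fraud correctly questioned
    fp = sum(1 for s, fraud in claims if s >= threshold and not fraud)  # genuine claims wrongly questioned
    fn = sum(1 for s, fraud in claims if s < threshold and fraud)       # fraud that slips through
    return tp, fp, fn

for threshold in (0.5, 0.7):
    tp, fp, fn = confusion_counts(claims, threshold)
    print(f"threshold={threshold}: questioned fraud={tp}, "
          f"false positives={fp}, missed fraud={fn}")
```

On this toy data, moving the threshold from 0.5 to 0.7 halves the false positives but misses one fraudulent claim, which is exactly the tension the text describes.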

The SAS platform was built to help promote collaboration between insurance companies, which are essentially struggling with the same issue: how to prevent fraud from occurring. Insurance is a risk-averse industry, and this was an incredibly bold step for the industry to take. Companies would not want to share data directly with other insurance companies for competitive reasons, but are more likely to do so through a third-party industry body (in this case the IFB).

Given the criminal activity involved, how could the insurance industry recognize, understand and tackle the issue? 

Using the recently-launched SAS digital platform allows the industry to better understand and recognize when fraud is taking place by drawing on lots of data points to highlight common themes and anomalies. Until this point, there was aggregated data, but it was not enough to meet the challenges faced in the industry. To find the trends and patterns, the system needed to cater for the whole process: from granular data right through to summarized data, analytical models with advanced statistical analyses, and data visualization.
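As a rough illustration of how granular data points can be used to highlight anomalies, the sketch below flags entities whose claim volume sits unusually far from the population mean. The data, the z-score rule and the cutoff are assumptions chosen for illustration, not the IFB’s actual analytics.

```python
# A minimal anomaly-highlighting sketch over aggregated granular data,
# e.g. claims counted per repairer. All numbers are invented.

from statistics import mean, stdev

claims_per_repairer = {
    "repairer_a": 12, "repairer_b": 9, "repairer_c": 11,
    "repairer_d": 10, "repairer_e": 48,  # an outlier worth investigating
}

def flag_anomalies(counts, z_cutoff=1.5):
    """Flag entities whose volume is more than z_cutoff standard deviations above the mean."""
    values = list(counts.values())
    mu, sigma = mean(values), stdev(values)
    return [name for name, v in counts.items() if (v - mu) / sigma > z_cutoff]

print(flag_anomalies(claims_per_repairer))
```

A real system would of course use far richer features and models than a single z-score, but the principle is the same: pooled data makes an outlier visible that no single insurer’s data would reveal.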

There was a lot of historical data and risk analysis data involved previously, and the IFB needed a system that could cope with the cross-platform intricacies of granular data, as well as offer opportunities for further growth.

Each contributing organization had to ensure that their data was clean. To be accepted into the system, the data had to meet stringent guidelines and standards before it could be used. Data preparation was a key part of the process.

All in all, this sounds like a fearsome project and anyone who has tried to work with data will be able to envisage the potential issues. Within companies, there is often a great deal of data that is siloed in verticals, e.g. the marketing data versus the sales data. Getting a single company to think about their data in a horizontal fashion is incredibly challenging and organizations struggle with it. The challenge facing the IFB was exponentially larger because of the number of insurance companies involved. There are many complexities involved in ensuring that data is good enough and clean enough to be used within a single organization. Here, it’s multiplied because each organization’s conception of clean data will vary, and perhaps differ between departments of the same organization.

What has helped the project get to this point? Fletcher noted that a key part was the business expertise of the SAS team, who had a lot of experience in the insurance and fraud industry. He also noted that the SAS team understood the value of instilling robust practices and processes to support the business objectives. There was a strong conviction that the business users need to be put front and centre of the system in order to ensure that the correct insights and conclusions are drawn from the data. Throughout the project, there was an emphasis on obtaining good user requirements with an emphasis on the business practicalities that emerged from discussion.

The overall solution needed to be comprehensive and complete. Ultimately, the IFB wanted a business partner to work with them on a partnership that had longevity, and they wanted a partner that would be invested in their success as an organization.

Finding Simplicity in Complexity

As Daniel Goleman commented in his book Emotional Intelligence, when we go through high stress situations, we eventually stop thinking clearly. We are unable to prioritize, and the stress means that we can start seeing things as much more complicated than they really are. When people are analysing data in complex circumstances, this, in itself, can cause stress. So things need to have a level of simplicity.

For the project, the audience comprised a wide variety of interested parties such as lawmakers, insurance specialists, fraud specialists and police. As with many domains, the gap between user and developer can be quite wide. In this case, Fletcher noted that the SAS team helped to shrink the gap, essentially being the glue to hold things together from a technical and a business standpoint.

In terms of the output, the teams say they love the visual analytics and the simplicity of the solution. It’s helping them to be more productive because they are working with meaningful data and can get to the heart of what they need to know.

What’s next?

Data can be used to improve the customer experience, and one area of interest is increasing automation throughout the whole insurance process. While that step is further away, it’s clear that this sort of initiative can only help the insurance industry become more analytical and insights-driven.

Rethinking Definitions of AI

At the recent AnalyticsX with SAS Software, I had the pleasure of interviewing Oliver Schabenberger, Executive Vice President, Chief Operating Officer and Chief Technology Officer at SAS. Schabenberger had some very insightful thoughts on Artificial Intelligence, and how customers approach this topic. I want to thank Mr Schabenberger for his valuable time; it was my privilege to meet him and spend time with him discussing these topics.

Herbert Simon in Red, by Richard Rappaport – Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=4138586

Schabenberger commented that the original ambition of artificial intelligence was regarded as easily achievable. He pointed back to Herbert Simon’s assertion that ‘the machine will be capable of doing any work that a man (or a woman!) could do’. We smile at the optimism now, but at the time, when those expectations were not met, the industry entered the first AI winter.

The AI industry started again with the work of people like Rosenblatt (1958) on perceptrons, and then with the adoption of neural nets after the work of Rumelhart and McClelland (1986), which, despite a lot of effort, brought about more disappointment and another AI winter.
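For readers curious what Rosenblatt’s perceptron actually does, here is a minimal sketch of the classic learning rule, trained on the linearly separable AND function. The data, learning rate and epoch count are illustrative choices, not from any particular implementation.

```python
# A minimal sketch of Rosenblatt's (1958) perceptron learning rule.

def train_perceptron(samples, epochs=20, lr=0.1):
    """Learn weights w and bias b so that step(w.x + b) matches the labels."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            # Step activation: fire (1) if the weighted sum crosses zero.
            output = 1 if (w[0] * x[0] + w[1] * x[1] + b) > 0 else 0
            error = target - output
            # Rosenblatt's rule: nudge weights in the direction that reduces the error.
            w[0] += lr * error * x[0]
            w[1] += lr * error * x[1]
            b += lr * error
    return w, b

# AND truth table: output 1 only when both inputs are 1.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
predictions = [1 if (w[0]*x[0] + w[1]*x[1] + b) > 0 else 0 for x, _ in data]
print(predictions)  # the perceptron learns any linearly separable function
```

The historical disappointment came precisely from this model’s limits: a single perceptron cannot learn non-linearly-separable functions such as XOR, a limitation only overcome by the multi-layer networks of Rumelhart and McClelland’s era.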

Nowadays, however, things have changed drastically and AI is having another boom. As Schabenberger noted, at one point, working on AI would get you ridiculed at a cocktail party. Now, you don’t get invited to the party unless you are working in AI. AI is back, and suddenly the market has a slew of AI experts.

So what changed? Why has AI come out of hibernation and back into the sunlight?

Today we have a different source of knowledge throughout the world: Data. Data-driven is an adjective that we hear everywhere, and businesses are rethinking their data, which has led to rethinking Artificial Intelligence.

Today, we are building better algorithms than ever. At the heart of today’s Artificial Intelligence revolution is data science. We create models using a combination of data science, artificial intelligence and analytics. These models impact many arenas: healthcare, insurance, education, medicine, finance and so on.

Artificial intelligence is no longer about rules or backpropagation. It is about processing data and imbuing human-like intelligence in a system, to varying degrees. Over time, the system develops its own logic, and there are plenty of examples where programs have, in effect, learned to program themselves through being programmed implicitly.

Fundamentally, the artificial intelligence boom is an analytics boom.

As it stands now, Artificial Intelligence is enabled by massive data volumes, cloud technology and digital transformation, empowered by advances in computing.

AI impacts people who manage organizations. Previously, managers would ask the question ‘tell me how you did that’. Now, this question is replaced by ‘show me the data’.

This data-driven approach to artificial intelligence has caused a very powerful transformation in the industry, but people are still confused about what Artificial Intelligence really means. Schabenberger cuts a neat and disciplined distinction between Narrow and General Artificial Intelligence.

Rethinking Narrow and General Artificial Intelligence

Schabenberger’s insight is that people’s expectations of Artificial Intelligence can be distinguished by the tasks it is expected to do, as well as by its autonomy. Does it really think and work in a real environment, or is it dedicated to a narrow task, which it does well?

Artificial General Intelligence


The goal of AGI is to create a general thinking machine: a machine with a genuine capacity for thought. The definition of Artificial General Intelligence includes self-sufficiency and broad, human-like general intelligence, where intelligence can be extrapolated from one situation to another and the system can learn over time. Perhaps, if it is to be human-like, it can forget over time too, and choose areas on which to focus?


Schabenberger calls this approach Artificial General Intelligence (AGI). In this interpretation of Artificial Intelligence, the goal is to have the machines behave, think and be like humans. It is to attempt to realise human intelligence in hardware and software. It is distinct from the narrow form of Artificial Intelligence because it focuses on a breadth of activities, rather than a very focused attempt of using human-like intelligence in one domain with a high degree of expertise.

Narrow Artificial Intelligence

In contrast to this perspective, there’s another approach to artificial intelligence, which Schabenberger calls Artificial Narrow Intelligence. Unlike AGI, these artificial intelligence systems do not think. They apply and execute algorithms; they may also learn from evidence, but the data and the modelling fundamentally come from humans in some way.

Take Siri, Alexa and Cortana, for example. These are narrow forms of artificial intelligence that solve very specific tasks: purpose-built systems, aimed at a specific objective, executing algorithms that are programmed implicitly.

What does the distinction mean for industry?

The precise distinction will allow businesses to formalize and understand what they mean when they talk about AI. Everyone has an opinion on AI, and they can mean different things by it. Focusing on outcomes, and on an understanding of AI that is Narrow rather than the general, sweeping definition, means that businesses can generate ideas that are actionable and achievable.

At present, data specialists and developers are developing these systems in silos, with no ethics discussion and no values imbued. However, these decisions cannot be made in a vacuum. We are all impacted by data, and we will all be impacted by Artificial Intelligence. We cannot assume the ethical superiority of the uninvolved, pointing fingers when things go wrong. If we are putting God in the machine, let’s make it one that we can, and want to, live with.

And what do all these Artificial General Systems have in common? They do not exist, and we have absolutely no clue how to build them, according to Schabenberger. We should still have a conversation about what would happen if we could get there. How does that impact ethics? How does it impact jobs? Our planet?

To summarise, Schabenberger’s distinction is very clear for businesses to understand, and to direct their discussions on artificial intelligence. Bringing this clarity is essential when people use the same term to mean different things, and the neat distinction will facilitate successful discussions and outcomes for business.

References

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408. http://dx.doi.org/10.1037/h0042519

Rumelhart, D. E., & McClelland, J. L. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, MA: MIT Press. ISBN 978-0-262-63110-5.