Paramita: Hello. Welcome to PwC Luxembourg TechTalk. Today's episode will be the last in our series on data and AI. Having said that, we'll of course come back to the topic given how rapidly it is actually changing. But for now, I leave you with a conversation that I had on data architecture with Vincent Gauché, a technology director here at PwC.
Paramita: Hello Vincent. So we finally meet for the recording.
Vincent: Hello. Very pleased to be with you.
Paramita: Thank you. Me too. As you know, we have been talking about data and AI in the previous episodes. And today will be the last episode on data and AI, for now, and then we'll come back because it's such a vast subject that you can't just, you know, put it aside. We'll come back to it again. We will talk about data architecture today to finish off this series.
What is data architecture?
Vincent: A simple question! Of course, you have several definitions of it depending on who you ask and let's say what kind of focus or angle you want to take on your definition. But if I talk about enterprise data architecture for me it is a discipline where you actually define which data you want to integrate within your overall enterprise architecture, how you want to integrate them, what are the means.
And let's say the policies to integrate them and to manage them, to store them, to consume them, to distribute them. And all the recipes and technologies that you have to deploy into your organisation to leverage your data, to secure the data and of course to feed the AI.
Paramita: OK, a couple of questions linked to that. First of all, when you say "which data you want to integrate", what do you mean by that?
Vincent: An organisation overall is producing a lot of data but it also consumes data. There is data it produces itself. There's data produced by its providers. Data that comes from internal sources and data that comes from external sources. So the question is of course: from all you produce, what do you keep? And there is a trend to say: OK, if we produce data, if data comes to us, we have to store it.
And this is where data architecture starts. It is to capture the data, whether it is internal or external data, whether it is structured data or unstructured data. You could for example consider this recording as data that's being produced by PwC. And we could store it not only in the podcast section but also maybe for later reference, to extract content and maybe use what we are currently talking about in some other context.
Paramita: No pressure whatsoever.
Vincent: Yeah exactly. But you understand that then we are not only talking about numbers, we're not talking only about data that is structured in a simple and easy way for computers to understand, but we're talking about data as raw. So whatever the type of data, whether it is written, whether it is a recording, whether it is just movies, all those kinds of things are data that you need somehow to consider capturing as part of your organisation. So when it comes to integration, this is actually what it means. It's all the ways you want to capture the... I would not say information but data first. Because first you start by creating or capturing the data and then you create information. But data architecture starts with data ingestion and data integration.
Paramita: OK. The other question that I was thinking of was: you spoke of enterprise architecture, what is that? And why architecture? Why do we say architecture? Is it because it's used to build something?
Vincent: Architecture defines the way you would lay out a solution.
So a solution could be applied to many types of problems. One of the "problems" is data, hence data architecture. At the enterprise level, you need to cover -- and this is what the discipline of enterprise architecture comes with as a concept -- all, let's say, the different domains that are related to how a business is actually organised, the processes of the business...
So, this is the business architecture that you basically define as part of enterprise architecture. Based on those processes you have the different functions that are used in the process and the applications that implement those functions. That is the application architecture, which is also part of the enterprise architecture. Of course, to render the service and the functions, applications need data, and then you need to define the data architecture that is also part of the enterprise architecture, of this famous umbrella.
And there are two others - technology architecture which is basically the infrastructure where the system runs... whether it is on premises, which server, what kind of technology you are using to run the application.
And then there's the security architecture. How do you protect all this? So all of those domains are related to the enterprise architecture. So, data architecture is actually a part of it.
Paramita: I remember last time that we were talking about data architecture you showed me a kind of a chart that places it... because in our previous episodes we spoke of data management...
Vincent: It is true that when we talk about architecture we generally like to draw charts and to draw a map. Which is a little bit more complex to explain in words. We are the kind of people who like boxes and arrows.
Paramita: I remember that you showed that because I asked where does data architecture fit in data management...
Vincent: As part of the discipline of data management... it is everything which is related to data. It starts with governance. How do you use data ethically, who owns the data... I have a clear map of what is a reference in terms of data to actually identify the usage that is made of the data. I have a chance to have data lineage as well: where that data comes from. When we are considering a piece of information, we have to ask ourselves where the information comes from, based on which data this information was produced. I think it's important.
And there is also the security aspect that is part of data management, and data architecture is actually a part of it too. Data management and all those terms are definitions, and if you just google them all and start reading, of course you are completely lost, because companies, when they sell products, mainly focus their definitions on what their products have to offer. So there is no one definition that fits everyone. It's more that when we use a term, we have to have a common agreement on what the term means, and we have to know who is responsible for managing the different disciplines behind the term. But yes indeed... if you refer to the Wikipedia page on data management, I think you will see that data architecture is part of it.
Paramita: So if I ask you -- because I said that this episode will probably be the last episode on data and AI -- if I ask you what is exactly the lifecycle of data within an organisation? Just to give an overview to our listeners. What is the lifecycle of data when it gets into an organisation?
Vincent: This is a very vast question because of course there are several... The question is more about the usage. You do not generate data out of the blue. It may come from an interview that you have with a client, where you just ask questions and you capture the answers, and this becomes data that you store in your systems. But you also have information. This is another type of data. This time, the data is structured and has been processed: you turn something which has no real relevance if you take it apart, the raw data elements, into something that makes sense for the individuals that need to deal with it. And this is what information is all about.
So in your life cycle, what do you want to target? Is it more data or more information? Because it's not that clear when data dies, basically. You capture it and after that it is there to last.
Paramita: Like forever?
Vincent: Like until you consider that it is no longer relevant. But today it's quite difficult to tell when a piece of data is no longer relevant.
So of course, if it's for example the data of a recipe and you make the recipe evolve, it could be interesting to keep the old recipe because maybe in 50 years you'd say: oh, by the way, our ancestors were doing this kind of recipe that way.
If we are talking about financial information, it could also be interesting to keep the data point for later reference and to make a comparison for example whether there is an increase or decrease of this particular KPI for example.
But in terms of lifecycle, of course, if we talk about how data is transformed into information, it is clear that when data first reaches the architecture nowadays, the first thing we do is store it. Whether it is on persistent storage or in-memory storage, we store it and we secure it.
And then we add a couple of data elements around this data to make it more searchable and recognisable, which we call metadata. And then, when it is time to say: oh, this data must be of relevance, we can start augmenting it.
So, we complete the data with additional data elements that come with it. For example, if you say that the data is a mail, you could consider that first you store the mail. You secure it because there are references to names, for example, in there. And then, when the mail becomes eligible to be processed for...
Paramita: By "mail", you mean "email", yeah?
Vincent: An email, yes indeed. "Male" could work as well. We can scan a male, a picture could work as well but there is no limit basically.
But actually a picture might be a better idea, because then you take the picture, you store the picture, you secure the picture, and then you create metadata, like by which channel you received this mail and when you received it. Then you start to add who sent the email, for example, and to whom the email was addressed. And then you can say: OK, now we augment the data. So we complete the data, we enrich the data with additional processing information, like we turn that image into text. Then the text, let's imagine, is in French. We turn the French text into English text. And thus from one picture we start to have written texts in different languages that can go into an algorithm.
Because then you can imagine that you have a mechanism to analyse the sentiment of the author of the email when he was writing it. Actually, there are algorithms for that today in the AI space that analyse the sentiment of users, whether it is verbally or when they're writing an email, depending on the words that are used, the sentences that are next to each other, the punctuation. And you can identify whether the writer was happy or whether there was a little bit of anger in his mood, I would say, and the same applies to people who are speaking to machines.
Now we are starting to be in a situation where this is no longer science fiction. This is the truth.
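The lifecycle Vincent walks through (store, secure, add metadata, then augment) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataRecord` class, the function names, the stub OCR and translation helpers, and the sample values are all hypothetical and do not come from the conversation or from any real product.

```python
from dataclasses import dataclass, field


@dataclass
class DataRecord:
    """One captured item (for example a scanned mail) and everything added around it."""
    raw: bytes
    secured: bool = False
    metadata: dict = field(default_factory=dict)     # data about the data
    enrichments: dict = field(default_factory=dict)  # data derived from the raw item


def store(raw: bytes) -> DataRecord:
    # Step 1: keep the raw data as it arrived.
    return DataRecord(raw=raw)


def secure(record: DataRecord) -> DataRecord:
    # Step 2: placeholder only; a real system would encrypt and restrict access.
    record.secured = True
    return record


def add_metadata(record: DataRecord, **items: str) -> DataRecord:
    # Step 3: make the record searchable (channel, date, sender, recipient...).
    record.metadata.update(items)
    return record


def augment(record: DataRecord, name: str, value: str) -> DataRecord:
    # Step 4: attach derived data such as extracted text or a translation.
    record.enrichments[name] = value
    return record


# Hypothetical stand-ins for real OCR and translation services.
def image_to_text(raw: bytes) -> str:
    return raw.decode("utf-8")  # pretend the picture bytes are already text


def french_to_english(text: str) -> str:
    return {"Bonjour": "Hello"}.get(text, text)


# Made-up sample values, for illustration only.
record = secure(store(b"Bonjour"))
add_metadata(record, channel="scanner", received="2020-01-01",
             sender="alice@example.com", recipient="bob@example.com")
augment(record, "text_fr", image_to_text(record.raw))
augment(record, "text_en", french_to_english(record.enrichments["text_fr"]))
print(record.enrichments)
```

The raw item is never overwritten: each step only adds metadata or derived data next to it, which is what lets later algorithms (sentiment analysis, for instance) reuse the same record.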
Paramita: I think you can see my eyes they're just...
Vincent: Yeah you're astonished.
Paramita: ... getting bigger and bigger. And good that you're talking about algorithms and everything because one of my questions was you know how data architecture and artificial intelligence how do they work together.
Vincent: I think... because artificial intelligence is a vast domain... when you're talking about AI, I think you're talking about machine learning or neural networks, deep learning, all those kinds of things. Because if we just peel the onion, you know, you have artificial intelligence, and then there are several methods of artificial intelligence.
One of them is machine learning and in machine learning, you have deep learning with neural networks typically. And those small networks or huge networks I would say need to learn. They need to learn.
So, you need to teach them how to process the algorithm. Because they know nothing in the beginning, and data is there to support those kinds of activities. And then, to make those neural networks learn, you have several approaches. One of them being that you input data into the neural network and you look at what the result of the neural network is. But you know what it should have been. You compare, you compute an error and, based on the deviation between the actual result you obtain and the expected result you were aiming at getting from the network, you basically update the algorithm in the neural network so that, after many, many trials, the neural network finally converges to a point where it performs the way you would expect, that is, it proposes a solution that complies with your expectations.
So it's not that the accuracy of the network is 100%, and we are not looking for 100% accuracy, but it is often accurate enough to solve classification or prediction problems.
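The learning loop described above (feed an input, compare the actual result with the expected one, compute an error, update, repeat many times) can be sketched with the smallest possible "network": one linear neuron with a single weight and a bias. This is a simplified illustration, not what any production system does. The training data, the learning rate and the number of passes are assumptions chosen for the example, and the update rule shown is plain gradient descent on the squared error.

```python
# Inputs paired with the results we already know they should produce.
# The hidden rule here is expected = 2 * x + 1 (an assumption for the demo).
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def train(samples, epochs=2000, learning_rate=0.05):
    weight, bias = 0.0, 0.0  # the network knows nothing in the beginning
    for _ in range(epochs):            # many, many trials
        for x, expected in samples:
            actual = weight * x + bias     # what the network produces
            error = actual - expected      # deviation from what it should have been
            # Nudge the parameters in the direction that reduces the error.
            weight -= learning_rate * error * x
            bias -= learning_rate * error
    return weight, bias


weight, bias = train(samples)
print(f"learned weight={weight:.3f}, bias={bias:.3f}")  # approaches 2 and 1
```

Real deep learning networks have many layers and many more parameters, but the principle stays the same: the parameters are adjusted a little after each comparison until the outputs are accurate enough.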
Paramita: How long does a process like this take?
Vincent: It depends really on the algorithm. Of course, you can imagine that when you want to process voice and when you want to process languages etc. it's always... it can be long, very long.
And a small story, for example... Huge players, public cloud players like Amazon, Google, Microsoft, IBM and others, when they were trying to have their networks understand the different languages, basically used movies for that. In the movies, you generally have people speaking, of course, so you have the voice. And next to that, you have the closed captions. So, for example, as an input to my network I have the sound. And as output I expect to have the transcription of what the people said into text. So, it's a speech-to-text algorithm. Then you can compare the closed captions that the network recognised from the sound of the movies with the actual closed captions as they were written down by the people who synchronise the image and the sound. And by comparison, they finally converged on an algorithm that seems to work pretty well.
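One common way to put a number on the comparison between what a speech-to-text system recognised and the reference captions is the word error rate: the minimum number of word insertions, deletions and substitutions needed to turn one text into the other, divided by the length of the reference. The sketch below is an illustration of that metric only; the conversation does not say which measure those companies actually used, and the function name is hypothetical.

```python
def word_error_rate(reference: str, recognised: str) -> float:
    """Word-level edit distance between two captions, divided by reference length."""
    ref = reference.lower().split()
    hyp = recognised.lower().split()
    # previous[j] = edits needed to turn the first i-1 reference words
    # into the first j recognised words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / max(len(ref), 1)


actual_caption = "do i need an umbrella today"
network_output = "do i need umbrella to day"
print(word_error_rate(actual_caption, network_output))
```

A score of 0.0 means the recognised text matches the reference captions word for word; the higher the score, the further the network still is from the human-written captions.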
Paramita: For example for a transcription software?
Vincent: When you are with your personal assistant on your smartphone for example, when you say...
Paramita: Do I need an umbrella today?
Vincent: For example... Of course, the system must recognise what you're saying.
Paramita: It's fascinating.
Vincent: It is fascinating. Some would say it's fascinating, some would say it's scary. It depends.
Paramita: Yeah absolutely. There is a part of it that is scary of course because I think the scare is mainly because a lot of people are not aware. You know there's kind of a lack of trust if I may say. Because basically we don't know, we really don't know what happens you know when we just give out our data and what happens to it. Like you said you know it can be relevant even after 50 years. So and not just the data and how it is being processed, how it will be processed and...
Vincent: It is true that with neural networks you can hardly audit the way the data has been processed. It's not like a standard algorithm that a developer or an engineer thought about and translated into code, where you know perfectly, depending on where you are in the code, if you stop the code, you say: ah, it is doing that. And you know what the next step is going to be.
With neural nets you just don't know what this layer is starting to understand and what the next layer will actually add or remove or help the algorithm converge on which aspect. You just don't know. So you have to be trustful indeed.
I believe one of the people you talked to in this domain has already mentioned the autopilot as an example. If you enable the autopilot in your car, you need to trust the algorithm. Of course, you need to trust the algorithm. If you don't trust the algorithm, you will never push the autopilot button.
Paramita: Yeah, and I also think -- it's my personal opinion -- that the terminology sometimes is like, when you say "neural" network, it is so... you know, "artificial intelligence", it is very human related.
Vincent: Exactly. But this is indeed because if you have a close look at how it is built in terms of IT, it tries to mimic the way an actual neuron works. And a neural net is exactly that. You have a neuron that does a very simple operation: it makes a multiplication, a sum and applies a certain function.
And then there are many of them that are all connected, almost all connected to each other layer after layer if I take one example of a neural network topology. And then of course you can imagine that somewhere your neural network is a kind of a small brain.
That is the way we understand the brain for now. Let's say, if we make more progress on understanding how the brain actually works, I do believe that in the future the neural nets will have a different topology, a different layout, and would be built completely differently. But we first need to make progress on understanding how our brain works before we can actually make progress on the neural nets, the technology to have a real brain thinking in a machine.
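The "very simple operation" of a single neuron (a multiplication, a sum, then a function) and the layer-after-layer topology can be written down directly. The sketch below is a minimal illustration: the sigmoid function, the bias term, and the example weights are assumptions for the demo, since the conversation does not specify them.

```python
import math


def neuron(inputs, weights, bias):
    # Multiply each input by its weight, sum everything, then apply a function.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid, one common choice


def layer(inputs, weight_rows, biases):
    # Every neuron in the layer sees all the outputs of the previous layer.
    return [neuron(inputs, weights, bias)
            for weights, bias in zip(weight_rows, biases)]


def forward(inputs, layers):
    # Feed the data through the network, layer after layer.
    for weight_rows, biases in layers:
        inputs = layer(inputs, weight_rows, biases)
    return inputs


# A tiny network: 2 inputs -> 3 neurons -> 1 neuron (weights are made up).
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
print(forward([1.0, 2.0], network))
```

Each neuron on its own is trivial; what makes the network hard to audit is the number of such neurons and the fact that the weights are learned rather than written by an engineer.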
Paramita: Our objective with this miniseries was to really talk in detail about the different aspects of data, like how data is managed and how it is governed in a company, and the different aspects of AI that we talk about a lot these days, like ethical AI... I really hope that our listeners got a kind of an overview of how data and AI work together, got some knowledge about the different aspects, and that the scare will be less in future because of...
Vincent: You have a tendency, humans have a tendency to be scared about things they don't understand. So, the objective I think is to demystify a little bit those concepts. And I think it's always good to talk about it because it's too easy to just blame the thing that we don't know or we know nothing about.
But still, indeed, there are some business activities -- not only in business, in your day-to-day life as well -- and services that will be rendered in the future and that will be available just because of, or thanks to, AI. So, we don't have to be scared about that.
And as well in the field of medicine, we have started to apply AI to analyse scans or images of bodies to see and to detect tumours and those kinds of things as well.
Paramita: It's our ally.
Vincent: It is so as well.
Paramita: Exactly. Well thank you Vincent. I hope that we sit together one more day to talk about other things, other aspects of AI and where it is going and what's happening. But for now I'll say goodbye.
Vincent: Thank you Paramita.
Paramita: Thank you so much. It was really interesting.
Paramita: So that was my conversation with Vincent Gauché. I hope you enjoyed the show. And do tune in next week for another brand new episode of PwC Luxembourg TechTalk.