Listen Now: Do You Really Need a Data Warehouse?

 

Episode summary: Learn the long-term benefits of investing in a data warehouse to support your business intelligence reporting initiatives.

 

“The business needs of today are not the business needs of tomorrow.” -Caleb Ochs, VP of Delivery Operations

 

 

Available anywhere you listen to podcasts 

spotify-podcast-badge-blk-grn-165x40 Apple_Podcasts_289x70px Google_Podcasts_276x70 

 

It’s common for our consultants to field questions around the necessity of a data warehouse to support reporting initiatives. After all, if it’s possible to quickly connect Power BI directly to data sources, why would a client want to invest the time and expense into a separate data warehouse? While a direct connection can help companies meet short-term, urgent reporting needs, it is not an ideal solution for those looking to scale their data initiatives. In the long run, a data warehouse is the best way to support your company’s long-term reporting needs.

 


Full transcript:

 

Brick Thompson: OK. Hey there, Caleb.

Caleb Ochs: Hey, Brick.

Brick Thompson: Yeah, welcome to our first attempt at recording a podcast here at Blue Margin. I'm Brick Thompson, Founder and CEO of Blue Margin and with me… Caleb introduce yourself.

Caleb Ochs: I’m Caleb Ochs. I am the Vice President of Delivery over here.

Brick Thompson: Alright, very good. So, while we're drinking our afternoon beer here today, I thought we'd get into a discussion about something that comes up with our clients from time to time.

You know we do a lot of building data warehouses, do a lot of building reports on Power BI, and sometimes clients will ask us, “Why do I have to go through the trouble of building a data warehouse? Why not just connect directly to the data source with Power BI and just build a report there?” And I thought we could talk about some of the pros and cons with that.

Caleb Ochs: Yeah, yeah. Sure. Maybe real quick before we get into that, we can talk about why we're drinking beer.

Brick Thompson: Oh yeah, let's do that.

Caleb Ochs: Alright, so we're located in Fort Collins, Colorado. We have… Is it still the most breweries per capita? At one point it was.

Brick Thompson: I don't know. There's a lot of them. We're not that big. I think there's like 20 some or something like that.

Caleb Ochs: So, there's a lot, yeah, so beer is a big deal around here. So, I figured we'd crack a beer and give this a shot.

Brick Thompson: All right, all right. Well with that, I thought it might be helpful if we just start by defining a little bit what we mean by a data warehouse, and what we mean by direct connecting to a data source. As a data architect yourself, what's the simplest way you could describe a data warehouse?

Caleb Ochs: So, it's like a separate back-end storage area for your data. It’s going to take some of the workload off of your front end for displaying your reports and getting something out to a user. So, it's a data warehouse could be a proper data warehouse, or it could also just be a back end that's going to do some processing for you before it hits your reports.

Brick Thompson: OK, so basically, it's a place, a repository to store data that's coming out of transactional systems like accounting systems, ERP systems, that type of thing?

Caleb Ochs: Sure, yeah.

Brick Thompson: OK.

Caleb Ochs: Yep, yep.

Brick Thompson: Very good. And then when we talk about when customers ask us about, “Hey, can we just skip the data warehouse and just connect directly with Power BI (which is our reporting program)? Say, connect directly to our financial system and do the reports there. What are they talking about there?

Caleb Ochs: So really what they're wanting to do is take the visualization tool (in this case, since we work with Power BI and Microsoft, it's using Power BI or Power Query) It's kind of the tool inside of power BI Edit that allows you to connect to different data and massage it and do some sort of extent of modeling the data. So that's what they mean. They mean taking Power Query in this case and connecting directly to Salesforce for example.

Brick Thompson: Yeah, OK, and so as I talk to clients who ask about this, it's a reasonable question. Because you know most clients don't hire us because they want to have the plumbing of a data warehouse. It's sort of the plumbing you need in order to get to get to good reporting, but if there's a way to get to the reporting quicker, they'd like to. In fact, in some cases, you can direct connect and get to the reporting quicker, but there's definitely some things you'd give up with that.

I've got a list here in front of me of a bunch of them. I'm sure you do, too. I'll just start with one. I mean, I think one of the big ones is that when you skip going to a data warehouse, you potentially take on technical debt. Meaning that, you might get your report out quicker, but if you later decide to extend that report, do more work on it, have other reports that use the same data and the same measures or calculations and KPIs. Now you've got to manage all of those things across multiple power BI files. as opposed to having those in a central location like a data warehouse, and then being able to reuse that across multiple reports.

Caleb Ochs: Right. Yeah, that's a big one. And you know, there's obviously all kinds of different cases where you're going to need to see data quickly, so you may not want to take on the overhead of building some sort of back end, and you may just need maybe need to answer a question quickly and that's fine. But you're right. I mean, what happens and what we see all the time is somebody will see a report, and then the ideas start flowing and now all of a sudden, you're trying to do other more complicated things and you're kind of having to backpedal a little bit, and you get yourself back into something that's more robust and scalable.

Brick Thompson: Yeah, and so all that work you've done on the original report is potentially wasted and you sort of have to redo it.

Caleb Ochs: Right.

Brick Thompson: And that might make sense if you need something tomorrow. OK, you do a direct connection, but it can definitely cause rework in the end, I think.

Caleb Ochs: Yeah, and it's interesting too, because it can give building something like a data warehouse a bad name because it takes longer, right? So, let's say you build a report in half a day with a direct connection, and then, all of a sudden, you know you need to go build something more complex. And you're trying to make the case of “Well, even though that thing only took me half a day, now I need this is going to take me a week and it's hard to really to convey why like why the difference in timeline there without getting too technical and you know, kind of shooting over people’s heads.

Brick Thompson: Yeah, I think one of the things as I'm talking it through with clients and trying to decide which way to go, (which is almost always data warehouse. Let's face it, unless you need some kind of quick tactical thing, one of the big things for me comes around governance and making sure that. When you have reports based on the same set of data, the same transactional system, again say like an ERP system. That when you have two different reports that are reporting the same types of numbers, like let's say just as a simple example, let's say you want net profit on one report and net profit on another report, you want to make sure those numbers are the same, and when you are building reports where you do a direct connection. Separately through two different. Power BI files. You can easily end up with people that are defining those things, even net profits, slightly differently.

So, you know, maybe you didn't subtract a certain line from the chart of accounts a certain account. Whereas, when you have a data warehouse you can define it once, so you get good governance, and you get good consistency. And I think one of the big benefits of that is really that it helps with your adoption of reporting. So you know, I'm sure you've seen it with clients where if you have two reports with supposedly the same number on them. You know the same data point, let's say net profit, but the numbers are different. Now, all of a sudden, the user is questioning which one is right. Should I even trust these reports at all? And it can really torpedo your efforts to become data driven and get people to adopt it.

Caleb Ochs: Right, even worse, it can cause, you know, conflict, unneeded conflict, between your executives. For example, right somebody got it defined one way and someone else has it the other. And without getting into alignment on what that is, then yeah, one person driving there. They're part of the business off of 1 number, the other onto the other, and then you can get into, you know it's conflict and you can just avoid that really easily.

Brick Thompson: Yeah, that's true and I'm realizing actually it might be confusing to some of our listeners here. How do you get that calculation into the data warehouse? Isn't the data warehouse just a bunch of data? Maybe you could explain that.

Caleb Ochs: Yeah, that's a good question. It is a bunch of data, but it's also a lot more than that. You know it's a place where you can apply business Logic one time and feed it to your reporting. So, you know it doesn't completely eliminate the risk of the example that you provided. You still have some additional options inside of Power BI. For example, where you can alter measures and do different types of things with your data, but it does minimize it a lot.

It's also a really important thing as you're deploying BI, to make sure that you align on those definitions. That's probably the one thing that's going to guard against that scenario, you explained the most is saying alright, what is net profit? For example, and we're going to define that. And if any time we've seen that profit on a report, it needs to be defined this way.

Brick Thompson: Yeah, and you know when I think of a data warehouse I I think more than just the data and the data structure. I think about all right, how are we going to present that to the report writer? And often we use what's referred to as an OLAP cube or just “a cube” where you have all the data. Nicely modeled so that a report writer doesn't need to be a data architect, they can understand easily, sort of where all the different fields are coming from and what the tables are, but they can also store those measures and calculations in that cube. So, you know they can define net profit. A very specific can be complicated calculation, but it's in the cube and then every other report writer that comes in can use that.

Caleb Ochs: Yeah, right.

Brick Thompson: All right, so when we're thinking about scaling, I think it's important here too, because you know, I've had clients say, well, look, I can just make a copy of my power BI file that I've created this direct connection to and create 8 reports from that with the same direct connection. And that's definitely true, but I think one of the problems you can run into is that when you realize maybe that you've modeled the data sub-optimally and that first file that you were copying from you now have 8 copies that are suboptimal, and when you make the correction to the one, now you've got to go back and make the correction to the other number of files as well.

Brick Thompson: Same with If you redefine a measure or API, you realize. Oh, I did it wrong in the first one. Right now, you're having to do it in eight of them, and so if you're really trying to scale across your organization, it just becomes untenable. Just the management of the data and getting people confident that the data is right. It becomes really difficult.

Caleb Ochs: Right, it's Excel jungle 2.0. Right?

Brick Thompson: Exactly, yeah. Yeah, there's another benefit I think, which is really dealing with data sources you don't even know you're going to need when you start, so you might think, oh, I just need to report this. Just going to come out of my ERP system and report on say, productivity in the factory, but you might realize three months, six months later, that really you need to bounce your timecard data up against that to really understand something like maybe employee utilization or something like that. And if you've got a bunch of reports that are that are direct connected a bunch of different Power BI files direct connected to the ERP, and now you've got to do that connection to your time keeping system, you have to do that over and over and over again as opposed to doing it One time, right, right.

Caleb Ochs: Right, right. And as we know the business needs of today are not the business needs of tomorrow. So things are going to change, and you don't want to tie a hand behind your back by making you know a decision with let's just connect this to Power BI. I mean I can think of examples that we've even had internally, you know, especially as we've changed our pricing. You know, not helpful if we didn't have a data warehouse or you know our data inside of a back end that we could apply some heavy logic on before it hits our reports. Yeah, we would have been stuck back at square one trying to make things happen, but with it with all of our raw data at the most granular level in our data warehouse, we were able to make some quick adjustments. We still have some more work, but we make quick adjustments to maintain some visibility there.

Brick Thompson: Yeah, it's a good example, and what Caleb is talking about and changing our pricing we recently. Over the last six months or so, we went from a time of materials pricing model to a fixed price pricing model, and it required us to redo or wreath. Think about how. So, we represent our business for ourselves on our dashboards, and if we hadn't had a data warehouse, it would have been really difficult to do that.

Caleb Ochs: Yep, right.

Brick Thompson: Let's see, you know there was another thing I was thinking about as I thought about this doing this podcast, which was this idea of Type 2 dimensions. Which I know is kind of technical, but I thought maybe we could talk about that a little bit because it's something I don't think you can do if you do a direct connection. And it's something that's really useful in reporting, so maybe we could kind of demystify that a little bit.

Caleb Ochs: Sure, yeah. Yeah, I can just define a type 2 dimension Quick. So, a type 2 dimension is something that allows you to track changes over time, so you can think of, let's say a customer record. Let's go with an employee record. So, you've got an employee record. And let's say it's a female and she gets married and decides to change her last name. So, what you might see happen is that her last name changes, so you create a new record in the employee. Table and you mark that as the most current one and then you would have your old record with that person's maiden name still attached to it so it would show you the change and it would give you timestamps of when the change happened.

Now that that example might not be too useful in reporting, but you know it's something like a customer, let's say customer headquarters changes and it becomes a new into a different region in your data, So they go from east to west. And then you're going to want to see that because now the sales Rep for the East region is not getting that revenue anymore, and the sales rep from the West region is so you want to be able to track. Historically, the East region sales rep's activity for that customer versus the West so it can be very important. The last thing I'll say about Type 2 is that it's there that it needs to be a specific use case like the ones I just gave. Otherwise, it's just more work than it's worth.

However, back to Brick’s point. In power BI and in power query, it's a lot more difficult to make that happen, especially with a direct connection. Now, there's a new thing that Microsoft rolled out called Data Marts, which might make this a little bit easier to do. It's a brand-new feature in preview currently, so things are always changing, but if you cut out data marts for example, and you just had a direct connection. You know, good luck.

Brick Thompson: There's no way to do it. You can almost think of type 2 data as sort of snapshots of data. I know it's not that exactly, but it allows you to go back in time and report against how things were at that point back in time and sometimes direct connect to a data source because of the way the data is organized, you just can't get to that unless you've been storing the data in a data warehouse.

I'll be interested to see about that Data Mart preview. I haven't read much about that yet. You know, the only other point I had was that I was as I was thinking about this and it seems like I'm just making an argument for always do a data warehouse, but it really is not the case. I mean, sometimes it does make sense to do a quick hit report, but more often really if you're if you're bringing reporting into a business or, or maturing, reporting in a business, you really need the data warehouse. And the last point I was thinking about is that sometimes data modeling can be pretty complicated, so you're bringing in data from three or four different sources and trying to make those different sources have correlations and shared, I was going to say conformed dimensions, but, shared data so that you can relate the data between the various sources, so maybe you have employees in two different data sources, and you need to be able to connect those. And that can be complicated to do, but if you do it in a data warehouse, you just have to do it the one time in the data warehouse as a...

Caleb Ochs: Yeah, it gives you so many more options, right? You essentially tie a hand behind your back with just Power BI. It's just way harder than it has to be.

Brick Thompson: Yeah, I think that's true. Alright, so give me an argument for not doing a data warehouse, a good one.

Caleb Ochs: For Power BI only?

Brick Thompson: Yeah.

Caleb Ochs: I think, if you have one source of data and you're doing something very simple. That can happen. But as we said, kinda in the beginning, you'll do that it. And sometimes it's good. Sometimes you should do that to get buy-in, or you need to do a proof of concept. Something or you just need to show that you know this tool is worthwhile. Granted there are there are drawbacks to doing that, and I was kind of alluding to it at the beginning. Where you do a proof of concept in half a day, and now all of a sudden you need a lot more budget and a lot more time to do what's really needed for the business.

It's a lot harder sell at that point, but you know, if you have something very simple and you need it and you need it quickly... Yeah, connect directly with Power BI. Make it happen. There's also another interesting use cases. I'm thinking of one in particular at the client we just worked with where rather than building an ETL and you know some custom code to go pull data from and we're pulling data from Smartsheet, we use the Power BI connector to do it, and then we stored that data inside the Power BI service and we were able to utilize it where we needed to, so it was almost like a simplified version of an ETL. And we just had the number directly displayed in the report as one number. We only need one thing from Smartsheet, and it made the job very easy. And granted, it wasn't a full-blown report, but I guess my point is that there's use cases for it. Sometimes it's a full-blown report, sometimes it's not. But you do have to really weigh what you're trying to do and what outcome you're after.

Brick Thompson: Yeah, it sounds like it. It really comes down to ROI. Right? If there's a quick thing you can do to get something you need, you know tomorrow that could be worth it. You know it can be inexpensive and fast, but if you're going to try to build a reporting suite set of reports that you're running your business on, there's going to be a lot of them. They're going to be growing. If you think about the ROI, even though that first report might be really quick and inexpensive to do because it's quick in the long run, you're actually going to have a much better return on investment by having the - I almost think of it like plumbing doing the plumbing work so that you're not having to go out to the well for a for a bucket of water. It's just in the house already. So yeah, that was a pretty good analogy, wasn't it?

Caleb Ochs: That was. (Laughing)

Brick Thompson: Alright, well gosh, we're about 20 minutes in now. Do you have any other thoughts here before we wrap up?

Caleb Ochs: You know, I think, I think just going back to your ROI piece. You know there are definite use cases where you're OK with some throwaway work, and that's OK, right? So, if you need something quick, you need it done and then after that you're not going to run your business off of it, sure. Connect right with Power BI. Get those numbers out there. Having the data is better than not having it, so don't let it keep you from doing something just because you think that you need a data warehouse for everything now. It's better to have data warehouse, but...

Brick Thompson: Says the Data Architect. (laughing) Well this has been good. It's been fun sitting here doing our first podcast, having a beer, talking about data warehouses and direct connection. Any final thoughts?

Caleb Ochs: You know, I don't think so... the time flew.

Brick Thompson: Alright, thanks, Caleb, see you soon.

Caleb Ochs: Thank you, Brick.

 

Check out other episodes on The Dashboard Effect podcast