Introduction to Data Warehousing
In today's article we'll learn about data warehousing. I'm just going to give you a quick overview, a quick summary of what data warehousing is and why it's used, so let's get started. First of all, I'd like to introduce you to the two people who got data warehousing going. The first is a chap called Bill Inmon, born in 1945, who got his degree from Yale before moving on to New Mexico State University to get his master's in computing. He had seen that all databases, to be proper databases, had to be in third normal form, but he had an idea: maybe in some situations using third normal form wouldn't necessarily be best. So he wrote a paper about denormalizing data in some situations, and this was taken up by Dr. Ralph Kimball, born in 1944, with a doctorate from Stanford, who commercialized the idea of data warehouses. So let's start at the beginning. Data warehousing is
not in third normal form, which is part of why it gets loosely associated with "big data": anything that's not normalized will be much bigger than a fully normalized database. The database should be denormalized in a particular way, into second normal form; we're going to look at how that works in general a few slides further on. This means that you get data redundancy, which means that you need more storage, but it also means that you can get at the data more quickly, and more quickly is what data warehousing is all about. The purpose of a data warehouse is to provide aggregate data (totals, trends and so on) in a format suitable for decision-making. The decision-maker isn't interested in whether this particular customer has paid their bill, or whether that particular wine was bought on that particular day. They're interested in the general trends, so they can see where the efforts of the company can be best used. So how do you go about
creating a data warehouse? Well, let's go through this diagram and start on the left. You have the operational systems within an organization (marketing, sales and so on), and each of these will have its own properly normalized database.
You don't want to mess with the operational systems that are already functioning, so what you do is take copies of their information and put them into a staging area. What you do with the data in this staging area is an entirely separate piece of work, so you're not messing with the day-to-day running of the company. In the staging area you can look at organizing the data in a particular way (I'll show you this in just a moment), and you can make sure that
all the data is in the same format. You might think: well, it's coming from databases, surely it's going to be in the same format? Not necessarily. In some departments the names of customers might be all in one field; in others you have forename, surname and title. You might have data in pounds in one table and in dollars in another. So we need to work out how we're going to get all of this into one logical framework. The data goes through the integration layer and flows into the data warehouse in a format that is standard across all the data in that data warehouse. So, for example, we would use all pounds or all dollars, or have every name as forename, surname and title. The data warehouse will then be huge: we've taken the data from across the organization and put it all into one large database in second normal form. But we're going to have to answer questions for specific individuals. The director of marketing will have a set of questions, the director of finance will have a set of questions, and the managing director will have a set of questions. So for each of them we create their own data mart.
The data mart, being much smaller than the data warehouse, will give answers a little more quickly, since it's not having to trawl through all the data. It also means that the data marts are separate entities: what one user does with their data is not going to affect what another user does. So we have these data marts, and strategic marts, which are just data marts that give a global overview of the company, the sort of things that the managing director and
their people would be interested in. Now, ETL and data marts. ETL stands for extraction, transformation and loading, and these are the three stages you have to go through to create a data warehouse. First of all you have to extract the data from the in-house databases, so that's getting the data from the original database into the staging area. When it's in the staging area we've got to transform it to make it useful, so we have to get all the data into the same format: all pounds, say, or forenames, surnames and titles. Then we have to load it into the data warehouse itself. Once we've got it into the data warehouse, we can worry about the data marts. These are subsets of the data warehouse, and we have these subsets so that people don't mess with each other's data. It also keeps things simpler for the user.
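The three ETL stages and a data mart as a subset of the warehouse can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample rows, the table and column names, the fixed exchange rate and the "Title Forename Surname" layout of the combined name field are all invented here, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

USD_TO_GBP = 0.8  # hypothetical fixed rate, for illustration only


def extract():
    """Extract: take copies of rows from the operational systems (staging area)."""
    sales = [{"dept": "sales", "customer": "Dr Jane Smith",
              "amount": 100.0, "currency": "GBP"}]
    marketing = [{"dept": "marketing", "title": "Mr", "forename": "Raj",
                  "surname": "Patel", "amount": 50.0, "currency": "USD"}]
    return sales + marketing


def transform(row):
    """Transform: one name layout and one currency (pounds) for every row."""
    if "customer" in row:
        # Assumes the combined field is always "Title Forename Surname".
        title, forename, surname = row["customer"].split(" ", 2)
    else:
        title, forename, surname = row["title"], row["forename"], row["surname"]
    amount = row["amount"]
    if row["currency"] == "USD":
        amount = round(amount * USD_TO_GBP, 2)
    return (row["dept"], forename, surname, title, amount)


def load(conn, rows):
    """Load: write the standardized rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE warehouse (dept TEXT, forename TEXT, surname TEXT, "
        "title TEXT, amount_gbp REAL)"
    )
    conn.executemany("INSERT INTO warehouse VALUES (?, ?, ?, ?, ?)", rows)


def build_marketing_mart(conn):
    """Data mart: a smaller table holding only the marketing rows."""
    conn.execute(
        "CREATE TABLE marketing_mart AS "
        "SELECT * FROM warehouse WHERE dept = 'marketing'"
    )


conn = sqlite3.connect(":memory:")
load(conn, [transform(r) for r in extract()])
build_marketing_mart(conn)
print(conn.execute("SELECT * FROM marketing_mart").fetchall())
```

The mart is built from the warehouse rather than from the source systems, so a query against it only ever touches the subset that one director cares about.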
The marketing director, for instance, is not going to be interested in what the finance director is interested in, so let's keep their queries separate. It makes it simpler: there are fewer options for each person to look at, and therefore less chance of them getting lost. The other big advantage is that if one of them comes up with another question, we're trying to solve that problem on a smaller set of data, so it's going to be
quicker and easier to solve. Well, you've got the general idea of a data warehouse now, so let's move on to some of the specifics. How is the data held? It is held as a star. In the middle you have a fact, and this fact is linked to a number of dimensions. Here I've got four, but you could have three or twenty-three. So, for example, you might have sales as the item in the middle, and for each sale you have the member ID, wine ID, area ID and time ID. Time is needed so you get the idea of trends, and you have that for every sale. Now, that does mean repetition. One member may have bought many things, but this is in second normal form, so even though one member has bought, say, three items, that member's details are going to be recorded against each of those sales. Finally, we could look at it as a constellation: we have the sales, and then we can link the different dimensions together with other facts, for example the wine, the member or the area. Then we've got a proper data warehouse, and we can make the constellation as complicated as we want.
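As a rough sketch of the star described above, the sales fact can sit in the middle with one key per dimension (member, wine, area, time). The table names, column names and sample rows below are assumptions made up for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the points of the star.
CREATE TABLE dim_member (member_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_wine   (wine_id   INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE dim_area   (area_id   INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_time   (time_id   INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: the middle of the star, one row per sale.
CREATE TABLE fact_sales (
    member_id INTEGER REFERENCES dim_member(member_id),
    wine_id   INTEGER REFERENCES dim_wine(wine_id),
    area_id   INTEGER REFERENCES dim_area(area_id),
    time_id   INTEGER REFERENCES dim_time(time_id),
    amount    REAL
);
""")

# Invented sample data: one member buying three items over two months.
conn.execute("INSERT INTO dim_member VALUES (1, 'A. Member')")
conn.execute("INSERT INTO dim_area VALUES (1, 'North')")
conn.executemany("INSERT INTO dim_wine VALUES (?, ?)", [(1, "Merlot"), (2, "Rioja")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)", [(1, 2024, 1), (2, 2024, 2)])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 12.0), (1, 2, 1, 1, 8.0), (1, 1, 1, 2, 15.0)],
)

# The time dimension is what turns individual sales into a trend.
trend = conn.execute("""
    SELECT t.year, t.month, SUM(s.amount)
    FROM fact_sales AS s
    JOIN dim_time AS t ON t.time_id = s.time_id
    GROUP BY t.year, t.month
    ORDER BY t.year, t.month
""").fetchall()
print(trend)
```

Linking a second fact table to the same dimension tables would give the constellation mentioned above.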
What is Data Warehousing?
Data warehousing (DW) is a process for collecting and managing data from varied sources to provide meaningful business insights. A data warehouse is typically used to connect and analyze business data from heterogeneous sources. It is the core of a BI (business intelligence) system, which is built for data analysis and reporting.