Why do we need DSI if we already have the Data Warehouse?
The Data Warehouse supports operational reporting. While it has served us well over the past 15 years, the warehouse has become cluttered with multiple copies of similar data, leaving users unsure which source is the right one. In many cases, inconsistent data definitions make it hard to understand data across departments, schools, and campuses. Additionally, long development lead times mean the data is often not timely. DSI helps decision makers know where to find data, where it comes from, and how it is derived.
What is the difference between analytical and operational reporting?
Operational reporting is detailed, transaction-oriented data that is used to support the day-to-day functions of the university, keeping it running smoothly.
Analytical reporting supports the strategic direction of the university; it focuses on summary information and trends and should function across campuses.
Why are we rebuilding data sources that already exist in the Data Warehouse?
The IUIE is a great resource that delivers the operational reporting used to run the university on a day-to-day basis. However, it is not necessarily structured or secured in a way that furthers our pursuit of better decision making. As we build our analytical data warehouse, we must understand where data comes from, what business logic has been applied to it, and whether we have consistency across our data sources. Sourcing data back to the Operational Data Source and delivering it in a form structured for decision making will be one of the cornerstones of fulfilling our mission.
What is timely data?
Timely data is data that is available when you need it. DSI aims to provide near 24x7 access to data for analysis with minimal dependencies on long-running batch processes, as well as access to real-time data when necessary.
What is relevant data?
Relevant data is the right data at the right time, in an understandable, accessible format. DSI makes it easy to get the specific data you need to make a decision, without all the stuff you don't.
What is accurate data?
Accurate data is data with few to no errors; it need not be perfect. For most uses, what is needed is not perfect data but data within acceptable tolerances. Accurate data can be trusted and understood.
How do you ensure your data is accurate?
DSI validates its data against trusted university sources. For example, the data supplied by Institutional Research (IR) is the gold standard for the university. They are a key source of institutional knowledge for supplying data to the university. However, we do not intend to replicate IR data. IR data is, by design and by need, very specific and very accurate. Rather, our intent is to provide analytical data for making broader decisions. This data might not conform precisely to IR data, as the two serve different purposes and are meant to support different needs. The same approach is applied to all types of data.
Why are we not using dimensional modeling?
Dimensional modeling is a traditional design technique for databases intended to support end user queries. This organization of data differs from that of a transactional system in that it is designed with two primary goals: faster data retrieval and better understandability.
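To make the idea concrete, here is a minimal sketch of a dimensional (star-schema) layout using hypothetical enrollment data in Python. The table names, keys, and figures are illustrative assumptions, not actual DSI structures:

```python
# Hypothetical star schema: one fact table of enrollment records,
# plus small dimension tables that describe each record.

# Dimension tables: descriptive attributes, keyed by surrogate IDs.
dim_term = {1: {"term": "Fall 2015"}, 2: {"term": "Spring 2016"}}
dim_campus = {10: {"campus": "Bloomington"}, 11: {"campus": "Indianapolis"}}

# Fact table: one row per enrollment, holding only keys and measures.
fact_enrollment = [
    {"term_id": 1, "campus_id": 10, "credit_hours": 15},
    {"term_id": 1, "campus_id": 11, "credit_hours": 12},
    {"term_id": 2, "campus_id": 10, "credit_hours": 9},
]

# A typical analytical query: total credit hours by campus.
totals = {}
for row in fact_enrollment:
    campus = dim_campus[row["campus_id"]]["campus"]
    totals[campus] = totals.get(campus, 0) + row["credit_hours"]

print(totals)  # {'Bloomington': 24, 'Indianapolis': 12}
```

Separating measures (the fact table) from descriptive attributes (the dimensions) is what makes such queries both fast and easy to understand.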
While dimensional modeling has been a staple for data warehouse design since the early 2000s, new technologies such as faster databases, data virtualization, in-memory computing, and distributed processing (Hadoop) have lessened the need for building dimensional models to support decision making. These technologies enable faster retrieval and better data integration and organization than were previously possible. Depending on user needs, Tableau's data engine, with data integrated through data virtualization, may be sufficient.
DSI will seek first to address decision support needs through quickly delivering data using tools such as Denodo and Tableau. When these tools are not sufficient, DSI will look to model data appropriately, with ease of use and performance as the primary drivers.
Why do we need new reporting tools?
Traditional BI and analytics architectures rely on the skills of developers and database administrators to properly model, index, aggregate, and optimize databases for performance.
Most analytics platform vendors have recognized the need to provide their own performance layer rather than relying on the database for performance improvements. This need has given rise to data discovery vendors such as Tableau, Tibco, Qlik, and others, whose tools not only visualize data but also include a proprietary architecture that blends data integration and in-memory storage. In-memory analytics tools will enable better self-service analysis by reducing the dependency on aggregates and cubes built in advance by UITS.
DSI uses Tableau in most cases, but we recognize that it is not the best solution for everything. There will still be demand for the traditional modular architecture in cases where analytical processing requires accessing massive amounts of data stored in databases.