The market that NoSQL addresses is wide and crowded. It includes not only databases but also tools that accelerate data collection, analytics, and visualization. The whole idea of Big Data is to derive useful intelligence from the vast amounts of data that used to be ignored or discarded. In that sense, it is a form of data mining and business intelligence. What sets Big Data apart is the scale of its volume, velocity, and variety. In the enterprise market, most of the data in question are in known (structured) formats, and their variety is limited. It is also rare for vast amounts of data to arrive in real time. This is now changing, however, with the spread of social networking services (SNS) and mobile computing.
Fluid Operations (FluidOps) aggregates data from different sources and enriches it with semantic information to support better analysis. I sat down with Peter Haase, a senior architect at the company, to talk about their Information Workbench, a comprehensive tool for collecting and analyzing data and visualizing useful information.
Fluid Operations is located in Walldorf, Germany. SAP’s headquarters is there as well. They currently have no US office, but their website provides information in both German and English. Peter and other people from the company are fluent in English.
As in other industries, utility companies in the power business collect and aggregate various kinds of data in addition to meter-read data. They may monitor equipment on the distribution grid, such as transformers, switches, relays, and capacitor banks. Equipment data and meter-read data may be generated at dramatically different rates. Besides dynamic, real-time data, static data such as asset information (equipment location, brand, model, specifications, and service records) may be needed to support preventive maintenance and to report malfunctions and failures. The FluidOps approach is to collect and aggregate data from multiple sources and then translate each datum semantically into a common form, so that it carries more meaningful information. Because all the translated data share the same form and have richer relationships among them, analytics becomes more effective and can lead to more appropriate action.
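The interview does not describe how FluidOps performs this translation. As a rough sketch only, assuming the common form is a set of subject-predicate-object triples (the data model RDF uses), the Python below maps two hypothetical sources with different layouts, interval meter reads and static asset records, onto one shared vocabulary and then joins them. The `ex:` prefix, the record fields, the function names, and the data are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical "ex:" prefix; a real system would use full IRIs and an
# established vocabulary instead of these made-up terms.
EX = "ex:"

# Hypothetical source 1: dynamic meter reads (15-minute interval data).
meter_reads = [
    {"meter": "M1", "time": "10:00", "kwh": 1.25},
    {"meter": "M1", "time": "10:15", "kwh": 0.75},
    {"meter": "M2", "time": "10:00", "kwh": 2.5},
]

# Hypothetical source 2: static asset records in a different layout
# (asset id, asset type, location, meters served).
asset_records = [
    ("T7", "Transformer", "Feeder-12", ["M1", "M2"]),
]


def reads_to_triples(reads):
    """Translate each meter read into subject-predicate-object triples."""
    triples = []
    for i, r in enumerate(reads):
        read = f"{EX}read/{i}"
        triples.append((read, f"{EX}fromMeter", f"{EX}meter/{r['meter']}"))
        triples.append((read, f"{EX}time", r["time"]))
        triples.append((read, f"{EX}kwh", r["kwh"]))
    return triples


def assets_to_triples(records):
    """Translate each asset record into triples using the same vocabulary."""
    triples = []
    for asset_id, kind, location, meters in records:
        asset = f"{EX}asset/{asset_id}"
        triples.append((asset, "rdf:type", f"{EX}{kind}"))
        triples.append((asset, f"{EX}location", location))
        for m in meters:
            triples.append((asset, f"{EX}serves", f"{EX}meter/{m}"))
    return triples


# Once both sources share one form, they merge into a single graph.
graph = set(reads_to_triples(meter_reads)) | set(assets_to_triples(asset_records))


def kwh_per_asset(graph):
    """Join reads to assets through the shared meter resources."""
    assets_for_meter = defaultdict(set)
    for s, p, o in graph:
        if p == f"{EX}serves":
            assets_for_meter[o].add(s)
    meter_of_read = {s: o for s, p, o in graph if p == f"{EX}fromMeter"}
    totals = defaultdict(float)
    for s, p, o in graph:
        if p == f"{EX}kwh":
            for asset in assets_for_meter[meter_of_read[s]]:
                totals[asset] += o
    return dict(totals)


print(kwh_per_asset(graph))  # {'ex:asset/T7': 4.5}
```

Neither source needs to know the other's layout; the join in `kwh_per_asset` works only because both were mapped onto the same meter resources, which is the benefit of translating everything into one form.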
“Semantically” means that collected data are converted into a common normal form, represented using the Resource Description Framework (RDF). I will not go into detail here, except to note that RDF expresses data as subject-predicate-object statements called triples. Although not identical, this is similar in some ways to the entity-relationship model. An example diagram is shown here.
All the collected data are converted into this format. The standard query language for RDF is SPARQL, a recursive acronym for SPARQL Protocol and RDF Query Language.
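A SPARQL query is built from triple patterns in which terms starting with `?` are variables. As a rough illustration only, the Python sketch below evaluates such a basic graph pattern over a small, hypothetical in-memory set of triples by matching each pattern and joining the variable bindings. It is not a SPARQL implementation and says nothing about how FluidOps or any triple store evaluates queries; real engines parse the query language and use indexes. The graph, the `ex:` terms, and the function names are made up for this example.

```python
# Hypothetical toy graph of (subject, predicate, object) triples.
GRAPH = {
    ("ex:T7", "rdf:type", "ex:Transformer"),
    ("ex:T7", "ex:serves", "ex:M1"),
    ("ex:T7", "ex:serves", "ex:M2"),
    ("ex:S3", "rdf:type", "ex:Switch"),
    ("ex:S3", "ex:serves", "ex:M3"),
}


def is_var(term):
    """Treat terms starting with '?' as variables, as SPARQL does."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend binding so pattern matches triple; return None on conflict."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in new and new[p_term] != t_term:
                return None
            new[p_term] = t_term
        elif p_term != t_term:
            return None
    return new


def evaluate(patterns, graph):
    """Naive nested-loop join: apply each pattern to every partial solution."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Roughly equivalent SPARQL (prefix declarations omitted):
#   SELECT ?meter WHERE { ?t rdf:type ex:Transformer . ?t ex:serves ?meter }
query = [("?t", "rdf:type", "ex:Transformer"), ("?t", "ex:serves", "?meter")]
meters = sorted(b["?meter"] for b in evaluate(query, GRAPH))
print(meters)  # ['ex:M1', 'ex:M2']
```

The shared variable `?t` is what joins the two patterns: the switch `ex:S3` also serves a meter, but it is excluded because it never binds `?t` in the first pattern. Checking every pattern against every triple is fine for a toy graph but is exactly the cost that indexed triple stores avoid.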
FluidOps Information Workbench consists of three layers: data integration and storage, data management, and presentation/interaction/UI customization. From a 30,000-foot view, it collects data from diverse industry segments and associates them using semantic models. For example, the Linking Open Data Community project aims to make data from different industry segments freely available, and for that purpose the data are represented in RDF. The segments include media, geographic, publications, user-generated content, government, and life sciences. Their relationships are shown in the following diagram, which is maintained by Richard Cyganiak and Anja Jentzsch.
In the interactive version of this figure (not the static figure above), you can click each circle to drill down into the corresponding dataset.
The following figure illustrates how Information Workbench collects and associates data with other data to increase their value semantically.
The disparate sources include tweets, Facebook, YouTube, data.gov, office documents, and various video files.
The architecture of Information Workbench is shown below. It consists of a data integration and storage layer (green), data management (brown), and presentation, interaction, and UI customization (blue).
Fluid Operations has looked at which RDF datasets are available to exploit for effective analytics. Their current application areas include media, as well as health care and life sciences. I asked Peter about applying the workbench to the power industry. He said they were not looking into it yet but might consider it if they received a research grant. I do not know whether such a dataset is already available for the power industry, but I think the industry could benefit from something like this.
I have discussed individual utilities’ operations, but at the regional level, such as an ISO/RTO, regional power balance information and data would be very useful. I would like to follow this as it develops.