Data Architecture

Assembly Line

โ˜๏ธ๐Ÿง  Automated Cloud-to-Edge Deployment of Industrial AI Models with Siemens Industrial Edge

๐Ÿ“… Date:

โœ๏ธ Authors: Johann Bruckner, Johannes Kupser, Yvonne Quacken, Bruno Quintas, Helge Aufderheide

๐Ÿ”– Topics: Cloud-to-Edge Deployment, Data Architecture, Edge Computing, Machine Learning, MQTT

๐Ÿข Organizations: Siemens, AWS


Due to the sensitive nature of OT systems, a cloud-to-edge deployment can become a challenge. Specialized hardware devices are required, strict network protection is applied, and security policies are in place. Data can only be pulled by an intermediate factory IT system from where it can be deployed to the OT systems through highly controlled processes.

The following solution describes the "pull" deployment mechanism using AWS services and the Siemens Industrial AI software portfolio. The deployment process is enabled by three main components, the first of which is the Siemens AI Software Development Kit (AI SDK). After a model is created by a data scientist on Amazon SageMaker and stored in the SageMaker model registry, the SDK allows users to package it in a format suitable for edge deployment with Siemens Industrial Edge. The second component, and the central connection between cloud and edge, is the Siemens AI Model Manager (AI MM). The third component is the Siemens AI Inference Server (AIIS), a specialized and hardened AI runtime environment running as a container on Siemens Industrial Edge Devices (IEDs) deployed on the shopfloor. The AIIS receives the packaged model from the AI MM and is responsible for loading, executing, and monitoring ML models close to the production lines.
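
To make the pull pattern concrete, below is a minimal sketch of an edge-side agent that periodically polls a model registry endpoint, then fetches and verifies a new package before handing it to the local runtime. The endpoint, manifest fields, and runtime handoff are hypothetical placeholders, not the actual Siemens AI SDK, AI MM, or AIIS APIs.

```python
"""Minimal sketch of a pull-based deployment agent. The registry URL,
manifest fields, and handoff to the runtime are hypothetical, not the
Siemens AI SDK / AI MM / AIIS interfaces."""
import hashlib
import time

import requests

REGISTRY_URL = "https://factory-it.example.com/models/quality-check"  # hypothetical
POLL_SECONDS = 300
current_version = None

def hand_to_runtime(package: bytes) -> None:
    # On an IED this would hand the package to the AI Inference Server;
    # here we simply persist it to the runtime's drop folder.
    with open("/var/edge/models/quality-check.pipeline", "wb") as f:
        f.write(package)

while True:
    # Pull, never push: the edge initiates every request through the
    # controlled factory IT boundary.
    manifest = requests.get(f"{REGISTRY_URL}/latest", timeout=10).json()
    if manifest["version"] != current_version:
        package = requests.get(manifest["download_url"], timeout=60).content
        # Verify integrity before loading anything into the OT runtime.
        if hashlib.sha256(package).hexdigest() == manifest["sha256"]:
            hand_to_runtime(package)
            current_version = manifest["version"]
    time.sleep(POLL_SECONDS)
```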

Read more at AWS Blogs

📊 Accelerating Innovation at JetBlue Using Databricks

📅 Date:

✍️ Authors: Sai Ravuru, Yared Gudeta

🔖 Topics: Data Architecture

🏭 Vertical: Aerospace

🏢 Organizations: JetBlue, Databricks, Microsoft


Data, and in particular analytics, AI, and ML, is key for airlines to provide a seamless experience for customers while maintaining efficient operations. For a single flight, for example from New York to London, hundreds of decisions have to be made based on factors encompassing customers, flight crews, aircraft sensors, live weather, and live air traffic control (ATC) data. A large disruption such as a brutal winter storm can impact thousands of flights across the U.S., so it is vital for airlines to rely on real-time data, AI, and ML to make proactive decisions.

JetBlue has accelerated AI and ML deployments across a wide range of use cases spanning four lines of business, each with its own AI and ML team. The fundamental functions of the business lines are:

  • Commercial Data Science (CDS) - Revenue growth
  • Operations Data Science (ODS) - Cost reduction
  • AI & ML engineering - Go-to-market product deployment optimization
  • Business Intelligence - Reporting, enterprise scaling and support

Each business line supports multiple strategic products that are prioritized regularly by JetBlue leadership to establish KPIs that lead to effective strategic outcomes.

Read more at Databricks Blog

Why is machine data special and what can you do with it?

📅 Date:

🔖 Topics: Data Architecture

🏢 Organizations: Arch Systems


Production data can unlock opportunities for electronics manufacturing service (EMS) providers to improve operations. Evolving systems for collection and analysis of machine data is vital to those efforts. Though factories produce many different types of usable data, machine data is special because it can be collected without operational burden, creating actionable production insights in real time and automating responses to them.

As more manufacturers develop and deploy machine data collection systems, industry best practices are surfacing, and systems often adopt similar structures in response to common needs in the factory. Most architectures include these key features:

  • There is usually some type of streaming event broker (often called a pub/sub architecture) that receives complex files and reports from production equipment to enable advanced analytics, holistic dashboards and visualization, automated action management, and system monitoring (see the sketch after this list).
  • Systems should be able to integrate data from both advanced machines and legacy equipment, such as PLCs.
  • They use specialized databases and data lakes for storage.
  • Dedicated telemetry and monitoring are deployed to ensure data quality.
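
As an illustration of the streaming pattern in the first bullet, here is a minimal Python sketch with MQTT standing in for the event broker; the broker host, topic names, and payload schema are assumptions for the example, not a specific vendor's interface.

```python
"""Minimal pub/sub sketch: a machine-side publisher and a consumer that
would land events in the database or data lake."""
import json

import paho.mqtt.publish as publish
import paho.mqtt.subscribe as subscribe

BROKER = "broker.factory.example.com"  # hypothetical

# Machine-side agent: publish one event per report from the equipment.
event = {"machine_id": "smt-line-3", "metric": "placement_rate", "value": 41250}
publish.single("factory/smt/line3/events", json.dumps(event), hostname=BROKER)

# Consumer side: subscribe and persist each event for analytics.
def on_message(client, userdata, message):
    record = json.loads(message.payload)
    print(f"persisting {message.topic}: {record}")  # stand-in for a lake write

subscribe.callback(on_message, topics=["factory/#"], hostname=BROKER)
```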

Read more at Arch Systems Blog

A Data Architecture to assist Geologists in Real-Time Operations

📅 Date:

✍️ Author: Nicola Lamonaca

🔖 Topics: Data Architecture

🏭 Vertical: Petroleum and Coal

🏢 Organizations: Eni, Databricks


Data plays a crucial role in making Eni's exploration and drilling operations a success all over the world. Our geologists use real-time well data, collected by sensors installed on drilling pipes, to track operations and to build predictive models of key properties during the drilling process.

Data is delivered by a custom dispatcher component designed to connect to a WITSML server on each oil rig and send time-indexed and/or depth-indexed data to any supported application. In our case, data is delivered to Azure ADLS Gen2 as WITSML files, each accompanied by a JSON file carrying additional custom metadata.
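
As a rough sketch of how such a landing zone might be consumed downstream, the following pairs each WITSML (XML) log with its JSON metadata sidecar; the paths, file extensions, and metadata fields are assumptions, not Eni's actual pipeline.

```python
"""Illustrative consumer for the landing zone: each WITSML (XML) log
arrives with a JSON sidecar of custom metadata."""
import json
import xml.etree.ElementTree as ET
from pathlib import Path

landing = Path("/mnt/adls/witsml/landing")  # hypothetical ADLS Gen2 mount

for meta_file in landing.glob("*.json"):
    meta = json.loads(meta_file.read_text())   # e.g. rig, well, run info
    log_file = meta_file.with_suffix(".xml")   # the paired WITSML log
    root = ET.parse(log_file).getroot()
    # WITSML logs carry their measurements as comma-separated <data>
    # rows, indexed by time and/or depth.
    rows = [el.text.split(",") for el in root.iter()
            if el.tag.endswith("data") and el.text]
    print(meta.get("wellName", "unknown"), len(rows), "rows ingested")
```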

The visualizations generated from this data platform are used both on the oil rigs and at HQ: operators explore the ML-enriched curves as soon as they are generated, on an in-house web application that shows in real time how the drilling is progressing. Historic data can also be explored via the same application.

Read more at Medium

📊 Data pools as the foundation for the smart buildings of the future

📅 Date:

✍️ Authors: Frederik De Meyer, Christian Metzger

🔖 Topics: Building information modeling, Data Architecture

🏢 Organizations: Siemens


Today's digital building technology generates a huge amount of data. So far, however, this data has only been used to a limited extent, primarily within hierarchical automation systems. Yet data is key to the new generation of modern buildings, making them climate-neutral, energy- and resource-efficient, and at some point autonomous and self-maintaining.

For new buildings, the use of digital solutions for building management by planners, developers, owners, and operators is more straightforward. The creation of a building twin must be defined and implemented as a BIM goal. At its heart is a Common Data Environment (CDE), a central digital repository where all relevant information about a building can be stored and shared from the project phase onward. The CDE is part of the BIM process and enables collaboration and information exchange between the different stakeholders of a construction project.

Beyond the design and construction phases, a CDE can also make building maintenance more effective during the operation phase by providing easy access to essential information about the building and its technical systems. If information about equipment, sensors, their location in the building, and all other relevant components is collected in machine-readable form from the beginning of the lifecycle and updated continuously, building management tools can access this data directly during operations, avoiding additional effort. The goal is precisely to collect data without additional effort. To achieve this, engineering and commissioning tools must in the future automatically store their results in the common twin, making re-engineering obsolete.
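
As a sketch of what "machine-readable from the beginning of the lifecycle" could look like in practice, the following shows a commissioning tool emitting an equipment record for the common twin; the schema and the file-based storage target are hypothetical.

```python
"""Sketch of a machine-readable equipment record written into the
common twin / CDE at commissioning time. Schema is illustrative."""
import json
from pathlib import Path

record = {
    "equipment_id": "AHU-02-117",
    "type": "air_handling_unit",
    "location": {"building": "B2", "floor": 2, "room": "2.117"},
    "sensors": [
        {"id": "AHU-02-117-T1", "kind": "supply_air_temperature", "unit": "degC"},
    ],
    "commissioned": "2023-05-11",
}

# A real commissioning tool would push this to the CDE's API rather than
# a local folder, so operations tooling can read it later without
# re-engineering.
out = Path("cde/equipment")
out.mkdir(parents=True, exist_ok=True)
(out / f"{record['equipment_id']}.json").write_text(json.dumps(record, indent=2))
```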

Read more at Siemens Blog

🧠 How a Data Fabric Gets Snow Tires to a Store When You Need Them

📅 Date:

✍️ Author: Susan Hall

🔖 Topics: Supply Chain Control Tower, Data Architecture

🏢 Organizations: American Tire Distributors, Promethium


"We were losing sales because the store owners were unable to answer the customers' questions as to when exactly they would have the product in stock," said Ehrar Jameel, director of data and analytics at ATD. The company didn't want frustrated customers looking elsewhere, so he set out to create what he called a "supply chain control tower" for data, just like the ones at airports.

"I wanted to give a single vision, a single pane of glass for the business, to just put in a SKU number and be able to see where that product is in the whole supply chain, and not just the supply chain but the whole value chain of the company." ATD turned to Promethium, which provides a virtual data platform automating data management and governance across a distributed architecture with a combination of data fabric and self-service analytics capabilities.

It's built on top of the open source SQL query engine Presto, which allows users to query data wherever it resides. It normalizes data for query into an ANSI-compliant standard syntax, whether it comes from Oracle, Google BigQuery, Snowflake, or elsewhere. It integrates with business intelligence tools such as Tableau and can be used to create data pipelines. It uses natural language processing and artificial intelligence, plus something it calls a "reasoner," to figure out, based on what you asked, what you're really trying to do and the best data to answer that question.
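
For illustration, a federated query of the kind described might look like the following, issued through the presto-python-client package; the connection details and all catalog, schema, and table names are invented for the example.

```python
"""Sketch of federated querying with Presto: one ANSI SQL statement
joining tables that live in different backends."""
import prestodb

conn = prestodb.dbapi.connect(
    host="presto.example.com", port=8080, user="analyst",
    catalog="system", schema="runtime",
)
cur = conn.cursor()
# Each catalog is configured to point at a different backend, e.g.
# `oracle` at the ERP database and `snowflake` at the cloud warehouse.
cur.execute("""
    SELECT w.warehouse_id, w.on_hand, s.eta
    FROM oracle.inventory.stock AS w
    JOIN snowflake.logistics.shipments AS s
      ON w.sku = s.sku
    WHERE w.sku = '205-55R16-SNOW'
""")
for row in cur.fetchall():
    print(row)
```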

Read more at The New Stack

A Deeper Look Into How SAP Datasphere Enables a Business Data Fabric

📅 Date:

✍️ Author: Juergen Mueller

🔖 Topics: Partnership, Data Architecture

🏢 Organizations: SAP, Databricks, Collibra, Confluent, DataRobot


SAP announced the SAP Datasphere solution, the next generation of its data management portfolio, which gives customers easy access to business-ready data across the data landscape. SAP also introduced strategic partnerships with industry-leading data and AI companies (Collibra NV, Confluent Inc., Databricks Inc., and DataRobot Inc.) to enrich SAP Datasphere and allow organizations to create a unified data architecture that securely combines SAP software data and non-SAP data.

SAP Datasphere and its open data ecosystem form the technology foundation that enables a business data fabric: a data management architecture that simplifies the delivery of an integrated, semantically rich data layer over underlying data landscapes to provide seamless and scalable access to data without duplication. It is not a rip-and-replace model; the intent is to connect, rather than solely move, data, using metadata. A business data fabric equips any organization to deliver meaningful data to every data consumer, with business context and logic intact. As organizations require accurate data that is quickly available and described in business-friendly terms, this approach lets data professionals extend the clarity that business semantics provide to every use case.

Read more at SAP News

Rolls-Royce Civil Aerospace keeps its Engines Running on Databricks Lakehouse

Our connected future: How industrial data sharing can unite a fragmented world

📅 Date:

✍️ Author: Peter Herweck

🔖 Topics: Manufacturing Analytics, Data Architecture

🏢 Organizations: AVEVA


The rapid and effective development of the coronavirus vaccines has set a new benchmark for today's industries, but it is not the only one. Increasingly, savvy enterprises are starting to share industrial data strategically and securely beyond their own four walls, to collaborate with partners, suppliers, and even customers.

Worldwide, almost nine out of 10 (87%) business executives at larger industrial companies cite a need for the type of connected data that delivers unique insights to address challenges such as economic uncertainty, unstable geopolitical environments, historic labor shortages, and disrupted supply chains. In fact, executives report in a global study that the most common benefits of having an open and agnostic information-sharing ecosystem are greater efficiency and innovation (48%), increasing employee satisfaction (45%), and staying competitive with other companies (44%).

Read more at AVEVA Perspectives

How Corning Built End-to-end ML on Databricks Lakehouse Platform

📅 Date:

✍️ Author: Denis Kamotsky

🔖 Topics: MLOps, Quality Assurance, Data Architecture, Cloud-to-Edge Deployment

🏢 Organizations: Corning, Databricks, AWS


Specifically for quality inspection, we take high-resolution images to look for irregularities in the cells, which can be predictive of leaks and defective parts. The challenge, however, is the prevalence of false positives due to the debris in the manufacturing environment showing up in pictures.

To address this, we manually brush and blow the filters before imaging. We discovered that by notifying operators of which specific parts to clean, we could significantly reduce the total time required for the process, and machine learning came in handy. We used ML to predict whether a filter is clean or dirty based on low-resolution images taken while the operator is setting up the filter inside the imaging device. Based on the prediction, the operator would get the signal to clean the part or not, thus reducing false positives on the final high-res images, helping us move faster through the production process and providing high-quality filters.
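
As a minimal sketch of such a clean/dirty pre-check, the following scores a low-resolution setup image with a small binary classifier; the architecture, input size, and decision threshold are illustrative, not Corning's actual model.

```python
"""Sketch of a binary clean/dirty classifier on low-resolution setup
images, signaling the operator before final high-res inspection."""
import torch
import torch.nn as nn

class CleanDirtyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)  # single logit: P(dirty)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = CleanDirtyNet().eval()
low_res = torch.rand(1, 1, 64, 64)  # stand-in for the setup camera frame
with torch.no_grad():
    p_dirty = torch.sigmoid(model(low_res)).item()
# Signal the operator before the part reaches high-resolution imaging.
print("clean the part" if p_dirty > 0.5 else "proceed to imaging")
```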

Read more at Databricks Blog

How to pull data into Databricks from AVEVA Data Hub