DATA MANAGEMENT
THINK ‘data science’
to manage energy distribution
Data science is an emerging field and plays an integral part in the
so-called ‘big data’ drive, where the challenge is to extract value
from vast amounts of data. This article aims to provide a backdrop
and case study for the application of data science thinking in the
energy distribution sector.
Technology industry giants such
as Google, Apple, Facebook,
Amazon, and social media
companies generate in the order of
petabytes of user data daily, and these
volumes are growing rapidly. Moreover,
the Internet of Things (IoT) is also
contributing to high volumes of data
as a wide variety of devices, sensors,
systems, and services are connecting to
the internet in an effort to achieve greater
value by exchanging information more
efficiently.
THE HIDDEN VALUE OF YOUR DATA
Possibly the primary reason data
volumes are growing is the advances
made in physics and engineering, which
allow progressively faster information
processing and greater storage
capacity. Consequently, companies
now gather and store more data than
they can effectively manage in terms of
business potential. This is where data
science aims to bridge the gap between
business opportunity and all the data.
The need to analyse extremely large
amounts of information, in some cases
in near real-time, to derive value from it
is undoubtedly increasing with this data
explosion. Data scientists specialising in the
field of machine learning aim to build
algorithms capable of detecting patterns
in the data (hidden information), which
can be used to better understand the
underlying dynamics captured in the form
of digital information or to develop data
products that can be implemented in
real-time systems that mimic or enhance
human information processing tasks.
Machine learning also empowers us to
address uncertainty.
WHAT DO UTILITIES NEED TO
ACHIEVE THIS?
The data needs to be stored on proper
data management platforms that can
scale well and provide high-speed
processing (particularly for machine
learning applications). Platforms from
the open-source community that are
gaining popularity include Hadoop, with
its two-stage MapReduce paradigm;
Apache Spark, with its advantages for
in-memory iterative computation; and
Cluster Map Reduce, a Hadoop-like
framework for distributed environments.
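As a rough illustration of the two-stage MapReduce idea mentioned above, the sketch below totals energy readings per substation in plain Python. It is only a single-machine toy, not Hadoop or Spark code: the readings, the substation names and the function names (map_stage, shuffle, reduce_stage, run_job) are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical meter readings as (substation_id, kWh) pairs.
readings = [
    ("sub_a", 12.5),
    ("sub_b", 7.0),
    ("sub_a", 3.5),
    ("sub_b", 1.0),
    ("sub_c", 4.0),
]


def map_stage(record):
    """Map: emit a (key, value) pair for one input record."""
    substation, kwh = record
    return substation, kwh


def shuffle(pairs):
    """Group mapped values by key, as a framework would between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_stage(key, values):
    """Reduce: combine all values that share a key into one result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = [map_stage(record) for record in records]
    grouped = shuffle(mapped)
    return dict(reduce_stage(key, values) for key, values in grouped.items())


totals = run_job(readings)
print(totals)
```

In a real cluster the map and reduce steps would run in parallel on many nodes, with intermediate results typically written to disk between the stages. An in-memory framework aims to avoid that overhead for algorithms that pass over the same data repeatedly.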
Alternative platforms are emerging,
but the choice of which platform to
use will depend on factors such as the
business objective (end applications),
data structures and machine learning
algorithms. Apache Spark, for example,
was originally developed at UC Berkeley,
can run on top of the Hadoop
Distributed File System (HDFS) and fits
into the Hadoop open-source community,
which has received code contributions
from over 30 companies, including Yahoo
and Intel. This framework promises
much higher performance than Hadoop
MapReduce for machine learning
algorithms. Pitfalls to consider when
using open-source platforms include
the need for supplementary custom code
development and technical complications
in the ecosystem, which require experts
to manage, deploy and monitor.
From an analytical point of view,
more data sometimes makes little
difference (this is highly dependent
on the application) and does not
guarantee that more insights will be
gained. However, some machine learning
algorithms require ample training data to
generalise well over the true underlying
regularities in what is being modelled.
It has been shown that simple
modelling methods coupled with more
data can outperform more complex
modelling methods. However, this
depends on the underlying system’s
dynamics, data types and data quality.
Data quality cannot be over-emphasised,
and approximately 80-90% of all effort
should revolve around it. Evidently, most
‘big data’ efforts are likely to end up in
a ‘garbage in, garbage out’ scenario
if data quality or data consistency is
neglected.
WHAT IS THE POTENTIAL VALUE TO
ENERGY DISTRIBUTION?
Many of the technology giants mentioned
previously are using machine learning
to improve the user experience on
their platforms, feeding user data
into recommendation engines.
Some of these recommendation engines
ESI AFRICA ISSUE 2 2015