March 26, 2019

MLDAS 2019 – Abstracts

How did the periodic table get built up?
Astronomy enjoys the reputation of being a romantic subject. By all accounts, it is currently in its
golden phase, with even greater promise as we start this new decade. What is less well known and less
appreciated is that modern astronomy is a beneficiary of technological gains (primarily Moore's law and
rapid developments in sensor technology); in turn, it has contributed to the development of sensors and
their physics (CCDs, superconducting detectors) and was an early entrant into big data (through massive
surveys) and the associated tools (machine learning). The speaker will make this case using the specific
science question of "how did the Universe acquire the periodic table". As data sets in astronomy routinely
cross the petabyte scale, the use of machine learning and data analytics will increase dramatically.
Robust Learning Ideas for AI Engineering
As we move towards deploying learned models on a variety of tasks, we are building a new science of
‘AI Engineering’ in which questions about the robustness of the solutions are central.
In this presentation we will discuss the two big classes of robustness challenges that learned models
need to pass prior to deployment: model correctness and robustness in handling observations that were
not expected (unmodeled phenomena). We will then explain the need for extracting and building
self-competence in models, and how that is done in practice.
The talk concludes with observations on other research questions that need to be addressed in AI
Engineering.
Neuromorphic Computing Chips for AI at the Edge
Recently, neuroscience research has revealed a great deal about the structure and operation of individual
neurons, and medical tools have also shown a great deal about how neural activity in the different regions
of the brain follows a sensory stimulus. Moreover, advances in software-based AI have brought us to the
edge of building low-power, brain-like chips and systems. Hardware implementations of neuromorphic AI chips
offer a computational speedup for machine-learning applications over software implementations and can take
full advantage of their inherent parallelism. The need for low latency, higher security, faster computing,
and less dependence on connectivity will drive the adoption of devices that offer AI at the edge to run
machine-learning algorithms more efficiently. However, edge IoT and consumer endpoint devices need
high-performance inference processing at low cost in power, price, and die size. The first part of this
talk presents current fundamental trends, AI-chip design principles, and their potential applications.
The second part discusses several case studies of neuromorphic AI-chip designs developed by the speaker's
research group. The last section of the talk describes prospects for AI chips and their impact on future
computing.
Using Advertising Data to Model Migration, Poverty and Digital Gender Gaps
Facebook, LinkedIn and other social networks provide advertisers with “audience estimates” on how many
of their users match certain targeting criteria. These estimates are usually used for budget planning
and include targeting criteria such as (i) countries a user has lived in, (ii) the type of mobile device
they use, and (iii) their gender. In this talk I report on how we work with UN agencies and other partners
to use this type of information to monitor international migration, map poverty, and track digital
gender gaps.
Unsupervised Video Object Segmentation for Deep Reinforcement Learning
I will present a new technique for deep reinforcement learning that automatically detects moving objects 
and uses the relevant information for action selection. The detection of moving objects is done in an
unsupervised way by exploiting structure from motion. Instead of directly learning a policy from raw
images, the agent first learns to detect and segment moving objects by exploiting flow information in
video sequences. The learned representation is then used to focus the policy of the agent on the moving
objects. Over time, the agent identifies which objects are critical for decision making and gradually
builds a policy based on relevant moving objects. This approach, which we call Motion-Oriented
REinforcement Learning (MOREL), is demonstrated on a suite of Atari games where the ability to detect
moving objects reduces the amount of interaction needed with the environment to obtain a good policy.
Furthermore, the resulting policy is more interpretable than policies that directly map images to actions
or values with a black box neural network. We can gain insight into the policy by inspecting the
segmentation and motion of each object detected by the agent. This allows practitioners to confirm whether
a policy is making decisions based on sensible information.
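To make the idea of focusing a policy on moving objects concrete, here is a highly simplified Julia sketch (not the MOREL method itself, which learns segmentation from optical flow): a crude motion mask derived from consecutive frames is used to suppress the static background before the observation reaches the policy. The function names and the threshold are illustrative assumptions.

    # Toy illustration only: MOREL learns object segmentation from optical flow;
    # here a simple frame difference stands in for that learned motion mask.
    function motion_mask(prev_frame::Matrix{Float64}, frame::Matrix{Float64}; thresh = 0.05)
        return abs.(frame .- prev_frame) .> thresh   # true where pixel intensity changed
    end

    # Focus the observation on moving regions before passing it to the policy network.
    function focused_observation(prev_frame, frame)
        mask = motion_mask(prev_frame, frame)
        return frame .* mask                         # zero out the static background
    end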
Assisted Excitation of Activations: A Learning Technique to Improve Object Detectors
Developing data-driven machine learning techniques to address biomedical classification problems: applications and challenges
Recent advances in high-throughput sequencing have significantly contributed to an ever-increasing gap
between the number of gene products (‘proteins’) whose function is well characterised and those for which
there is no functional annotation at all. Experimental techniques to determine protein function are often
expensive and time-consuming. Recently, machine-learning (ML) and, more broadly, artificial intelligence
(AI) techniques based on the combination of big data sets and statistical learning have provided
cost-effective solutions to challenging sequence classification and annotation problems that were
previously considered difficult to address. In this talk, drawing on recent research progress from our
group and others, I will highlight some important developments in addressing two representative
biomedical classification problems, i.e. ‘sequence labeling’ based on sequence data and ‘medical
image classification’ based on image data. In particular, I will illustrate how ML/AI models can build
predictive power from a variety of heterogeneous biochemical data derived from different aspects and
properties of the data, and how these contribute to model performance.
Deep Learning in Julia
A tutorial introduction to Knet (pronounced "kay-net"), the Koç University deep learning framework. 
Knet is implemented in Julia, a high-level, high-performance, dynamic programming language. Julia has
two related advantages over other languages popular in machine learning, such as Python or Matlab:
(i) code written in pure Julia runs as fast as C, and (ii) most of the Julia standard library is
implemented in highly readable Julia code rather than e.g. C++. Knet was one of the first frameworks to
support dynamic computational graphs: models can be defined simply by describing their forward computation
in plain Julia, allowing the use of loops, conditionals, recursion, closures, tuples, dictionaries,
array indexing, concatenation and other high level language features. High performance is achieved by
combining automatic differentiation of most of Julia with efficient GPU kernels and memory management.
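As a flavour of what "plain Julia" model definition looks like, here is a minimal sketch in the style of the early Knet examples, assuming Knet exports grad from AutoGrad; the exact API of the current Knet release may differ.

    using Knet

    # A linear model written as ordinary Julia: w[1] is the weight matrix, w[2] the bias.
    predict(w, x) = w[1] * x .+ w[2]

    # Mean squared error loss, again in plain Julia.
    loss(w, x, y) = sum(abs2, y .- predict(w, x)) / length(y)

    # grad (from AutoGrad) returns a function computing the gradient of loss w.r.t. w.
    lossgradient = grad(loss)

    # One step of gradient descent over the parameter array.
    function train!(w, x, y; lr = 0.1)
        dw = lossgradient(w, x, y)
        for i in 1:length(w)
            w[i] -= lr * dw[i]
        end
        return w
    end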
Detecting Phishing Domains using Certificate Transparency
In April 2018, Certificate Transparency (CT) was made mandatory in Chrome for new certificates. While its
main objective is the early detection of misused or malicious certificates and rogue CAs, we observe that
phishing sites are also pressured to obtain certificates in order to look more legitimate and to reach
more victims. This provides us with an opportunity to detect and predict phishing domains early, even
before they are reachable online. We conduct an extensive retrospective study of phishing and benign
domain traces in CT logs and highlight distinguishing time-, issuer-, and certificate-based
characteristics. Based on our observations, we identify useful sets of inexpensive, easy-to-extract
CT-based features that discriminate between phishing and benign domains. We adopt a two-stage
classification approach: in the first stage, CT-based features are used to train a cost-sensitive
classifier that minimizes phishing false negatives and produces a list of suspicious domains. In the
second stage, false positives are refined out using additional data sources that become available over
time. Our results indicate that we are able to identify new phishing domains with a true positive rate
(TPR) of 98% and a false positive rate (FPR) of 2.5%.
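As an illustration of the cost-sensitive first stage, the Julia sketch below (our own toy, not the system described above; the pos_weight value is an arbitrary assumption) weights phishing examples more heavily in a logistic loss, so that missed phishing domains are penalised more than false alarms, which the second stage then refines away.

    using LinearAlgebra

    sigmoid(z) = 1 / (1 + exp(-z))

    # Cost-sensitive logistic loss: phishing examples (y = 1) get a larger weight,
    # pushing the first-stage classifier towards very few false negatives.
    function weighted_logistic_loss(w, X, y; pos_weight = 10.0)
        total = 0.0
        for i in eachindex(y)
            p = sigmoid(dot(w, X[i, :]))
            c = y[i] == 1 ? pos_weight : 1.0
            total += -c * (y[i] * log(p) + (1 - y[i]) * log(1 - p))
        end
        return total / length(y)
    end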
Hack me if you can: A rule mining-based advanced persistent threat detection system
Advanced persistent threats (APTs) are long-running, targeted, and stealthy cyber-attacks whose primary
goal is the theft of intellectual property or valuable, sensitive business information. Security and cyber
defense experts warn that it is impossible to prevent all such attacks and suggest that it would instead
be more effective to shift the focus to detecting APTs in a timely fashion and minimizing their damage.
This motivates continuous monitoring of system activities (e.g. as provenance graphs) and mining of the
provenance to identify relationships between system activities and flag anomalous entities as soon as
they occur. Given the large volume of monitoring traces, the lack of labelled data, the chameleon-like
properties of APTs, and the variety of their patterns, one has to rely on unsupervised techniques to
summarize the data and uncover interesting hidden patterns and their relationships.
In this talk, we present an unsupervised anomaly detection method that can detect realistic APT-like
attacks reliably and efficiently in multiple gigabytes of provenance traces generated by different
operating systems. We represent different types of process activity using generic, OS-independent binary
features that can easily be extracted from a stream of audit events. Our approach is based on the
extraction of anomalous entities using a rule mining algorithm, which has the advantage of producing
results as implications that can be easily interpreted by a security expert. We evaluated our anomaly
detection system on large datasets containing realistic APT-style attacks produced as part of the DARPA
Transparent Computing program, in which attacks constitute as little as 0.01% of the data, representing
several days' activity on different operating systems (Windows, BSD, Linux and Android). The obtained
results show that our method is able to rank attack processes as highly anomalous.
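To give a feel for why rule-based results are easy to interpret, here is a toy Julia sketch (an assumption-laden illustration, not the actual system): rules are implications over binary features mined from normal activity, and an entity that satisfies a rule's antecedent but violates its consequent accumulates a confidence-weighted anomaly score.

    # A mined rule: "if an entity has these features, it normally also has those".
    struct Rule
        antecedent::Set{String}
        consequent::Set{String}
        confidence::Float64
    end

    # Score an entity by the confidence-weighted number of rules it violates;
    # highly anomalous processes accumulate large scores and are ranked first.
    function anomaly_score(features::Set{String}, rules::Vector{Rule})
        score = 0.0
        for r in rules
            if issubset(r.antecedent, features) && !issubset(r.consequent, features)
                score += r.confidence
            end
        end
        return score
    end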
Towards verifying neural autonomous systems
Forthcoming autonomous and robotic systems, including autonomous vehicles, are expected to benefit from 
recent advances in machine learning. Yet, by their very nature, neural networks are known to be
particularly fragile and difficult to understand.

How can we give safety guarantees for a system in which some of the components, whether perception systems
or controllers, are driven by neural networks? What should these guarantees be?

In this talk I will report some initial steps taken in our Verification of Autonomous Systems research
group at Imperial College London towards answering these questions. I will introduce the concept of a
Neural Agent-Environment system, a closed-loop system composed of an agent executing a feed-forward or
a recurrent neural network interacting with an environment. I will then define a number of verification
problems, report their computational complexity and present a method for solving them. I will then move on
to discuss the topic of verification of CNN-based perception systems. In this context, I will introduce a
notion of transformational robustness and present a technique for establishing this property in classifiers.
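For intuition about the closed-loop object being verified, the toy Julia sketch below simulates a feed-forward agent interacting with a hypothetical linear environment and checks a simple bounded-horizon safety property; the verification problems discussed in the talk are solved formally rather than by simulation, and all names and dynamics here are assumptions.

    relu(x) = max.(x, 0.0)

    # Feed-forward agent: two hypothetical weight matrices W1 and W2.
    agent_action(W1, W2, state) = W2 * relu(W1 * state)

    # Hypothetical linear environment dynamics.
    env_step(state, action) = 0.9 .* state .+ 0.1 .* action

    # Simulate the closed loop for k steps and test a simple state-bound property.
    function safe_for_horizon(W1, W2, state; k = 10, bound = 1.0)
        for _ in 1:k
            state = env_step(state, agent_action(W1, W2, state))
            maximum(abs.(state)) <= bound || return false   # property violated
        end
        return true
    end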

The talk will report joint work with M. Akintunde, P. Kouvaros, L. Maganti, A. Kevorchian, and E. Pirovano.
Analysing GPS Trajectory Data
Trajectory data collected from GPS-enabled mobile devices or vehicles are widely used in urban planning,
traffic management, and location-based services. Due to power and bandwidth limitations on such devices,
collecting dense trajectories in real time is usually deemed too costly. However, applications in traffic
management typically require dense trajectories with rich spatial and temporal real-time knowledge.
We consider several problems that come up in applications that exploit trajectory data. These include
map creation from GPS trajectories, identification of frequently travelled paths, and travel-time
estimation.
The Best of Both Worlds: Ensuring Selective Privacy with Performance in a Collaborative Filtering Framework
Most industrial recommender systems rely on the popular collaborative filtering (CF) technique for
providing personalized recommendations to their users. However, the very nature of CF is at odds with
user privacy, because users need to share their preferences with others in order to be grouped with
like-minded people and receive accurate recommendations. We propose a novel selective privacy preserving
(SP2) paradigm that allows users to custom-define the scope and extent of their individual privacy by
marking their personal ratings as either public (which can be shared) or private (which are never shared
and are stored only on the user's device). Our SP2 framework works in two steps: (i) first, it builds an
initial recommendation model based on all the public ratings that have been shared by users; (ii) then,
this public model is fine-tuned on each user's device based on the user's private ratings, eventually
learning a more accurate model. Furthermore, in this work, we introduce two different algorithms for
implementing an end-to-end SP2 framework that can scale effectively as the number of items increases.
Our user survey shows that an overwhelming fraction of users are likely to rate many more items, improving
the overall recommendations, when they can control which ratings will be publicly shared with others.
In addition, our experiments on two real-world datasets demonstrate that SP2 can indeed deliver better
recommendations than other state-of-the-art methods, while preserving each individual user's self-defined
privacy.
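A minimal Julia sketch of the second step, under assumed notation (this is an illustration, not the paper's algorithms): a public matrix-factorization model supplies fixed item factors V, and each device updates only its own user vector u on the private ratings it kept locally.

    using LinearAlgebra

    # On-device fine-tuning: only the user's own vector u changes; the public item
    # factors V (learned from shared ratings) stay fixed, and private ratings
    # never leave the device.
    function finetune_user!(u::Vector{Float64}, V::Matrix{Float64},
                            private_items::Vector{Int}, private_ratings::Vector{Float64};
                            lr = 0.01, epochs = 50)
        for _ in 1:epochs
            for (j, r) in zip(private_items, private_ratings)
                err = r - dot(u, V[:, j])        # error on one private rating
                u .+= lr * err * V[:, j]         # gradient step on the user vector
            end
        end
        return u
    end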
Efficient Machine Learning Approach to Capture Genetic Correlation
Genetic correlation is helpful in understanding the relationships among complex traits. The observation of
substantial genetic correlation between a pair of traits can provide insights into shared genetic pathways
as well as a starting point for investigating causal relationships. Attempts to estimate genetic
correlations among complex phenotypes have motivated the analysis of large datasets as well as the
development of sophisticated methods. I propose and discuss a scalable estimator of genetic correlations
for two types of omics data: SNPs and RNA-seq. The method leverages the structure of the data to obtain
runtimes that scale sub-linearly with the number of individuals in the input dataset.
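For reference, the quantity being estimated is the usual genetic correlation between two traits (this is standard background, not the specifics of the proposed estimator):

    \rho_g = \frac{\mathrm{Cov}_g(y_1, y_2)}{\sqrt{\mathrm{Var}_g(y_1)\,\mathrm{Var}_g(y_2)}}

i.e. the genetic covariance between the two traits normalised by the geometric mean of their genetic variances; the talk's contribution is estimating this quantity at scale for SNP and RNA-seq data.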