Decentralized AI Concepts Gaining Traction

The notion of AI systems in which learning is distributed to reside with the local data sources, e.g., on hardware devices or at aggregation points near the edge of the network, is increasingly discussed, even in the popular press. For example, see:

"Decentralised AI has the potential to upend the online economy: Analysing big data on the edge will outclass offerings by more cumbersome centralized systems"

I. Ng and H. Haddadi, WIRED UK, December 2018

The key drivers behind the emergence of these ideas are the scaling, privacy, and cost challenges associated with increasingly distributed data and sensors, as reflected in evolving architectural frameworks for the Internet of Things (IoT).   

 The referenced article groups the decentralized learning approaches into three categories:

  • local learning

  • distributed or federated learning

  • cooperative learning

We refer to our approach at Prism as "collaborative analytics"; while it embodies aspects of the approaches above, it is distinct from each of them. In our system, models are learned locally, using local data, and that data is never shared beyond its local source. Instead, the collection of local models self-organizes into a global network model that is optimized given the quality and dependency of information across the local sources. The only information communicated is a set of compact messages (signaling statistics) that are privacy-preserving in the sense that the underlying data cannot be recovered from them, even if the messages are observed. While some of the data being modeled might reside in cloud storage, there is no such requirement; the only requirement is that the local sources being modeled can be networked together in order to organize and collaborate.
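
To make the communication pattern concrete, here is a minimal, purely illustrative sketch in Python with NumPy. It is not Prism's actual algorithm; it shows only the general idea that each source keeps its raw data and emits a small, fixed-size message of statistics, which a coordinator assembles into a global summary.

    import numpy as np

    rng = np.random.default_rng(0)

    def local_message(values):
        # Summarize a local source as a small, fixed-size message;
        # the raw values themselves never leave this function.
        return {"n": int(values.size), "mean": float(values.mean())}

    def assemble(messages):
        # Coordinator: pool the compact messages into a global summary
        # without ever touching the underlying records.
        n_total = sum(m["n"] for m in messages)
        pooled_mean = sum(m["n"] * m["mean"] for m in messages) / n_total
        return {"sources": len(messages), "n": n_total, "mean": pooled_mean}

    # Three sources with private local data (never transmitted in this sketch).
    sources = [rng.normal(loc=i, size=200) for i in range(3)]
    print(assemble([local_message(s) for s in sources]))

In practice the messages and the model-organization step are far richer; the point of the sketch is simply that fixed-size statistics, not raw records, are what cross the network.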

In such a scheme, cost savings accrue because the distributed data never need to be assembled and semantically unified, a process that requires ongoing investment to sustain. Instead, local models can be adapted to newly arriving data at the source, and integration into an overall "ensembled" system occurs at the level of the models, not the data. Scalability in the number of sources is enabled by the loose coupling of the distributed system: sources may come and go, and the system automatically adapts, assimilating them into the global model incrementally and on the fly (high agility). Compactness of the model representation ensures that very large numbers of sources, in the thousands, can be accommodated. Privacy is provided "by design", both through the messaging scheme and by eliminating the situations in which a privacy violation is actually created, namely when the data are assembled together.

Centralizing data for analysis when that data is natively generated at distributed collection points becomes "cumbersome", as the title of the article suggests, because the analytics are fundamentally mismatched to the way the data is produced in the first place. As an illustration, certain data may reside in slowly evolving databases, while other data arrive from real-time streaming sensors, and still other data come from scraped social media posts. Not only is the "phenomenology" of these data types disparate, but they also evolve naturally at different rates. In methods where learning is a truly distributed process tied to the distributed data, the "turbulence" intrinsic to mashing such data together into a common data lake is avoided, because the analytics themselves are distributed to the sources and then assembled hierarchically into a global model.

As the article points out, the economics of online information markets will be driven by new mechanisms that enable transactional business models that are efficient and protect the privacy of the sources. See the Use Cases -> Information Markets tab at www.prisminformatix.com for how our Collaborative Analytics provides one approach to addressing that need.

On Federated Learning

On April 6, 2017, Google Research published, in its “latest research news”, a blog post entitled “Federated Learning: Collaborative Machine Learning without Centralized Training Data”.  It can be found at:

https://research.googleblog.com/2017/04/federated-learning-collaborative.html?m=1

posted by Brendan McMahan and Daniel Ramage, Research Scientists.

We were excited to see their blog post, for two main reasons:

  1. The vision the blog promotes for distributed analytics, and the attendant benefits of avoiding data integration, such as preserving privacy and operating asynchronously across distributed data on the Internet, is very similar to the vision we have been espousing.
  2. The problem they address is complementary to the problem we are solving, in a way that could prove useful.

Their post describes an application setting in which mobile phones collaboratively learn a shared prediction model while keeping all the training data on the devices and sharing only compact, privacy-preserving messages.  A focus of their message is contrasting the challenges of learning models in a distributed environment with asynchronous, intermittently available, low-bandwidth communications against what is typically found in tightly controlled, high-bandwidth, synchronized cloud computing environments. Such settings are likely to become increasingly prevalent with the Internet of Things (IoT).
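
As a concrete illustration of that pattern, and of the federated-averaging idea behind their post, the following toy sketch in Python with NumPy has each client train a linear model on its own data and return only updated weights, which the server averages, weighted by each client's data size. It is a simplification for illustration, not Google's implementation; the model, loss, and hyperparameters here are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(1)

    def client_update(w, X, y, lr=0.1, epochs=5):
        # Train locally with gradient descent on a squared-error loss;
        # only the updated weights and a sample count are returned.
        w = w.copy()
        for _ in range(epochs):
            grad = 2.0 * X.T @ (X @ w - y) / len(y)
            w -= lr * grad
        return w, len(y)  # the raw (X, y) stays on the device

    def server_round(w_global, clients):
        # One federated round: collect client updates and average them,
        # weighted by how many examples each client holds.
        updates = [client_update(w_global, X, y) for X, y in clients]
        total = sum(n for _, n in updates)
        return sum(n * w for w, n in updates) / total

    # Simulated on-device datasets; in this sketch they never leave the "device".
    true_w = np.array([2.0, -1.0, 0.5])
    clients = []
    for _ in range(4):
        X = rng.normal(size=(100, 3))
        clients.append((X, X @ true_w + 0.1 * rng.normal(size=100)))

    w = np.zeros(3)
    for _ in range(20):
        w = server_round(w, clients)
    print(w)  # approaches true_w after a few rounds
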

While similar in concept to our approach, a clear distinction between Google’s technology and Prism’s is that Google is addressing data that is partitioned by instance or example, also referred to as “horizontally distributed data”, while our technology deals with fusing data that is partitioned by feature, so-called “vertically distributed data”. In fact, practical parallelization schemes for the stochastic gradient descent (SGD) and limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithms used in Google's TensorFlow, as well as the state-of-the-art (stochastic) distributed coordinate ascent (DCA) algorithms used to solve regularized regression problems (with convex cost), are all based on a horizontal partitioning of the data. Naturally, the business application they are addressing fits this horizontally distributed scheme: each device holds the data (all the features) for a given individual (example), and the goal is to learn a model over all individuals.
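
The distinction can be made concrete with a small illustration (NumPy, illustrative values only): the same data matrix is split by rows (horizontal, by example) versus by columns (vertical, by feature).

    import numpy as np

    # A toy data matrix: 6 individuals (rows) by 4 features (columns).
    data = np.arange(24).reshape(6, 4)

    # Horizontal partitioning (Google's setting): each party holds complete rows,
    # i.e., all the features for a subset of individuals -- as when each phone
    # holds only its own user's data.
    horizontal_parts = np.array_split(data, 3, axis=0)  # 3 parties, 2 rows each

    # Vertical partitioning (our setting): each party holds complete columns,
    # i.e., a subset of the features for every individual -- as when separate
    # databases, sensor feeds, and media streams each capture different attributes.
    vertical_parts = np.array_split(data, 2, axis=1)    # 2 parties, 2 columns each

    for i, part in enumerate(horizontal_parts):
        print("horizontal party", i, "shape", part.shape)  # (2, 4)
    for i, part in enumerate(vertical_parts):
        print("vertical party", i, "shape", part.shape)    # (6, 2)
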

As stated in their blog, their federated learning cannot solve all machine learning problems. Similarly, our technology does not address applications with horizontally distributed data, but it does address valuable business applications with vertically distributed data, as described on this website and in our downloadable whitepaper.

A final comment in their blog is addressed to the research community: “Applying Federated Learning requires machine learning practitioners to adopt new tools and a new way of thinking: model development, training, and evaluation with no direct access to or labeling of raw data, with communication cost as a limiting factor.”

We would add, to the business community, that new approaches and a new way of thinking are needed for predictive analysis of distributed big data, with dollar cost, time to results, and privacy taken as limiting factors.