References of "Theobald, Martin"
Convergence Analysis of Decentralized ASGD
Dalle Lucca Tosi, Mauro UL; Theobald, Martin UL

E-print/Working paper (2023)

Over the last decades, Stochastic Gradient Descent (SGD) has been intensively studied by the Machine Learning community. Despite its versatility and excellent performance, the optimization of large models via SGD is still a time-consuming task. To reduce training time, it is common to distribute the training process across multiple devices. Recently, it has been shown that the convergence of asynchronous SGD (ASGD) will always be faster than mini-batch SGD. However, despite these improvements in the theoretical bounds, most ASGD convergence-rate proofs still rely on a centralized parameter server, which is prone to become a bottleneck when scaling out the gradient computations across many distributed processes. In this paper, we present a novel convergence-rate analysis for decentralized and asynchronous SGD (DASGD) which requires neither partial synchronization among nodes nor restrictive network topologies. Specifically, we provide a bound of O(σ ɛ⁻²) + O(Q S_avg ɛ⁻³ᐟ²) + O(S_avg ɛ⁻¹) for the convergence rate of DASGD, where S_avg is the average staleness between models, Q is a constant that bounds the norm of the gradients, and ɛ is a (small) error that is allowed within the bound. Furthermore, when gradients are not bounded, we prove the convergence rate of DASGD to be O(σ ɛ⁻²) + O(√(Ŝ_avg Ŝ_max) ɛ⁻¹), with Ŝ_avg and Ŝ_max representing loose versions of the average and maximum staleness, respectively. Our convergence proof holds for a fixed stepsize and any non-convex, homogeneous, and L-smooth objective function. We anticipate that our results will be of high relevance for the adoption of DASGD by a broad community of researchers and developers.
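The decentralized setting analyzed above dispenses with a parameter server altogether. As a rough single-process illustration (not the paper's algorithm or proof setting), the toy simulation below lets each worker hold its own model replica, take local SGD steps on a simple quadratic, and gossip-average with a random peer; all names and constants are illustrative:

```python
import random

def grad(w):
    # toy quadratic objective f(w) = 0.5 * (w - 3)^2, so f'(w) = w - 3
    return w - 3.0

def decentralized_asgd(workers=4, steps=400, lr=0.1, seed=0):
    """Toy decentralized ASGD: each worker keeps its own model replica
    and there is no parameter server. A randomly chosen worker takes a
    local SGD step, then gossip-averages its model with one random peer,
    so replicas drift apart (staleness) but stay loosely synchronized."""
    rng = random.Random(seed)
    models = [rng.uniform(-5, 5) for _ in range(workers)]
    for _ in range(steps):
        i = rng.randrange(workers)          # asynchronous: one worker acts
        models[i] -= lr * grad(models[i])   # local SGD step on its replica
        j = rng.randrange(workers)          # gossip with a random peer
        avg = 0.5 * (models[i] + models[j])
        models[i] = models[j] = avg
    return models

replicas = decentralized_asgd()
print(replicas)  # all replicas end up close to the optimum w* = 3
```

The pairwise averaging is what keeps the replicas consistent without any central coordination; in a real deployment the gossip exchange would happen over the network and concurrently with the local steps.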

OPTWIN: Drift identification with optimal sub-windows
Dalle Lucca Tosi, Mauro UL; Theobald, Martin UL

E-print/Working paper (2023)

Online Learning (OL) is a field of research that is increasingly gaining attention both in academia and industry. One of the main challenges of OL is the inherent presence of concept drifts, which are commonly defined as unforeseeable changes in the statistical properties of an incoming data stream over time. The detection of concept drifts typically involves analyzing the error rates produced by an underlying OL algorithm in order to identify whether a concept drift occurred or not, such that the OL algorithm can adapt accordingly. Current concept-drift detectors perform very well, i.e., with low false-negative rates, but they still tend to exhibit high false-positive rates in the concept-drift detection. This may impact the performance of the learner and result in an undue amount of computational resources spent on retraining a model that actually still performs within its expected range. In this paper, we propose OPTWIN, our "OPTimal WINdow" concept-drift detector. OPTWIN uses a sliding window of events over an incoming data stream to track the errors of an OL algorithm. The novelty of OPTWIN is to consider both the means and the variances of the error rates produced by a learner in order to split the sliding window into two provably optimal sub-windows, such that the split occurs at the earliest event at which a statistically significant difference according to either the t- or the f-test occurred. We assessed OPTWIN over the MOA framework, using ADWIN, DDM, EDDM, STEPD and ECDD as baselines over 7 synthetic and real-world datasets, and in the presence of both sudden and gradual concept drifts. In our experiments, we show that OPTWIN surpasses the F1-score of the baselines in a statistically significant manner while maintaining a lower detection delay and saving up to 21% of the time spent on retraining the models.
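To illustrate the window-splitting idea, the crude sketch below splits an error window at a given point and flags a drift when the two sub-windows differ in mean (a Welch t-statistic) or in variance (an F-ratio). It is not OPTWIN's provably optimal split: the fixed critical values stand in for proper t-/F-distribution quantiles, and the split point is given rather than searched for:

```python
import statistics

def drift_at_split(errors, split, t_crit=2.0, f_crit=2.0):
    """Flag a drift if the two sub-windows of `errors` differ in mean
    (Welch t-statistic) or in variance (F-ratio). Simplified stand-in
    for a proper t-test and f-test with distribution quantiles."""
    old, new = errors[:split], errors[split:]
    m1, m2 = statistics.mean(old), statistics.mean(new)
    v1, v2 = statistics.variance(old), statistics.variance(new)
    t = abs(m1 - m2) / ((v1 / len(old) + v2 / len(new)) ** 0.5 or 1e-12)
    f = max(v1, v2) / (min(v1, v2) or 1e-12)
    return t > t_crit or f > f_crit

window = [0.10, 0.12, 0.11, 0.09, 0.10, 0.30, 0.32, 0.31, 0.29, 0.33]
print(drift_at_split(window, 5))  # error rate jumps after index 5 → True
```

A full detector would slide the window over the stream and evaluate every admissible split, reporting the earliest statistically significant one.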

TensAIR: Online Learning from Data Streams via Asynchronous Iterative Routing
Dalle Lucca Tosi, Mauro UL; Ellampallil Venugopal, Vinu UL; Theobald, Martin UL

E-print/Working paper (2023)

Online learning (OL) from data streams is an emerging area of research that encompasses numerous challenges from stream processing, machine learning, and networking. Recent extensions of stream-processing platforms, such as Apache Kafka and Flink, already provide basic extensions for the training of neural networks in a stream-processing pipeline. However, these extensions are not scalable and flexible enough for many real-world use-cases, since they do not integrate the neural-network libraries as first-class citizens into their architectures. In this paper, we present TensAIR, which provides an end-to-end dataflow engine for OL from data streams via a protocol to which we refer as asynchronous iterative routing. TensAIR supports the common dataflow operators, such as Map, Reduce, and Join, and has been augmented by the data-parallel OL functions train and predict. These belong to the new Model operator, in which an initial TensorFlow model (either freshly initialized or pre-trained) is replicated among multiple decentralized worker nodes. Our decentralized architecture allows TensAIR to efficiently shard incoming data batches across the distributed model replicas, which in turn trigger the model updates via asynchronous stochastic gradient descent. We empirically demonstrate that TensAIR achieves a nearly linear scale-out in terms of (1) the number of worker nodes deployed in the network, and (2) the throughput at which the data batches arrive at the dataflow operators. We exemplify the versatility of TensAIR by investigating both sparse (Word2Vec) and dense (CIFAR-10) use-cases, for which we are able to demonstrate very significant performance improvements in comparison to Kafka, Flink, and Horovod. We also illustrate the magnitude of these improvements by demonstrating real-time concept-drift adaptation of a sentiment-analysis model trained over a Twitter stream.
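The batch-sharding idea can be sketched in a few lines. The `Replica` class, its toy gradient, and the round-robin dispatch below are illustrative stand-ins only, not TensAIR's actual API or its asynchronous MPI-based model updates:

```python
from itertools import cycle

class Replica:
    """Illustrative stand-in for a TensAIR-style model replica: holds a
    local copy of the parameters and applies a gradient update whenever a
    batch arrives, without waiting for its peers."""
    def __init__(self, w=0.0, lr=0.1):
        self.w, self.lr = w, lr

    def train(self, batch):
        # toy gradient of 0.5 * (w - x)^2 averaged over the batch
        g = sum(self.w - x for x in batch) / len(batch)
        self.w -= self.lr * g
        return self.w

def shard(batches, replicas):
    """Round-robin sharding: each incoming batch goes to the next
    replica, so replicas train in parallel on disjoint shards."""
    for replica, batch in zip(cycle(replicas), batches):
        replica.train(batch)

replicas = [Replica() for _ in range(3)]
shard([[1.0, 2.0], [3.0], [2.0, 2.0]] * 50, replicas)
print([round(r.w, 2) for r in replicas])
```

In the real engine the replicas would additionally exchange their gradient updates asynchronously, so the sharded training converges to a single shared model rather than three independent ones.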

Peer Reviewed
Efficient Hessian-based DNN Optimization via Chain-Rule Approximation
Temperoni, Alessandro UL; Dalle Lucca Tosi, Mauro UL; Theobald, Martin UL

in Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD) (2023)

Learning non-use-case-specific models has been shown to be a challenging task in Deep Learning (DL). Hyperparameter tuning requires long training sessions that have to be restarted any time the network or the dataset changes, and that are not affordable by most stakeholders in industry and research. Many attempts have been made to justify and understand the source of the use-case specificity that distinguishes DL problems. To date, second-order optimization methods have been shown to be effective in some cases but have not been sufficiently investigated in the context of learning and optimization. In this work, we present a chain rule for the efficient approximation of the Hessian matrix (i.e., the second-order derivatives) of the weights across the layers of a Deep Neural Network (DNN). We show the application of our approach for weight optimization during DNN training, as we believe that this is a step that particularly suffers from the enormous variety of optimizers provided by state-of-the-art libraries such as Keras and PyTorch. We demonstrate—both theoretically and empirically—the improved accuracy of our approximation technique and that the Hessian is a useful diagnostic tool which helps to more rigorously optimize training. Our preliminary experiments demonstrate the efficiency as well as the improved convergence of our approach, both of which are crucial aspects for DNN training.
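The paper's chain-rule approximation itself is not reproduced here, but the kind of second-order information it targets can be illustrated with a standard finite-difference Hessian-vector product, a generic way to probe curvature without ever materializing the full Hessian:

```python
def hvp(grad_fn, w, v, eps=1e-5):
    """Finite-difference Hessian-vector product:
    H v ≈ (∇f(w + ε v) − ∇f(w)) / ε.
    A standard curvature probe, not the paper's chain-rule scheme."""
    g0 = grad_fn(w)
    g1 = grad_fn([wi + eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / eps for a, b in zip(g1, g0)]

# f(w) = w0^2 + 3*w0*w1 + 2*w1^2  →  Hessian H = [[2, 3], [3, 4]]
def grad_f(w):
    return [2 * w[0] + 3 * w[1], 3 * w[0] + 4 * w[1]]

print(hvp(grad_f, [1.0, 2.0], [1.0, 0.0]))  # ≈ first Hessian column [2, 3]
```

For a quadratic objective the finite difference is exact up to floating-point error; for a DNN it costs one extra gradient evaluation per probed direction, which is what makes Hessian-vector products attractive for training-time diagnostics.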

Peer Reviewed
Targeting a light-weight and multi-channel approach for distributed stream processing
Venugopal, Vinu Ellampallil; Theobald, Martin UL; Tassetti, Damien et al

in Journal of Parallel and Distributed Computing (2022), 167

Peer Reviewed
Robust and Provable Guarantees for Sparse Random Embeddings
Skorski, Maciej; Temperoni, Alessandro UL; Theobald, Martin UL

in Advances in Knowledge Discovery and Data Mining - 26th Pacific-Asia Conference, PAKDD 2022, Chengdu, China, May 16-19, 2022, Proceedings, Part II (2022, May 18)

Peer Reviewed
Convergence time analysis of Asynchronous Distributed Artificial Neural Networks
Dalle Lucca Tosi, Mauro UL; Ellampallil Venugopal, Vinu; Theobald, Martin UL

in 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD) (2022)

Artificial Neural Networks (ANNs) have drawn the attention of academia and industry for their ability to represent and solve complex problems. Researchers are studying how to distribute their computation in order to reduce their training time. However, the most common approaches in this direction are synchronous, leaving computational resources underutilized. Asynchronous training does not have this drawback but is impacted by stale gradient updates, which have not yet been extensively researched. Considering this, we experimentally investigate how stale gradients affect the convergence time and loss value of an ANN. In particular, we analyze an asynchronous distributed implementation of a Word2Vec model, in which the impact of staleness is negligible and can be ignored in view of the computational speedup we achieve by allowing the staleness.
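The effect of stale gradients can be illustrated with a deterministic toy simulation (not the paper's Word2Vec setup): every applied gradient is computed on a snapshot of the model as it was a fixed number of steps earlier, mimicking asynchronous workers pushing delayed updates:

```python
def stale_sgd(staleness, steps=200, lr=0.05, w0=10.0):
    """Gradient descent on f(w) = 0.5 * w^2 (so f'(w) = w) where each
    applied gradient was computed on the model `staleness` steps ago."""
    history = [w0]
    for t in range(steps):
        w_old = history[max(0, t - staleness)]   # stale view of the model
        history.append(history[-1] - lr * w_old)  # apply the delayed gradient
    return history[-1]

# both runs converge toward 0 when lr is small enough for the given staleness
print(stale_sgd(0), stale_sgd(5))
```

For small step sizes the delayed updates still contract toward the optimum, which matches the abstract's observation that a moderate amount of staleness can be negligible; with a step size that is too large for the delay, the same recurrence oscillates and diverges.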

Peer Reviewed
Revisiting Weight Initialization of Deep Neural Networks
Skorski, Maciej; Temperoni, Alessandro UL; Theobald, Martin UL

in Proceedings of Machine Learning Research (2021, November 17)

Asynchronous Stream Data Processing using a Light-Weight and High-Performance Dataflow Engine
Ellampallil Venugopal, Vinu UL; Theobald, Martin UL

Presentation (2020, December 11)

Processing high-throughput data streams has become a major challenge in areas such as real-time event monitoring, complex dataflow processing, and big data analytics. While there has been tremendous progress in distributed stream processing systems in the past few years, the high-throughput and low-latency (a.k.a. high sustainable-throughput) requirement of modern applications is pushing the limits of traditional data processing infrastructures. This paper introduces a new distributed stream data processing engine (DSPE), called "Asynchronous Iterative Routing" or simply AIR, which implements a light-weight, dynamic sharding protocol. AIR expedites a direct and asynchronous communication among all the worker nodes via multiple Message Passing Interface (MPI) communication channels and thereby completely avoids any additional communication overhead with a dedicated master node. With its unique design, AIR scales out to clusters consisting of up to 8 nodes and 224 cores, performing much better than existing DSPEs; in terms of sustainable throughput, it performs up to 15 times better than Spark and Flink.

27th International Symposium on Temporal Representation and Reasoning, TIME 2020, September 23-25, 2020, Bozen-Bolzano, Italy
Muñoz-Velasco, Emilio; Ozaki, Ana; Theobald, Martin UL

Book published by Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Peer Reviewed
AIR: A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing
Ellampallil Venugopal, Vinu UL; Theobald, Martin UL; Chaychi, Samira UL et al

in AIR: A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing (2020, September 01)

Distributed Stream Processing Engines (DSPEs) are currently among the most active topics in data management, with applications ranging from real-time event monitoring to processing complex dataflow programs and big data analytics. In this paper, we describe the architecture of our AIR engine, which is designed from scratch in C++, using the Message Passing Interface (MPI) for communication and pthreads for multithreading, and is directly deployed on top of a common HPC workload manager such as SLURM. AIR implements a light-weight, dynamic sharding protocol (referred to as "Asynchronous Iterative Routing"), which facilitates a direct and asynchronous communication among all worker nodes and thereby completely avoids any additional communication overhead with a dedicated master node. With its unique design, AIR fills the gap between the prevalent scale-out (but Java-based) architectures like Apache Spark and Flink, on the one hand, and recent scale-up (and C++-based) prototypes such as StreamBox and PiCo, on the other hand. Our experiments over various benchmark settings confirm that AIR performs as well as the best scale-up SPEs on a single-node setup, while it outperforms existing scale-out DSPEs in terms of processing latency and sustainable throughput by a factor of up to 15 in a distributed setting.
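The master-free routing idea can be sketched as follows (illustrative Python only; the actual engine uses C++ and MPI channels): every worker can determine the peer responsible for a record from a stable hash of its key and send the record there directly, so no coordinator is ever consulted:

```python
import hashlib
from collections import defaultdict

def owner(key, n_workers):
    """The worker responsible for a key is fixed by a stable hash, so any
    worker can route a record directly to the right peer. A stable digest
    is used (unlike Python's builtin hash(), which is randomized per run)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_workers

def route(records, n_workers):
    """Group a stream of (key, value) records into per-worker inboxes,
    mimicking direct worker-to-worker channels."""
    inboxes = defaultdict(list)
    for key, value in records:
        inboxes[owner(key, n_workers)].append((key, value))
    return inboxes

stream = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
inboxes = route(stream, 4)
# all records with the same key land in the same worker's inbox
```

Because the key-to-worker mapping is deterministic and shared, stateful operators such as keyed aggregations see all records for a key on one node without any handshake with a master.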

Peer Reviewed
Guided Inductive Logic Programming: Cleaning Knowledge Bases with Iterative User Feedback
Wu, Yan; Chen, Jinchuan; Haxhidauti, Plarent et al

in Guided Inductive Logic Programming: Cleaning Knowledge Bases with Iterative User Feedback (2020, March 12)

Domain-oriented knowledge bases (KBs) such as DBpedia and YAGO are largely constructed by applying a set of predefined extraction rules to the semi-structured contents of Wikipedia articles. Although both of these large-scale KBs achieve very high average precision values (above 95% for YAGO3), subtle mistakes in a few of the underlying extraction rules may still introduce a substantial number of systematic extraction mistakes for specific relations. For example, by applying the same regular expressions to extract person names of both Asian and Western nationality, YAGO erroneously swaps most of the family and given names of Asian person entities. For traditional rule-learning approaches based on Inductive Logic Programming (ILP), it is very difficult to detect these systematic extraction mistakes, since they usually occur only in a relatively small subdomain of the relations' arguments. In this paper, we thus propose a guided form of ILP, coined "GILP", that iteratively asks for small amounts of user feedback over a given KB to learn a set of data-cleaning rules that (1) best match the feedback and (2) also generalize to a larger portion of facts in the KB. We propose both algorithms and respective metrics to automatically assess the quality of the learned rules with respect to the user feedback.
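The two rule-quality criteria, agreement with feedback (1) and generalization to the KB (2), can be sketched as a toy scoring function. The KB, feedback labels, and candidate rule below are hypothetical stand-ins, not GILP's actual metrics or rule language:

```python
def score_rule(rule, feedback, kb):
    """Score a candidate cleaning rule by (1) agreement with labeled user
    feedback and (2) the fraction of the KB it generalizes to. `rule` is
    a boolean predicate over facts; `feedback` maps facts to True
    (erroneous) / False (correct)."""
    agree = sum(rule(f) == wrong for f, wrong in feedback.items())
    coverage = sum(rule(f) for f in kb)
    return agree / len(feedback), coverage / len(kb)

# Hypothetical KB of (given_name, family_name) facts; the systematic
# error is a swapped name order, flagged here by a deliberately toy rule.
kb = [("Wei", "Zhang"), ("Zhang", "Wei"), ("John", "Smith"), ("Li", "Ming")]
feedback = {("Zhang", "Wei"): True, ("John", "Smith"): False}
rule = lambda fact: fact[0] in {"Zhang"}  # toy rule: family name in first slot
print(score_rule(rule, feedback, kb))  # (agreement, coverage) = (1.0, 0.25)
```

An iterative loop in this spirit would generate many candidate rules, rank them by such scores, apply the best one, and then solicit the next small round of feedback.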

Peer Reviewed
Benchmarking Synchronous and Asynchronous Stream Processing Systems
Ellampallil Venugopal, Vinu UL; Theobald, Martin UL

in Ellampallil Venugopal, Vinu; Theobald, Martin (Eds.) Benchmarking Synchronous and Asynchronous Stream Processing Systems (2020, January 02)

Processing high-throughput data streams has become a major challenge in areas such as real-time event monitoring, complex dataflow processing, and big data analytics. While there has been tremendous progress in distributed stream processing systems in the past few years, the high-throughput and low-latency (a.k.a. high sustainable-throughput) requirement of modern applications is pushing the limits of traditional data processing infrastructures. To understand the upper bound of the maximum sustainable throughput that is possible for a given node configuration, we have designed multiple hard-coded multi-threaded processes (called ad-hoc dataflows) in C++, using the Message Passing Interface (MPI) and pthread libraries. Our preliminary results show that our ad-hoc design on average performs 5.2 times better than Flink and 9.3 times better than Spark.

Scalable Uncertainty Management - 13th International Conference (SUM 2019), Compiègne, France, December 16-18, 2019, Proceedings
Ben Amor, Nahla; Quost, Benjamin; Theobald, Martin UL

Book published by Springer (2019)

Peer Reviewed
Outer and Anti Joins in Temporal-Probabilistic Databases
Papaioannou, Katerina; Theobald, Martin UL; Böhlen, Michael H.

in 35th IEEE International Conference on Data Engineering, ICDE 2019, Macao, China, April 8-11, 2019 (2019, October 16)

Lineage-Aware Temporal Windows: Supporting Set Operations in Temporal-Probabilistic Databases
Papaioannou, Katerina; Theobald, Martin UL; Böhlen, Michael H.

in CoRR (2019), abs/1910.00474

Peer Reviewed
Anytime Approximation in Probabilistic Databases via Scaled Dissociations
Van den Heuvel, Maarten; Ivanov, Peter; Gatterbauer, Wolfgang et al

in Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019 (2019, June 22)
