References of "State, Radu 50003137"
Full Text
Non-Negative Paratuck2 Tensor Decomposition Combined to LSTM Network for Smart Contracts Profiling
Charlier, Jérémy Henri J. UL; State, Radu UL

in International Journal of Computer & Software Engineering (2018), 3(1)

Background: The past few months have seen the rise of blockchain and cryptocurrencies. In this context, the Ethereum platform, an open-source blockchain-based platform using the Ether cryptocurrency, has been designed to use smart contracts, which are self-executing blockchain programs. Due to their high volume of transactions, analyzing their behavior is very challenging. We address this challenge in our paper. Methods: For this purpose, we develop an innovative approach based on the non-negative Paratuck2 tensor decomposition combined with long short-term memory (LSTM). The objective is to assess whether predictive analysis can forecast smart contract activities over time. Three statistical measures are computed on the predictions: the mean absolute percentage error, the mean directional accuracy and the Jaccard distance. Results: Across dozens of GB of transactions, the Paratuck2 tensor decomposition allows asymmetric modeling of the smart contracts and highlights time-dependent latent groups. The latent activities are modeled by the long short-term memory network for predictive analytics. The highly accurate predictions underline the accuracy of the method and show that blockchain activities are not pure randomness. Conclusion: We are able to detect the most active contracts and predict their behavior. In the context of future regulations, our approach opens new perspectives for monitoring blockchain activities.
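
For reference, the three evaluation measures named above can be written down in a few lines. The sketch below is mine, not taken from the paper: the function names and toy numbers are invented and only illustrate the definitions.

    import numpy as np

    def mape(actual, forecast):
        """Mean absolute percentage error, in percent."""
        actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
        return 100.0 * np.mean(np.abs((actual - forecast) / actual))

    def mda(actual, forecast):
        """Mean directional accuracy: fraction of steps where the forecast
        moves in the same direction as the actual series."""
        actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
        return np.mean(np.sign(np.diff(actual)) == np.sign(forecast[1:] - actual[:-1]))

    def jaccard_distance(a, b):
        """Jaccard distance between two sets, e.g. sets of active contracts."""
        a, b = set(a), set(b)
        return 1.0 - len(a & b) / len(a | b)

    actual, forecast = [10, 12, 11, 15, 14], [9, 13, 12, 14, 15]
    print(mape(actual, forecast), mda(actual, forecast))
    print(jaccard_distance({"c1", "c2"}, {"c2", "c3"}))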

Full Text
Peer Reviewed
Non-Negative Paratuck2 Tensor Decomposition Combined to LSTM Network For Smart Contracts Profiling
Charlier, Jérémy Henri J. UL; State, Radu UL; Hilger, Jean

in Charlier, Jeremy; State, Radu; Hilger, Jean (Eds.) 2018 IEEE International Conference on Big Data and Smart Computing Proceedings (2018, January)

Smart contracts are programs stored and executed on a blockchain. The Ethereum platform, an open-source blockchain-based platform, has been designed to use these programs, offering secured protocols and reduced transaction costs. The Ethereum Virtual Machine executes smart contracts, where the execution of each contract is limited to the amount of gas required to perform the operations described in its code. Each gas unit must be paid for using Ether, the crypto-currency of the platform. Because smart contract interactions evolve over time, analyzing the behavior of smart contracts is very challenging. We address this challenge in our paper. For this purpose, we develop an innovative approach based on the non-negative tensor decomposition PARATUCK2 combined with long short-term memory (LSTM) to assess whether predictive analysis can forecast smart contract interactions over time. To validate our methodology, we report results for two use cases. The main use case relates to analyzing smart contracts and sheds some light on the complex interactions among them. To show the generality of our method, we also report its performance on video-on-demand recommendation.
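
As a rough illustration of the LSTM forecasting step, here is a minimal sketch assuming the Keras API; the latent activity matrix is a random placeholder rather than actual PARATUCK2 factors, and the layer sizes are arbitrary.

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    n_latent, window = 4, 10
    series = np.random.rand(200, n_latent)      # placeholder latent activity factors

    # sliding windows: predict the next latent state from the previous `window` steps
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]

    model = Sequential([LSTM(32, input_shape=(window, n_latent)), Dense(n_latent)])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, batch_size=16, verbose=0)

    next_step = model.predict(series[-window:][np.newaxis])   # one-step forecast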

Full Text
Peer Reviewed
A Blockchain-Based PKI Management Framework
Yakubov, Alexander UL; Shbair, Wazen UL; Wallbom, Anders et al

in The First IEEE/IFIP International Workshop on Managing and Managed by Blockchain (Man2Block) co-located with IEEE/IFIP NOMS 2018, Taipei, Taiwan, 23-27 April 2018 (2018)

Public-Key Infrastructure (PKI) is the cornerstone technology that facilitates secure information exchange over the Internet. However, PKI is exposed to risks due to potential failures of Certificate Authorities (CAs), which may be used to issue unauthorized certificates for end-users. Many recent breaches show that if a CA is compromised, the security of the corresponding end-users is at risk. As an emerging solution, blockchain technology potentially resolves the problems of traditional PKI systems, in particular the elimination of single points of failure and rapid reaction to CA shortcomings. Blockchain has the ability to store and manage digital certificates within a public and immutable ledger, resulting in a fully traceable history log. In this paper we design and develop a blockchain-based PKI management framework for issuing, validating and revoking X.509 certificates. Evaluation and experimental results confirm that the proposed framework provides a more reliable and robust PKI system with modest maintenance costs.
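
The sketch below is only a toy, in-memory stand-in for the idea of keeping certificate status on a ledger keyed by certificate fingerprint. The class and method names are invented here; the actual framework stores this state in Ethereum smart contracts.

    import hashlib

    class CertRegistry:
        """Dict-backed stand-in for an on-chain certificate registry."""
        VALID, REVOKED = "valid", "revoked"

        def __init__(self):
            self._ledger = {}                      # fingerprint -> status

        @staticmethod
        def fingerprint(der_bytes: bytes) -> str:
            return hashlib.sha256(der_bytes).hexdigest()

        def issue(self, der_bytes: bytes) -> str:
            fp = self.fingerprint(der_bytes)
            self._ledger[fp] = self.VALID
            return fp

        def revoke(self, der_bytes: bytes) -> None:
            self._ledger[self.fingerprint(der_bytes)] = self.REVOKED

        def validate(self, der_bytes: bytes) -> bool:
            return self._ledger.get(self.fingerprint(der_bytes)) == self.VALID

    registry = CertRegistry()
    registry.issue(b"dummy DER-encoded certificate")
    assert registry.validate(b"dummy DER-encoded certificate")
    registry.revoke(b"dummy DER-encoded certificate")
    assert not registry.validate(b"dummy DER-encoded certificate")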

Full Text
Peer Reviewed
Detecting Malicious Authentication Events Trustfully
Kaiafas, Georgios UL; Varisteas, Georgios UL; Lagraa, Sofiane UL et al

in Kaiafas, Georgios; Varisteas, Georgios; Lagraa, Sofiane (Eds.) et al., IEEE/IFIP Network Operations and Management Symposium, 23-27 April 2018, Taipei, Taiwan: Cognitive Management in a Cyber World (2018)

Full Text
Peer Reviewed
On the Reduction of Biases in Big Data Sets for the Detection of Irregular Power Usage
Glauner, Patrick UL; State, Radu UL; Valtchev, Petko et al

in Proceedings 13th International FLINS Conference on Data Science and Knowledge Engineering for Sensing Decision Support (FLINS 2018) (2018)

In machine learning, a bias occurs whenever training sets are not representative of the test data, which results in unreliable models. The most common biases in data are arguably class imbalance and covariate shift. In this work, we aim to shed light on this topic in order to increase the overall attention to this issue in the field of machine learning. We propose a novel, scalable framework for reducing multiple biases in high-dimensional data sets in order to train more reliable predictors. We apply our methodology to the detection of irregular power usage from real, noisy industrial data. In emerging markets, irregular power usage, and electricity theft in particular, may range up to 40% of the total electricity distributed. Biased data sets are a particular issue in this domain. We show that reducing these biases increases the accuracy of the trained predictors. Our models have the potential to generate significant economic value in a real-world application, as they are being deployed in a commercial software product for the detection of irregular power usage.
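
One common way to correct covariate shift, shown here purely as an illustration and not necessarily as the paper's exact procedure, is to estimate importance weights p_test(x)/p_train(x) with a classifier that separates training from test samples. The data below are synthetic.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_train = rng.normal(0.0, 1.0, size=(1000, 5))   # placeholder feature matrices
    X_test  = rng.normal(0.5, 1.0, size=(1000, 5))   # shifted test distribution

    # label the origin of each sample (0 = train, 1 = test) and fit a discriminator
    X_all = np.vstack([X_train, X_test])
    origin = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
    clf = LogisticRegression(max_iter=1000).fit(X_all, origin)

    # importance weight for each training sample: p(test | x) / p(train | x)
    p_test = clf.predict_proba(X_train)[:, 1]
    weights = p_test / (1.0 - p_test)

    # these weights can then be passed to most estimators, e.g.
    # model.fit(X_train, y_train, sample_weight=weights)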

Full Text
Peer Reviewed
Machine Learning for Data-Driven Smart Grid Applications
Glauner, Patrick UL; Meira, Jorge Augusto UL; State, Radu UL

Scientific Conference (2018)

The field of Machine Learning grew out of the quest for artificial intelligence. It gives computers the ability to learn statistical patterns from data without being explicitly programmed. These patterns can then be applied to new data in order to make predictions. Machine Learning also allows models to adapt automatically to changes in the data without amending the underlying model. We deal with Machine Learning applications dozens of times every day, for example when doing a Google search, using spam filters or face detection, speaking to voice recognition software, or sitting in a self-driving car. In recent years, machine learning methods have evolved in the smart grid community. This shift towards analyzing data rather than modeling specific problems has led to adaptable, more generic methods that require less expert knowledge and are easier to deploy in a number of use cases. This is an introductory-level course that discusses what machine learning is and how to apply it to data-driven smart grid applications. Practical case studies on real data sets, such as load forecasting, detection of irregular power usage and visualization of customer data, are included. Attendees will therefore not only understand, but experience, how to apply machine learning methods to smart grid data.

Full Text
Peer Reviewed
Impact of Biases in Big Data
Glauner, Patrick UL; Valtchev, Petko; State, Radu UL

in Proceedings of the 26th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2018) (2018)

The underlying paradigm of big data-driven machine learning reflects the desire of deriving better conclusions from simply analyzing more data, without the necessity of looking at theory and models. Is simply having more data always helpful? In 1936, The Literary Digest collected 2.3M filled-in questionnaires to predict the outcome of that year's US presidential election. The outcome of this big data prediction proved to be entirely wrong, whereas George Gallup only needed 3K handpicked people to make an accurate prediction. Generally, biases occur in machine learning whenever the distributions of training set and test set are different. In this work, we provide a review of different sorts of biases in (big) data sets in machine learning. We provide definitions and discussions of the most commonly appearing biases in machine learning: class imbalance and covariate shift. We also show how these biases can be quantified and corrected. This work is an introductory text for both researchers and practitioners, intended to raise awareness of this topic and thus help derive more reliable models for their learning problems.
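
A simple way to quantify covariate shift, given here as an illustration of the concept rather than as the paper's method, is to train a classifier to distinguish training from test samples: an AUC near 0.5 indicates little shift, an AUC near 1.0 a strong shift. The data below are synthetic.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(1)
    X_train = rng.normal(0.0, 1.0, size=(500, 8))
    X_test  = rng.normal(0.3, 1.2, size=(500, 8))

    X = np.vstack([X_train, X_test])
    origin = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])

    proba = cross_val_predict(RandomForestClassifier(n_estimators=100),
                              X, origin, cv=5, method="predict_proba")[:, 1]
    print("shift score (AUC):", roc_auc_score(origin, proba))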

Full Text
Peer Reviewed
Blockchain Orchestration and Experimentation Framework: A Case Study of KYC
Shbair, Wazen UL; Steichen, Mathis Georges UL; François, Jérôme UL et al

in The First IEEE/IFIP International Workshop on Managing and Managed by Blockchain (Man2Block) co-located with IEEE/IFIP NOMS 2018 (2018)

Conducting experiments to evaluate blockchain applications is a challenging task for developers, because there is a range of configuration parameters that control blockchain environments. Many public testnets (e.g. Rinkeby Ethereum) can be used for testing; however, we cannot adjust their parameters (e.g. gas limit, mining difficulty) to further the understanding of the application in question and of the employed blockchain. This paper proposes an easy-to-use orchestration framework over the Grid'5000 platform, a highly reconfigurable and controllable large-scale testbed. We developed a tool that facilitates node reservation, deployment and blockchain configuration over the Grid'5000 platform. In addition, our tool can fine-tune blockchain and network parameters before and between experiments. The proposed framework offers insights for private and consortium blockchain developers to identify performance bottlenecks and to assess the behavior of their applications in different circumstances.
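
To make the two parameters mentioned above concrete, here is a small sketch that generates a private-network genesis file in the common geth format. The field values and chain id are arbitrary; the paper's tool and its actual templates are not reproduced here.

    import json

    def make_genesis(chain_id: int, gas_limit: int, difficulty: int) -> str:
        """Return a minimal genesis.json string for a private Ethereum network."""
        genesis = {
            "config": {"chainId": chain_id, "homesteadBlock": 0,
                       "eip155Block": 0, "eip158Block": 0},
            "difficulty": hex(difficulty),
            "gasLimit": hex(gas_limit),
            "alloc": {},        # pre-funded accounts would go here
        }
        return json.dumps(genesis, indent=2)

    print(make_genesis(chain_id=42, gas_limit=8_000_000, difficulty=1_024))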

Full Text
Peer Reviewed
Detection of Irregular Power Usage using Machine Learning
Glauner, Patrick UL; Meira, Jorge Augusto UL; State, Radu UL

Scientific Conference (2018)

Electricity losses are a frequently occurring problem in power grids. Non-technical losses (NTL) appear during distribution and include, but are not limited to, the following causes: meter tampering in order to record lower consumption, bypassing meters by rigging lines from the power source, arranged false meter readings by bribing meter readers, faulty or broken meters, un-metered supply, and technical and human errors in meter readings, data processing and billing. NTLs are also reported to range up to 40% of the total electricity distributed in countries such as India, Pakistan, Malaysia, Brazil or Lebanon. This is an introductory-level course that discusses how to predict whether a customer causes an NTL. In recent years, employing data analytics methods such as machine learning and data mining has evolved as the primary direction to solve this problem. This course will present and compare different approaches reported in the literature. Practical case studies on real data sets are included. As an additional outcome, attendees will understand the open challenges of NTL detection and learn how these challenges could be solved in the coming years.

Full Text
Peer Reviewed
VSOC - A Virtual Security Operating Center
Falk, Eric UL; Fiz Pontiveros, Beltran UL; Repcek, Stefan et al

in Global Communications (2017)

Security in virtualised environments is becoming increasingly important for institutions, not only for a firm's own on-site servers and network but also for data and sites that are hosted in the cloud. Today, security is either handled globally by the cloud provider, or each customer needs to invest in its own security infrastructure. This paper proposes a Virtual Security Operation Center (VSOC) that allows security-related data from multiple sources to be collected, analysed and visualized. For instance, a user can forward log data from its firewalls, applications and routers in order to check for anomalies and other suspicious activities. The security analytics provided by the VSOC are comparable to those of commercial security incident and event management (SIEM) solutions, but are deployed as a cloud-based solution with the additional benefit of using big data processing tools to handle large volumes of data. This allows us to detect more complex attacks that cannot be detected with today's signature-based (i.e. rule-based) SIEM solutions.

Full Text
Peer Reviewed
Optimising Packet Forwarding in Multi-Tenant Networks using Rule Compilation
Hommes, Stefan UL; Valtchev, Petko; Blaiech, Khalil et al

in Optimising Packet Forwarding in Multi-Tenant Networks using Rule Compilation (2017, November)

Packet forwarding in Software-Defined Networks (SDN) relies on a centralised network controller which enforces network policies expressed as forwarding rules. Rules are deployed as sets of entries in network device tables. With heterogeneous devices, deployment is strongly bounded by the respective table constraints (size, lookup time, etc.) and forwarding pipelines. Hence, minimising the overall number of entries is paramount to reducing resource consumption and speeding up the search. Moreover, since multiple control plane applications can deploy their own rules, conflicts may occur. To avoid these and ensure overall correctness, a rule validation mechanism is required. Here, we present a compilation mechanism for rules of diverging origins that minimises the number of entries. Since it exploits the semantics of rules and entries, our compiler fits a heterogeneous landscape of network devices. We evaluated compiler implementations on both software and hardware switches using a realistic testbed. Experimental results show a reduction in both the number of produced table entries and the forwarding delay.
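
As a much simplified illustration of the kind of entry reduction a rule compiler can perform (not the paper's algorithm), the sketch below collapses two forwarding rules with the same action whose destination prefixes are sibling subnets into a single rule on the parent prefix.

    import ipaddress

    def compress(rules):
        """rules: list of (prefix_str, action). Returns a reduced rule list."""
        changed = True
        while changed:
            changed = False
            for i, (p1, a1) in enumerate(rules):
                for j, (p2, a2) in enumerate(rules):
                    if i >= j or a1 != a2:
                        continue
                    n1, n2 = ipaddress.ip_network(p1), ipaddress.ip_network(p2)
                    # sibling subnets with identical actions merge into the parent prefix
                    if (n1.prefixlen == n2.prefixlen and n1.prefixlen > 0
                            and n1.supernet() == n2.supernet()):
                        merged = (str(n1.supernet()), a1)
                        rules = [r for k, r in enumerate(rules) if k not in (i, j)]
                        rules.append(merged)
                        changed = True
                        break
                if changed:
                    break
        return rules

    print(compress([("10.0.0.0/25", "port1"), ("10.0.0.128/25", "port1"),
                    ("10.1.0.0/24", "port2")]))
    # -> [('10.1.0.0/24', 'port2'), ('10.0.0.0/24', 'port1')]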

Full Text
Peer Reviewed
Modeling Smart Contracts Activities: A Tensor based Approach
Charlier, Jérémy Henri J. UL; State, Radu UL; Hilger, Jean

in Charlier, Jeremy; State, Radu; Hilger, Jean (Eds.) Proceedings of 2017 Future Technologies Conference (FTC), 29-30 November 2017, Vancouver, Canada (2017, November)

Smart contracts are autonomous software programs executing predefined conditions. Two of the biggest advantages of smart contracts are secured protocols and reduced transaction costs. On the Ethereum platform, an open-source blockchain-based platform, smart contracts implement a distributed virtual machine on the distributed ledger. To avoid denial-of-service attacks and to monetize the services, payment transactions are executed whenever code is executed between contracts. It is thus natural to investigate whether predictive analysis is capable of forecasting these interactions. We have addressed this issue and propose an innovative application of the tensor decomposition CANDECOMP/PARAFAC to the temporal link prediction of smart contracts. We introduce a new approach leveraging stochastic processes for series predictions based on the tensor decomposition that can be used for smart contract predictive analytics.
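
A minimal sketch of the CANDECOMP/PARAFAC step, assuming the tensorly library: the (sender x receiver x time) tensor is filled with random counts standing in for real smart-contract transactions, and the rank is arbitrary.

    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import parafac

    rng = np.random.default_rng(0)
    interactions = tl.tensor(rng.poisson(1.0, size=(50, 50, 30)).astype(float))

    rank = 5
    cp = parafac(interactions, rank=rank)          # weights + 3 factor matrices
    senders, receivers, time_factors = cp.factors

    # a temporal link-prediction scheme would model `time_factors` (shape 30 x 5)
    # forward in time and rebuild future slices from the factors, e.g.:
    reconstruction = tl.cp_to_tensor(cp)
    print(reconstruction.shape)                    # (50, 50, 30)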

Full Text
Peer Reviewed
Your Moves, Your Device: Establishing Behavior Profiles Using Tensors
Falk, Eric UL; Charlier, Jérémy Henri J. UL; State, Radu UL

in Advanced Data Mining and Applications - 13th International Conference, ADMA 2017 (2017, November)

Smartphones have become a person's constant companion. As the strictly personal devices they are, they gradually enable the replacement of well-established activities such as payments, two-factor authentication or personal assistants. In addition, Internet of Things (IoT) gadgets extend these capabilities even further. Devices such as body-worn fitness trackers allow users to keep track of daily activities by periodically synchronizing data with the smartphone and ultimately with the vendor's computational centers in the cloud. These fitness trackers are equipped with an array of sensors to measure the movements of the device, to derive information such as step counts or to make assessments about sleep quality. We capture the raw sensor data from wrist-worn activity trackers to model a biometric behavior profile of the carrier. We establish and present techniques to determine whether the bracelet is currently worn by the original person who trained the model or by another individual. Our contribution is based on the CANDECOMP/PARAFAC (CP) tensor decomposition, whose computational complexity allows execution on light computational devices at low precision settings, or migration to stronger CPUs or to the cloud for high to very high granularity. This precision parameter allows the security layer to be adapted to the requirements set by the use cases. We show that our approach identifies users with high confidence.
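
For concreteness, the sketch below shows only the kind of pre-processing implied by the abstract: cutting raw wrist-accelerometer samples into fixed-length windows to obtain a 3-way tensor that a CP decomposition can factorise. The sample data, window length and shapes are invented, not the paper's.

    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.normal(size=(6000, 3))           # placeholder (t, [ax, ay, az]) readings

    window_len = 100
    n_windows = len(samples) // window_len
    # 3-way tensor: window x time step within window x accelerometer axis
    tensor = samples[: n_windows * window_len].reshape(n_windows, window_len, 3)
    print(tensor.shape)                            # (60, 100, 3)

    # a carrier-verification step could then compare the CP factors of new windows
    # against the stored profile, e.g. with a cosine-similarity threshold.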

Full Text
Peer Reviewed
Advanced Interest Flooding Attacks in Named-Data Networking
Signorello, Salvatore UL; Marchal, Samuel; François, Jérôme et al

Scientific Conference (2017, October 30)

Named-Data Networking (NDN) has emerged as a clean-slate Internet proposal on the wave of Information-Centric Networking. Although NDN's data plane seems to offer many advantages, e.g., native support for multicast communications and flow balance, it also makes the network infrastructure vulnerable to a specific DDoS attack, the Interest Flooding Attack (IFA). In IFAs, a botnet issuing unsatisfiable content requests can be set up effortlessly to exhaust routers' resources and cause a severe performance drop for legitimate users. Several countermeasures have addressed this security threat so far; however, their efficacy was proved by means of simplistic assumptions on the attack model. Therefore, we propose a more complete attack model and design an advanced IFA. We show the efficiency of our novel attack scheme by extensively assessing some of the state-of-the-art countermeasures. Further, we release the software to perform this attack as an open-source tool to help design future, more robust defense mechanisms.

Peer Reviewed
Reliable Machine Learning for Networking: Key Concerns and Approaches
Hammerschmidt, Christian UL; Garcia, Sebastian; Verwer, Sicco et al

Poster (2017, October)

Machine learning has become one of the go-to methods for solving problems in the field of networking. This development is driven by data availability in large-scale networks and the commodification of machine learning frameworks. While this makes it easier for researchers to implement and deploy machine learning solutions on networks quickly, there are a number of vital factors to account for when using machine learning as an approach to a problem in networking, and for translating testing performance into successful real network deployments. This paper, rather than presenting a particular technical result, discusses the considerations necessary to obtain good results when using machine learning to analyze network-related data.

Full Text
Peer Reviewed
Profiling Smart Contracts Interactions: Tensor Decomposition and Graph Mining
Charlier, Jérémy Henri J. UL; Lagraa, Sofiane UL; State, Radu UL et al

in Proceedings of the Second Workshop on MIning DAta for financial applicationS (MIDAS 2017) co-located with the 2017 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2017), Skopje, Macedonia, September 18, 2017. (2017, September)

Smart contracts, computer protocols designed for autonomous execution on predefined conditions, arise from the evolution of Bitcoin's crypto-currency. They provide higher transaction security and allow economies of scale through automated processes. Smart contracts provide inherent benefits for financial institutions such as investment banking, retail banking, and insurance. This technology is widely used within Ethereum, an open-source blockchain platform, from which the data has been extracted to conduct the experiments. In this work, we propose a multi-dimensional approach to find and predict smart contract interactions based only on their crypto-currency exchanges. This approach relies on tensor modeling combined with stochastic processes. It underlines actual exchanges between smart contracts and targets the prediction of future interactions among the community. The tensor analysis is also challenged with the latest graph algorithms to assess its strengths and weaknesses in comparison to a more standard approach.
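
Below is a toy construction of the kind of 3-way structure the abstract describes: crypto-currency transfers between contracts aggregated into a (sender x receiver x time-bucket) count tensor. The addresses and bucketing are made up for the example; the paper's extraction pipeline is not shown.

    import numpy as np

    transactions = [                       # (sender, receiver, day)
        ("0xA", "0xB", 0), ("0xA", "0xB", 0), ("0xB", "0xC", 1),
        ("0xC", "0xA", 2), ("0xA", "0xC", 2),
    ]

    contracts = sorted({t[0] for t in transactions} | {t[1] for t in transactions})
    index = {c: i for i, c in enumerate(contracts)}
    n_days = max(t[2] for t in transactions) + 1

    tensor = np.zeros((len(contracts), len(contracts), n_days))
    for sender, receiver, day in transactions:
        tensor[index[sender], index[receiver], day] += 1

    print(tensor.shape)        # (3, 3, 3); entry [0, 1, 0] == 2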

Full Text
Peer Reviewed
Introduction to Detection of Non-Technical Losses using Data Analytics
Glauner, Patrick UL; Meira, Jorge Augusto UL; State, Radu UL et al

Scientific Conference (2017, September)

Electricity losses are a frequently occurring problem in power grids. Non-technical losses (NTL) appear during distribution and include, but are not limited to, the following causes: meter tampering in order to record lower consumption, bypassing meters by rigging lines from the power source, arranged false meter readings by bribing meter readers, faulty or broken meters, un-metered supply, and technical and human errors in meter readings, data processing and billing. NTLs are also reported to range up to 40% of the total electricity distributed in countries such as Brazil, India, Malaysia or Lebanon. This is an introductory-level course that discusses how to predict whether a customer causes an NTL. In recent years, employing data analytics methods such as data mining and machine learning has evolved as the primary direction to solve this problem. This course will compare and contrast different approaches reported in the literature. Practical case studies on real data sets are included. Attendees will therefore not only understand, but experience, the challenges of NTL detection and learn how these challenges could be solved in the coming years.

Full Text
Peer Reviewed
Is Big Data Sufficient for a Reliable Detection of Non-Technical Losses?
Glauner, Patrick UL; Migliosi, Angelo UL; Meira, Jorge Augusto UL et al

in Proceedings of the 19th International Conference on Intelligent System Applications to Power Systems (ISAP 2017) (2017, September)

Non-technical losses (NTL) occur during the distribution of electricity in power grids and include, but are not limited to, electricity theft and faulty meters. In emerging countries, they may range up to 40% of the total electricity distributed. In order to detect NTLs, machine learning methods are used that learn irregular consumption patterns from customer data and inspection results. The Big Data paradigm followed in modern machine learning reflects the desire of deriving better conclusions from simply analyzing more data, without the necessity of looking at theory and models. However, the sample of inspected customers may be biased, i.e. it does not represent the population of all customers. As a consequence, machine learning models trained on these inspection results are biased as well and therefore lead to unreliable predictions of whether customers cause NTL or not. In machine learning, this issue is called covariate shift and had not yet been addressed in the literature on NTL detection. In this work, we present a novel framework for quantifying and visualizing covariate shift. We apply it to a commercial data set from Brazil that consists of 3.6M customers and 820K inspection results. We show that some features have a stronger covariate shift than others, making predictions less reliable. In particular, previous inspections were focused on certain neighborhoods or customer classes and were not sufficiently spread among the population of customers. This framework is about to be deployed in a commercial product for NTL detection.
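
One simple way to rank features by the strength of their covariate shift, chosen here for illustration and not necessarily matching the framework, is a two-sample Kolmogorov-Smirnov statistic per feature between inspected and not-yet-inspected customers. The feature names and data below are invented.

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    features = ["contract_class", "region", "consumption_mean", "consumption_std"]
    X_inspected = rng.normal(0.0, 1.0, size=(2000, len(features)))
    X_remaining = rng.normal(0.4, 1.1, size=(2000, len(features)))
    X_remaining[:, 2] += 1.0            # make one feature clearly shifted

    shift = {f: ks_2samp(X_inspected[:, i], X_remaining[:, i]).statistic
             for i, f in enumerate(features)}
    for name, stat in sorted(shift.items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} KS = {stat:.3f}")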

Full Text
Peer Reviewed
Query-able Kafka: An agile data analytics pipeline for mobile wireless networks
Falk, Eric UL; Gurbani, Vijay K.; State, Radu UL

in Proceedings of the 43rd International Conference on Very Large Data Bases 2017 (2017, August), 10

Due to their promise of delivering real-time network insights, today's streaming analytics platforms are increasingly being used in communications networks, where the impact of the insights goes beyond sentiment and trend analysis to include real-time detection of security attacks and prediction of network state (i.e., is the network transitioning towards an outage). Current streaming analytics platforms operate under the assumption that arriving traffic is on the order of kilobytes produced at very high frequencies. However, communications networks, especially telecommunication networks, challenge this assumption because some of the arriving traffic in these networks is on the order of gigabytes, but produced at medium to low velocities. Furthermore, these large datasets may need to be ingested in their entirety to render network insights in real time. Our interest is to subject today's streaming analytics platforms, constructed from state-of-the-art software components (Kafka, Spark, HDFS, ElasticSearch), to traffic densities observed in such communications networks. We find that filtering on such large datasets is best done at a common upstream point instead of being pushed to, and repeated in, downstream components. To demonstrate the advantages of such an approach, we modify Apache Kafka to perform limited native data transformation and filtering, relieving the downstream Spark application from doing this. Our approach outperforms four prevalent analytics pipeline architectures with negligible overhead compared to standard Kafka.
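
As a conceptual stand-in for upstream filtering, the sketch below uses the kafka-python client to consume a raw topic, keep only records of interest and republish them, so downstream consumers never see the bulk of the data. The topic names, JSON layout and severity field are made up, and the paper itself modifies Kafka natively rather than adding an external consumer like this.

    import json
    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer("raw-network-records",
                             bootstrap_servers="localhost:9092",
                             value_deserializer=lambda b: json.loads(b.decode()))
    producer = KafkaProducer(bootstrap_servers="localhost:9092",
                             value_serializer=lambda d: json.dumps(d).encode())

    for record in consumer:
        event = record.value
        if event.get("severity", 0) >= 3:          # keep only high-severity events
            producer.send("filtered-network-records", event)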

Full Text
Peer Reviewed
Human in the Loop: Interactive Passive Automata Learning via Evidence-Driven State-Merging Algorithms
Hammerschmidt, Christian UL; State, Radu UL; Verwer, Sicco

Poster (2017, August)

We present an interactive version of an evidence-driven state-merging (EDSM) algorithm for learning variants of finite state automata. Learning these automata often amounts to recovering or reverse-engineering the model generating the data, despite noisy, incomplete, or imperfectly sampled data sources, rather than optimizing a purely numeric target function. Domain expertise and human knowledge about the target domain can guide this process and are typically captured in parameter settings. Often, domain expertise is subconscious and not expressed explicitly. Directly interacting with the learning algorithm makes it easier to utilize this knowledge effectively.
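
As a minimal sketch of the data structure EDSM-style algorithms start from (the interactive tooling itself is not reproduced), the code below builds a prefix tree acceptor from positive sample strings; state merging would then proceed greedily on this tree, guided by evidence counts.

    class PTANode:
        def __init__(self):
            self.children = {}
            self.accepting = False
            self.count = 0            # evidence: how many samples pass through this state

    def build_pta(samples):
        """Build a prefix tree acceptor from an iterable of accepted strings."""
        root = PTANode()
        for word in samples:
            node = root
            node.count += 1
            for symbol in word:
                node = node.children.setdefault(symbol, PTANode())
                node.count += 1
            node.accepting = True
        return root

    pta = build_pta(["ab", "abb", "ba"])
    print(sorted(pta.children))                          # ['a', 'b']
    print(pta.children["a"].children["b"].accepting)     # True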
