References of "Meira, Jorge Augusto 50002369"
     in
Bookmark and Share    
Full Text
Peer Reviewed
Is Big Data Sufficient for a Reliable Detection of Non-Technical Losses?
Glauner, Patrick UL; Migliosi, Angelo UL; Meira, Jorge Augusto UL et al

in Proceedings of the 19th International Conference on Intelligent System Applications to Power Systems (ISAP 2017) (2017, September)

Non-technical losses (NTL) occur during the distribution of electricity in power grids and include, but are not limited to, electricity theft and faulty meters. In emerging countries, they may range up to 40% of the total electricity distributed. In order to detect NTLs, machine learning methods are used that learn irregular consumption patterns from customer data and inspection results. The Big Data paradigm followed in modern machine learning reflects the desire to derive better conclusions from simply analyzing more data, without the necessity of looking at theory and models. However, the sample of inspected customers may be biased, i.e. it does not represent the population of all customers. As a consequence, machine learning models trained on these inspection results are biased as well and therefore lead to unreliable predictions of whether customers cause NTL or not. In machine learning, this issue is called covariate shift and has not yet been addressed in the literature on NTL detection. In this work, we present a novel framework for quantifying and visualizing covariate shift. We apply it to a commercial data set from Brazil that consists of 3.6M customers and 820K inspection results. We show that some features have a stronger covariate shift than others, making predictions less reliable. In particular, previous inspections were focused on certain neighborhoods or customer classes and were not sufficiently spread among the population of customers. This framework is about to be deployed in a commercial product for NTL detection.
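
A minimal sketch of the covariate-shift idea described above, on synthetic data: the paper's exact metric is not reproduced here, and the two-sample Kolmogorov-Smirnov statistic is merely one common way to compare an inspected sample against the full population, feature by feature.

```python
# Hedged stand-in for per-feature covariate shift quantification: compare
# the distribution of each feature among inspected customers against the
# full population. All data below is synthetic.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Toy data: three features per customer; inspections biased on feature 0.
population = rng.normal(size=(100_000, 3))
inspected = population[population[:, 0] > 0.5]  # biased inspection sample

for j in range(population.shape[1]):
    stat, _ = ks_2samp(inspected[:, j], population[:, j])
    print(f"feature {j}: KS statistic = {stat:.3f}")  # larger = stronger shift
```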

Full Text
Peer Reviewed
Introduction to Detection of Non-Technical Losses using Data Analytics
Glauner, Patrick UL; Meira, Jorge Augusto UL; State, Radu UL et al

Scientific Conference (2017, September)

Electricity losses are a frequently occurring problem in power grids. Non-technical losses (NTL) appear during distribution and include, but are not limited to, the following causes: meter tampering in order to record lower consumption, bypassing meters by rigging lines from the power source, arranging false meter readings by bribing meter readers, faulty or broken meters, un-metered supply, and technical and human errors in meter readings, data processing and billing. NTLs are also reported to range up to 40% of the total electricity distributed in countries such as Brazil, India, Malaysia or Lebanon. This is an introductory-level course that discusses how to predict whether a customer causes an NTL. In recent years, data analytics methods such as data mining and machine learning have evolved into the primary direction for solving this problem. This course compares and contrasts different approaches reported in the literature. Practical case studies on real data sets are included. Attendees will therefore not only understand, but actually experience, the challenges of NTL detection and learn how these challenges could be solved in the coming years.

Full Text
Peer Reviewed
A PetriNet Mechanism for OLAP in NUMA
Dominico, Simone; Almeida, Eduardo Cunha de; Meira, Jorge Augusto UL

Poster (2017, May 15)

Full Text
Peer Reviewed
Identifying Irregular Power Usage by Turning Predictions into Holographic Spatial Visualizations
Glauner, Patrick UL; Dahringer, Niklas; Puhachov, Oleksandr et al

in Proceedings of the 17th IEEE International Conference on Data Mining Workshops (ICDMW 2017) (2017)

Power grids are critical infrastructure assets that face non-technical losses (NTL) such as electricity theft or faulty meters. NTL may range up to 40% of the total electricity distributed in emerging countries. Industrial NTL detection systems are still largely based on expert knowledge when deciding whether to carry out costly on-site inspections of customers. Electricity providers are reluctant to move to large-scale deployments of automated systems that learn NTL profiles from data, due to the latter's propensity to suggest a large number of unnecessary inspections. In this paper, we propose a novel system that combines automated statistical decision making with expert knowledge. First, we propose a machine learning framework that classifies customers into NTL or non-NTL using a variety of features derived from the customers' consumption data. The methodology used is specifically tailored to the level of noise in the data. Second, in order to allow human experts to feed their knowledge into the decision loop, we propose a method for visualizing prediction results at various granularity levels in a spatial hologram. Our approach allows domain experts to put the classification results into the context of the data and to incorporate their knowledge when making the final decisions of which customers to inspect. This work has yielded appreciable results on a real-world data set of 3.6M customers. Our system is being deployed in commercial NTL detection software.
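
A hedged sketch of the statistical half of such a pipeline: classify customers from consumption-derived features, then aggregate the predicted NTL probabilities per spatial cell for display at a coarser granularity. The features, model choice and grid layout are illustrative assumptions, not the paper's actual design.

```python
# Hedged sketch: classify customers from consumption-derived features and
# aggregate predicted NTL probabilities per spatial cell, the granularity
# a spatial viewer would display. All data is synthetic.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 1_000
consumption = rng.gamma(2.0, 100.0, size=(n, 12))                 # 12 monthly readings
X = np.column_stack([consumption.mean(axis=1),                    # average consumption
                     consumption.std(axis=1),                     # volatility
                     np.diff(consumption, axis=1).mean(axis=1)])  # trend
y = rng.integers(0, 2, size=n)                                    # toy NTL labels

proba = RandomForestClassifier(random_state=0).fit(X, y).predict_proba(X)[:, 1]

# Aggregate predictions per spatial cell for coarser-grained visualization.
cells = pd.DataFrame({"lat_cell": rng.integers(0, 5, n),
                      "lon_cell": rng.integers(0, 5, n),
                      "p_ntl": proba})
print(cells.groupby(["lat_cell", "lon_cell"])["p_ntl"].mean().head())
```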

Full Text
Peer Reviewed
The Top 10 Topics in Machine Learning Revisited: A Quantitative Meta-Study
Glauner, Patrick UL; Du, Manxing UL; Paraschiv, Victor et al

in Proceedings of the 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2017) (2017)

Which topics of machine learning are most commonly addressed in research? This question was initially answered in 2007 by a qualitative survey among distinguished researchers. In our study, we revisit it from a quantitative perspective. Concretely, we collect 54K abstracts of papers published between 2007 and 2016 in leading machine learning journals and conferences. We then use machine learning to determine the top 10 topics in machine learning. We not only include models, but provide a holistic view across optimization, data, features, etc. This quantitative approach reduces the bias of surveys and reveals new and up-to-date insights into what the 10 most prolific topics in machine learning research are. It allows researchers to identify popular topics as well as new and rising topics for their research.
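
As an illustration of such a quantitative meta-study, the sketch below runs latent Dirichlet allocation over a handful of toy abstracts. Whether the paper uses LDA or another topic model is not stated above; LDA is simply a common choice for extracting dominant topics from text.

```python
# Hedged sketch of topic extraction from paper abstracts via LDA.
# The abstracts and the number of topics are toy placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "deep neural networks for image classification",
    "convolutional neural networks trained with stochastic gradient descent",
    "support vector machines and kernel methods for classification",
    "kernel methods for structured output prediction",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-3:][::-1]]  # 3 heaviest terms
    print(f"topic {k}: {top}")
```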

Full Text
Peer Reviewed
The Challenge of Non-Technical Loss Detection using Artificial Intelligence: A Survey
Glauner, Patrick UL; Meira, Jorge Augusto UL; Valtchev, Petko UL et al

in International Journal of Computational Intelligence Systems (2017), 10(1), 760-775

Detection of non-technical losses (NTL), which include electricity theft, faulty meters or billing errors, has attracted increasing attention from researchers in electrical engineering and computer science. NTLs cause significant harm to the economy, as in some countries they may range up to 40% of the total electricity distributed. The predominant research direction is employing artificial intelligence to predict whether a customer causes NTL. This paper first provides an overview of how NTLs are defined and their impact on economies, which includes loss of revenue and profit of electricity providers and decreased stability and reliability of electrical power grids. It then surveys the state-of-the-art research efforts in an up-to-date and comprehensive review of the algorithms, features and data sets used. It finally identifies the key scientific and engineering challenges in NTL detection and suggests how they could be addressed in the future.

Full Text
Peer Reviewed
Distilling Provider-Independent Data for General Detection of Non-Technical Losses
Meira, Jorge Augusto UL; Glauner, Patrick UL; State, Radu UL et al

in Power and Energy Conference, Illinois, 23-24 February 2017 (2017)

Non-technical losses (NTL) in electricity distribution are caused by different factors, such as poor equipment maintenance, broken meters or electricity theft. NTL occurs especially, but not exclusively, in emerging countries. Developed countries have to deal with NTL issues as well, even though usually in smaller amounts; in these countries the estimated annual losses are up to six billion USD. These facts have directed the focus of our work to NTL detection. Our approach is composed of two steps: 1) We compute several features and combine them in sets characterized by four criteria: temporal, locality, similarity and infrastructure. 2) We then use the sets of features to train three machine learning classifiers: random forest, logistic regression and support vector machine. Our hypothesis is that features derived only from provider-independent data are adequate for an accurate detection of non-technical losses.
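
A minimal sketch of step 2, assuming a mocked feature matrix in place of the temporal, locality, similarity and infrastructure feature sets: the three named classifiers are compared by cross-validation.

```python
# Hedged sketch: compare the three named classifiers by cross-validation.
# The feature matrix and labels are synthetic stand-ins; the paper's
# actual provider-independent features are not reproduced here.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 8))      # stand-in for the combined feature sets
y = rng.integers(0, 2, size=500)   # toy inspection outcomes (NTL / no NTL)

for name, clf in [("random forest", RandomForestClassifier(random_state=0)),
                  ("logistic regression", LogisticRegression(max_iter=1000)),
                  ("linear SVM", LinearSVC())]:
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.3f}")
```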

Full Text
Peer Reviewed
“Overloaded!” — A Model-based Approach to Database Stress Testing
Meira, Jorge Augusto UL; Almeida, Eduardo Cunha de; Kim, Dongsun UL et al

in International Conference on Database and Expert Systems Applications, Porto, 5-8 September 2016 (2016)

Full Text
Peer Reviewed
Neighborhood Features Help Detecting Non-Technical Losses in Big Data Sets
Glauner, Patrick UL; Meira, Jorge Augusto UL; Dolberg, Lautaro et al

in Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing Applications and Technologies (BDCAT 2016) (2016)

Electricity theft occurs around the world in both developed and developing countries and may range up to 40% of the total electricity distributed. More generally, electricity theft belongs to non-technical losses (NTL), which occur during the distribution of electricity in power grids. In this paper, we build features from the neighborhood of customers. We first split the area in which the customers are located into grids of different cell sizes. For each grid cell, we then compute the proportion of inspected customers and the proportion of NTL found among the inspected customers. We then analyze the distributions of the features generated and show why they are useful for predicting NTL. In addition, we compute features from the consumption time series of customers. We also use master data features of customers, such as their customer class and the voltage of their connection. We compute these features for a Big Data base of 31M meter readings, 700K customers and 400K inspection results. We then use these features to train four machine learning algorithms that are particularly suitable for Big Data sets because of their parallelizable structure: logistic regression, k-nearest neighbors, linear support vector machine and random forest. Using the neighborhood features instead of only analyzing the time series has yielded appreciable results for Big Data sets for varying NTL proportions of 1%-90%. This work can therefore be deployed to a wide range of different regions.
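
The neighborhood features lend themselves to a short sketch: bucket customers into grid cells, then compute per cell the proportion of inspected customers and the proportion of NTL found among the inspected ones. The column names, cell size and synthetic data below are assumptions for illustration only.

```python
# Hedged sketch of the neighborhood features: assign each customer to a
# grid cell, then compute per-cell inspection and NTL proportions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 10_000
customers = pd.DataFrame({
    "lat": rng.uniform(-25.0, -24.0, n),
    "lon": rng.uniform(-50.0, -49.0, n),
    "inspected": rng.random(n) < 0.2,
})
# NTL can only be observed where an inspection actually took place.
customers["ntl"] = customers["inspected"] & (rng.random(n) < 0.3)

cell = 0.1  # grid cell size in degrees; the paper varies this size
customers["cell"] = (np.floor(customers["lat"] / cell).astype(int).astype(str)
                     + "_" + np.floor(customers["lon"] / cell).astype(int).astype(str))

grp = customers.groupby("cell")
per_cell = pd.DataFrame({
    "inspected_ratio": grp["inspected"].mean(),
    "ntl_ratio_among_inspected": grp["ntl"].sum() / grp["inspected"].sum(),
})
print(per_cell.head())
```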

Full Text
Peer Reviewed
Generic Cloud Platform Multi-objective Optimization Leveraging Models@run.time
El Kateb, Donia UL; Fouquet, François UL; Nain, Grégory UL et al

Scientific Conference (2014, March)

Full Text
MODEL-BASED STRESS TESTING FOR DATABASE SYSTEMS
Meira, Jorge Augusto UL

Doctoral thesis (2014)

Database Management Systems (DBMS) have been successful at processing transaction workloads for decades. But contemporary systems, including Cloud computing, Internet-based systems, and sensors (i.e., the Internet of Things (IoT)), are challenging the architecture of the DBMS with burgeoning transaction workloads. The direct consequence is that the development agenda of the DBMS is now heavily concerned with meeting non-functional requirements, such as performance, robustness and scalability. Otherwise, any stressing workload will make the DBMS lose control of simple functional requirements, such as responding to a transaction request. While traditional DBMS, including DB2, Oracle, and PostgreSQL, require embedding new features to meet non-functional requirements, the contemporary DBMS known as NewSQL present a completely new architecture. What is still lacking in the development agenda is a proper testing approach, coupled with burgeoning transaction workloads, for validating the DBMS with non-functional requirements in mind. Typical non-functional validation is carried out by performance benchmarks. However, they focus on comparing metrics instead of finding defects. In this thesis, we address this lack by presenting different contributions to the domain of DBMS stress testing. These contributions fit different testing objectives to challenge each specific architecture of traditional and contemporary DBMS. For instance, testing the earlier DBMS (e.g., DB2, Oracle) requires incremental performance tuning (i.e., from a simple setup to a complex one), while testing the latter DBMS (e.g., VoltDB, NuoDB) requires driving it into different performance states due to its self-tuning capabilities. Overall, this thesis makes the following contributions: 1) Stress TEsting Methodology (STEM): a methodology to capture performance degradation and expose system defects in the internal code due to the combination of a stress workload and mistuning; 2) Model-based Database Stress Testing (MoDaST): an approach to test NewSQL database systems. Supported by a Database State Machine (DSM), MoDaST infers internal states of the database based on performance observations under different workload levels; 3) Under Pressure Benchmark (UPB): a benchmark to assess the impact of availability mechanisms in NewSQL database systems. We validate our contributions with several popular DBMS. Among the outcomes, we highlight that our methodologies succeed in driving the DBMS up to stress-state conditions and expose several related defects, including a new major defect in a popular NewSQL system.
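
The Database State Machine behind MoDaST can be sketched as follows, assuming illustrative state names and thresholds (the thesis's exact definitions are not reproduced here): a coarse performance state is inferred from how well the completed throughput keeps up with the submitted load.

```python
# Hedged sketch of a Database State Machine: infer a coarse performance
# state from observed throughput relative to the submitted load. State
# names and thresholds are illustrative assumptions only.
def infer_state(submitted_tps: float, completed_tps: float) -> str:
    ratio = completed_tps / submitted_tps if submitted_tps else 1.0
    if ratio >= 0.95:
        return "steady"          # the DBMS keeps up with the workload
    if ratio >= 0.75:
        return "under pressure"  # noticeable performance degradation
    if ratio >= 0.40:
        return "stress"          # falling well behind the workload
    return "thrashing"           # barely completing any transactions

# Drive the (mock) DBMS through its states by increasing the workload.
for submitted in (1_000, 5_000, 8_000, 40_000):
    completed = min(submitted, 4_000)  # toy saturation point of the system
    print(f"{submitted} tps submitted -> {infer_state(submitted, completed)}")
```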

Full Text
Peer Reviewed
A state machine for database non-functional testing
Meira, Jorge Augusto UL; Almeida, Eduardo; Le Traon, Yves UL

Poster (2014)

Over the last decade, large amounts of concurrent transactions have been generated from different sources, such as Internet-based systems, mobile applications, smart homes and cars. High-throughput transaction processing is becoming commonplace; however, there is no testing technique for validating non-functional aspects of DBMS under transaction-flooding workloads. In this paper we propose a database state machine to represent the states of a DBMS when processing concurrent transactions. The state transitions are forced by increasing the concurrency of the testing workload. Preliminary results show the effectiveness of our approach in driving the system among different performance states and finding related defects.

Full Text
Peer Reviewed
Under Pressure Benchmark for DDBMS Availability
Fior, Alessandro Gustavo; Meira, Jorge Augusto UL; Almeida, Eduardo Cunha de et al

in Journal of Information and Data Management (2013)

The availability of Distributed Database Management Systems (DDBMS) is related to the probability of being up and running at a given point in time and to the management of failures. One well-known and widely used mechanism to ensure availability is replication, which incurs a performance impact for maintaining data replicas across the DDBMS's machine nodes. Benchmarking can be used to measure such impact. In this article, we present a benchmark that evaluates the performance of DDBMS considering availability through replication, called the Under Pressure Benchmark (UPB). The UPB measures performance with different degrees of replication upon a high-throughput distributed workload, combined with failures. The UPB methodology increases the evaluation complexity from a stable system scenario to a complex one with different load sizes and replicas. We validate our benchmark with three high-throughput in-memory DDBMS: VoltDB, NuoDB and Dbms-X.
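
A hedged driver skeleton in the spirit of UPB: sweep the replication degree and load size, inject a failure mid-run, and record throughput before and after. All names and numbers are mock placeholders rather than UPB's actual interface.

```python
# Hedged sketch of an availability benchmark driver: vary replication
# degree and load, inject a node failure mid-run, compare throughput.
def run_workload(cluster: dict, load_tps: int) -> int:
    """Measure throughput of the mock cluster (placeholder for a real run)."""
    return min(load_tps, cluster["capacity_tps"])

for replicas in (1, 2, 3):
    for load in (1_000, 10_000):
        cluster = {"capacity_tps": 6_000 // replicas}  # toy replication overhead
        before = run_workload(cluster, load)
        cluster["capacity_tps"] //= 2                  # inject failure: lose a node
        after = run_workload(cluster, load)
        print(f"replicas={replicas} load={load}: {before} -> {after} tps")
```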

Full Text
Peer Reviewed
Stress Testing of Transactional Database Systems
Meira, Jorge Augusto UL; Almeida, Eduardo Cunha de; Sunyé, Gerson et al

in Journal of Information and Data Management (2013)

Transactional database management systems (DBMS) have been successful at supporting traditional transaction processing workloads. However, web-based applications that tend to generate huge numbers of concurrent business operations are pushing DBMS performance over its limits, thus threatening overall system availability. A crucial question, then, is how to test DBMS performance under heavy workload conditions. Answering this question requires a testing methodology to set up non-biased conditions for pushing a particular DBMS over its normal performance limits (i.e., to stress it). In this article, we present a stress testing methodology for DBMS that searches for defects in supporting very heavy workloads. Our methodology leverages distributed testing techniques and takes into account the various biases that may affect the test results. It progressively increases the workload, along with several tuning steps, up to a stress condition. We validate our methodology with empirical studies on two popular DBMS (one proprietary, one open-source) and detail the defects that have been found.
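
The ramp-up idea behind such a methodology can be sketched as follows, with submit_workload as a mock stand-in for a real distributed test driver: keep increasing the concurrent workload until throughput degrades, i.e., the DBMS has been pushed past its normal performance limit.

```python
# Hedged sketch of the workload ramp-up: double the concurrency until
# sustained throughput degradation signals a stress condition. The mock
# DBMS model and all numbers are arbitrary illustrations.
def submit_workload(connections: int) -> float:
    """Mock DBMS: throughput saturates, then degrades under overload."""
    capacity = 2_000.0
    return min(connections * 10.0, capacity) - max(0, connections - 300) * 2.0

best = 0.0
connections = 50
while True:
    tps = submit_workload(connections)
    print(f"{connections} connections -> {tps:.0f} tps")
    if tps < 0.9 * best:        # sustained degradation: stress state reached
        break
    best = max(best, tps)
    connections *= 2            # step up the load
```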

Full Text
Peer Reviewed
Peer-to-Peer Load Testing
Meira, Jorge Augusto UL; Almeida, Eduardo Cunha; Le Traon, Yves UL et al

in IEEE Fifth International Conference on Software Testing, Verification and Validation (ICST 2012) (2012)

Nowadays, large-scale systems are commonplace in all kinds of applications. The popularity of the web has created a new environment in which applications need to be highly scalable due to the data tsunami generated by huge loads of requests (i.e., connections and business operations). In this context, the main question is how to validate how far web applications can deal with the load generated by clients. Load testing is a technique to analyze the behavior of the system under test under normal and heavy load conditions. In this work we present a peer-to-peer load testing approach to isolate bottleneck problems related to centralized testing drivers and to scale up the load. Our approach was tested on a DBMS as a case study and shows satisfactory results.
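
A minimal sketch of the peer-to-peer idea, assuming local processes as stand-ins for distributed tester peers: each peer generates its own share of the load, so no single centralized driver becomes the bottleneck.

```python
# Hedged sketch of peer-to-peer load generation: independent tester peers
# issue load in parallel. peer() is a mock stand-in for a real client
# issuing connections and business operations against the system under test.
from multiprocessing import Pool

def peer(peer_id: int) -> int:
    """Each peer issues its own batch of (mock) requests and reports the count."""
    sent = 0
    for _ in range(1_000):
        sent += 1  # stand-in for one request against the system under test
    return sent

if __name__ == "__main__":
    with Pool(processes=8) as pool:   # 8 tester peers running in parallel
        totals = pool.map(peer, range(8))
    print(f"total requests issued: {sum(totals)}")
```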
