Resources


Dataset – Users from Twitter

971 user IDs extracted from the UDI Dataset.

Download


Pork Standards Meat Modeling

The dataset contains pork samples with the attributes pH, L* and Water Holding Capacity (drip loss). The column names must be “pHf”, “L” and “WHC”.

Attribute characteristics: real-valued
Associated tasks: classification
How to use the classification models:
Before executing the code, the following steps are necessary:
Step 1: put the dataset file in the project folder
Step 2: insert the file name, in quotes, in the variable “file.name”
Step 3: insert the file type (the file extension, e.g. csv, xls, ods), in quotes, in the variable “file.type”
After the code is executed, new files with the labeled samples are generated. Each file is named with the dataset name followed by the name of the classification method.
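The preparation steps above can be checked before running the models. The following is a minimal Python sketch, not part of the distributed code (which uses the variables “file.name” and “file.type”). It assumes a CSV input; the helper names, the underscore separator in the output name and the method name "knn" are illustrative assumptions.

```python
import csv
import io
from pathlib import Path

# Column names required by the classification models, as stated above.
REQUIRED_COLUMNS = ("pHf", "L", "WHC")

def missing_columns(handle):
    """Return the required column names that are absent from a CSV header."""
    header = next(csv.reader(handle), [])
    return [name for name in REQUIRED_COLUMNS if name not in header]

def output_name(file_name, file_type, method):
    """Dataset name followed by the method name.

    The underscore separator is an assumption; the page does not state it.
    """
    return f"{Path(file_name).stem}_{method}.{file_type}"

# Hypothetical two-line dataset in which the WHC column is missing.
sample = io.StringIO("pHf,L\n5.6,48.2\n")
print(missing_columns(sample))
print(output_name("pork_samples", "csv", "knn"))
```

A non-empty result from `missing_columns` means the columns need to be renamed before the models are run.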

Barbin, Douglas, et al. “Near-infrared hyperspectral imaging for grading and classification of pork.” Meat Science 90.1 (2012): 259-268.
Faucitano, L., et al. “Shelf life of pork from five different quality classes.” Meat Science 84.3 (2010): 466-469.
Joo, S. T., et al. “Objectively Predicting Ultimate Quality of Post-Rigor Pork Musculature.” Asian-Australasian Journal of Animal Sciences 13.1 (2000): 77-85.
Kauffman, R. G., et al. “The effectiveness of examining early post-mortem musculature to predict ultimate pork quality.” Meat Science 34.3 (1993): 283-300.
Warner, R. D., R. G. Kauffman, and M. L. Greaser. “Muscle protein changes post mortem in relation to pork quality traits.” Meat Science 45.3 (1997): 339-352.

Download


Dataset – Users from Twitter and YouTube

Download


Dataset – Event Log

Download


Multi-Target Framework (R Language)

Download

S. M. Mastelini, E. J. Santana, R. Cerri and S. Barbon, “DSTARS: A Multi-target Deep Structure for Tracking Asynchronous Regressor Stack,” 2017 Brazilian Conference on Intelligent Systems (BRACIS), Uberlandia, 2017, pp. 19-24.
doi: 10.1109/BRACIS.2017.30

Multi-Target Dataset Generator (R Language)

This algorithm generates synthetic multi-target datasets from a set of hyperparameters: the number of instances (N), features (m) and targets (d), the percentage of instances degraded by noise (η), and the number of generation groups (g). The nature of the inter-target relations is also given explicitly as input to the generator.

Download

Mastelini, S. M., Santana, E. J., da Costa, V. G. T. and Barbon, S. “Benchmarking multi-target regression methods”. 2018 Brazilian Conference on Intelligent Systems (BRACIS). In press.

Multi-Target Datasets 2019

These 648 synthetic datasets were generated using the framework presented in Mastelini, Saulo Martiello, et al. “Benchmarking Multi-target Regression Methods.” 2018 7th Brazilian Conference on Intelligent Systems (BRACIS). IEEE, 2018.
Values of the hyperparameters:
Number of instances: 500, 1000;
Number of features: 15, 30, 45, 60, 75, 90;
Number of targets: 3, 6;
Generating groups: 1, 2;
Percentage of instances affected by noise: 1, 5, 10.
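The hyperparameter values above form a grid that can be enumerated when iterating over the datasets. The following Python sketch only illustrates that enumeration; the dictionary keys are illustrative names, not identifiers from the generator, and how the individual datasets are distributed across these settings is not stated here.

```python
from itertools import product

# Hyperparameter values copied from the list above; key names are assumptions.
grid = {
    "instances": [500, 1000],
    "features": [15, 30, 45, 60, 75, 90],
    "targets": [3, 6],
    "groups": [1, 2],
    "noise_pct": [1, 5, 10],
}

# One dictionary per combination of the listed values.
settings = [dict(zip(grid, values)) for values in product(*grid.values())]

for setting in settings[:3]:
    print(setting)
```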

Download

Aguiar, G. J., Santana, E. J., Mastelini, S. M., Mantovani, R. G. and Barbon, S. “Towards meta-learning for multi-target regression problems”. 2019 Brazilian Conference on Intelligent Systems (BRACIS). In press.

Synthetic Events Streams 2019

This package contains 942 synthetic event streams that simulate concept drift in business processes. Each stream contains exactly one drift. The streams vary in size, drift type, drift perspective and noise percentage. Each event in a stream has four main attributes: case identifier, event name, event start time and event completion time.

Characteristics:

Drift types (A): gradual, incremental, recurring and sudden;
Drift perspectives (B): time and trace;
Noise percentage (C): 0, 5, 10, 15, 20;
Number of cases in the stream (D): 100, 500, 1000;
Change patterns (E): baseline, cb, cd, cf, cp, IOR, IRO, lp, OIR, pl, pm, re, RIO, ROI, rp, sw.

The file name follows the pattern [A]_[B]_noise[C]_[D]_[E].
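The naming pattern makes it possible to recover the characteristics of a stream from its file name. The following Python sketch shows one way to do so; the function name, the field names and the example file name (including its extension) are assumptions made for illustration.

```python
import re
from pathlib import Path

# [A]_[B]_noise[C]_[D]_[E], with the values of A and B taken from the list above.
NAME_PATTERN = re.compile(
    r"^(?P<drift_type>gradual|incremental|recurring|sudden)"
    r"_(?P<perspective>time|trace)"
    r"_noise(?P<noise_pct>\d+)"
    r"_(?P<cases>\d+)"
    r"_(?P<change_pattern>[A-Za-z]+)$"
)

def parse_stream_name(file_name):
    """Split a stream file name into its five characteristics, or return None."""
    match = NAME_PATTERN.match(Path(file_name).stem)
    if match is None:
        return None
    fields = match.groupdict()
    fields["noise_pct"] = int(fields["noise_pct"])
    fields["cases"] = int(fields["cases"])
    return fields

# Hypothetical file name built from the pattern; the extension is an assumption.
print(parse_stream_name("sudden_time_noise10_500_cb.csv"))
```

Names that do not follow the pattern yield `None`, which makes it easy to skip unrelated files in the package folder.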

Download

 


Multi-target Stacked Generalisation algorithm (R Language)

This is an R implementation of Multi-target Stacked Generalisation (MTSG). MTSG follows the stacked generalisation principle but, unlike other stacking-based multi-target regression methods, it uses the original input set only in the first phase. After the first base models are created and their predictions obtained, new base models are trained using only those predictions as input.
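The two-phase idea can be illustrated with a small Python sketch. It is not the R implementation offered for download: the base learner (linear least squares), the use of in-sample first-phase predictions and all function names are assumptions chosen to keep the example short.

```python
def fit_linear(X, y, ridge=1e-8):
    """Least-squares fit of y on [1, x]; returns the weights (intercept first)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Normal equations A w = b, with a tiny ridge term for numerical stability.
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * ac for a, ac in zip(A[k], A[c])]
                b[k] -= f * b[c]
    return [b[i] / A[i][i] for i in range(p)]

def predict_linear(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]

def fit_mtsg(X, Y):
    """Y holds one list of values per target."""
    # Phase 1: one base model per target, trained on the original inputs.
    first = [fit_linear(X, y) for y in Y]
    preds = [predict_linear(w, X) for w in first]
    # Phase 2: new models see only the phase-1 predictions, never X itself.
    Z = [list(row) for row in zip(*preds)]
    second = [fit_linear(Z, y) for y in Y]
    return first, second

def predict_mtsg(model, X):
    first, second = model
    Z = [list(row) for row in zip(*[predict_linear(w, X) for w in first])]
    return [predict_linear(w, Z) for w in second]

# Toy data with two features and two targets (values are made up).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
Y = [[1, 3, 0, 2, 4, 1], [0, 3, 0.5, 3.5, 6.5, 4]]
model = fit_mtsg(X, Y)
print(predict_mtsg(model, [[2, 2]]))
```

The point of the sketch is the data flow: the second set of models receives the stacked predictions of all targets as its only input, so any inter-target relation is exploited through those predictions rather than through the original features.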

Download

Santana EJ, dos Santos FR, Mastelini SM, Melquiades FL, Barbon Jr S. Improved prediction of soil properties with Multi-target Stacked Generalisation on EDXRF spectra. Chemometrics and Intelligent Laboratory Systems. 2002:104231.

Implementation of Multi-label experiments for Multiple Voice Disorders in One Individual

This package contains an R script and a Jupyter notebook.

Download