
Replication Data for: Evaluating Non-Parametric Methods in Information Theory

M. Alvarez Chaves, H. Gupta, U. Ehret, and A. Guthke. Software (2024). Related to: Evaluating Density- and Nearest Neighbor-based Methods to Accurately Estimate Information-Theoretic Quantities from Multi-Dimensional Sample Data.
DOI: 10.18419/darus-4087

Abstract

Non-Parametric Estimation in Information Theory

1. Introduction: This is the repository for our paper "Evaluating Density- and Nearest Neighbor-based Methods to Accurately Estimate Information-Theoretic Quantities from Multi-Dimensional Sample Data".

2. Installation: The code was written in Python 3.11.5 but should be compatible with later and earlier versions of Python, down to Python 3.6. Check the requirements.txt file for any dependency issues. The recommended usage is to clone the repository to a local directory and set up the required environment using venv and pip: python -m venv .venv, then source .venv/Scripts/activate, then pip install -r requirements.txt.

3. Generating Data: The data for the experiments is first generated by the script in the data_generation/ directory and stored as an HDF5 database in the data_evaluation/data directory. From the root directory: python data_generation/data_generation.py. Note: because the data.hdf5 file is ~123 GB, generating it locally is recommended. This process takes ~12 hrs on an Intel Xeon E5-26280 v2 but shouldn't vary too much on any modern CPU.

4. Conducting an Evaluation: The scripts in the data_evaluation/ directory read the data and perform the experiments. Results are stored in the results/ directory. Again, from the root directory: python data_evaluation/eval_bin_entropy.py. All script names have the format eval_{estimator}_{quantity}.py. In total, 12 scripts must be run, three for each estimator: binning, KDE, numerical integration of KDE, and k-NN. The notebooks/ directory serves as an archive of the development of the workflow to test each estimator. The contents of each notebook are generally the same as the code in the scripts. Log files describe the history of the project.

5. Visualizing Results: The analysis_results directory contains a notebook to create the plots used in the paper, as well as a script that reads the log files and calculates the time per iteration of the different experiments. The plots are generated using the results from the data_evaluation/results directory. Results are read from .hdf5 files. All results were produced using the UNITE Toolbox.
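The evaluation step requires running each eval_*.py script from the root directory. A minimal sketch of how that could be automated is given here; it is not part of the repository. It assumes only the data_evaluation/ location and the eval_ prefix stated in the abstract, and the helper names find_eval_scripts and run_all are hypothetical. Scripts are discovered by file name rather than listed, since only eval_bin_entropy.py is named explicitly.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def find_eval_scripts(root: Path) -> List[Path]:
    """Return all eval_*.py scripts in <root>/data_evaluation, sorted by name."""
    return sorted((root / "data_evaluation").glob("eval_*.py"))


def run_all(root: Path, dry_run: bool = True) -> List[List[str]]:
    """Build one command per evaluation script and optionally run it.

    Commands use paths relative to the repository root and are executed
    with the root as working directory, mirroring the manual invocation
    `python data_evaluation/eval_bin_entropy.py`.
    """
    commands = []
    for script in find_eval_scripts(root):
        cmd = [sys.executable, str(script.relative_to(root))]
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError at the first failing script.
            subprocess.run(cmd, cwd=root, check=True)
    return commands


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    planned = run_all(Path.cwd(), dry_run=True)
    print(f"{len(planned)} evaluation script(s) found")
    for cmd in planned:
        print(" ".join(cmd))
```

With dry_run=True the sketch only prints the commands it would run, which makes it easy to check that all 12 scripts are found before starting a long evaluation. Passing dry_run=False runs them one after another.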

Links and resources

BibTeX key: alvarezchaves2024replication
entry type: misc
year: 2024
howpublished: Software
affiliation: Alvarez Chaves, Manuel/Universität Stuttgart, Gupta, Hoshin/The University of Arizona, Ehret, Uwe/Karlsruhe Institute of Technology, Guthke, Anneli/Universität Stuttgart
orcid-numbers: Alvarez Chaves, Manuel/0009-0002-8990-3785, Gupta, Hoshin/0000-0001-9855-2839, Ehret, Uwe/0000-0003-3454-8755, Guthke, Anneli/0000-0003-2901-1603
DOI: 10.18419/darus-4087
note: Related to: Evaluating Density- and Nearest Neighbor-based Methods to Accurately Estimate Information-Theoretic Quantities from Multi-Dimensional Sample Data

Cite this publication

%0 Generic
%1 alvarezchaves2024replication
%A Alvarez Chaves, Manuel
%A Gupta, Hoshin
%A Ehret, Uwe
%A Guthke, Anneli
%D 2024
%K darus ubs_10021 ubs_20019 ubs_30165 unibibliografie
%R 10.18419/darus-4087
%T Replication Data for: Evaluating Non-Parametric Methods in Information Theory

@misc{alvarezchaves2024replication,
  added-at = {2024-03-18T13:06:10.000+0100},
  affiliation = {Alvarez Chaves, Manuel/Universität Stuttgart, Gupta, Hoshin/The University of Arizona, Ehret, Uwe/Karlsruhe Institute of Technology, Guthke, Anneli/Universität Stuttgart},
  author = {Alvarez Chaves, Manuel and Gupta, Hoshin and Ehret, Uwe and Guthke, Anneli},
  biburl = {https://puma.ub.uni-stuttgart.de/bibtex/2920559f9920d684f0535693c4f4b90b0/unibiblio},
  doi = {10.18419/darus-4087},
  howpublished = {Software},
  interhash = {3c2348f992b83041244bf71ef7104c31},
  intrahash = {920559f9920d684f0535693c4f4b90b0},
  keywords = {darus ubs_10021 ubs_20019 ubs_30165 unibibliografie},
  note = {Related to: Evaluating Density- and Nearest Neighbor-based Methods to Accurately Estimate Information-Theoretic Quantities from Multi-Dimensional Sample Data},
  orcid-numbers = {Alvarez Chaves, Manuel/0009-0002-8990-3785, Gupta, Hoshin/0000-0001-9855-2839, Ehret, Uwe/0000-0003-3454-8755, Guthke, Anneli/0000-0003-2901-1603},
  timestamp = {2024-03-18T13:06:10.000+0100},
  title = {Replication Data for: Evaluating Non-Parametric Methods in Information Theory},
  year = 2024
}
