# bib_lookup

**Repository Path**: deep-psp/bib_lookup

## Basic Information

- **Project Name**: bib_lookup
- **Description**: No description available
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-04-12
- **Last Updated**: 2026-03-15

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# bib_lookup

[![pytest](https://github.com/DeepPSP/bib_lookup/actions/workflows/run-pytest.yml/badge.svg)](https://github.com/DeepPSP/bib_lookup/actions/workflows/run-pytest.yml)
[![codecov](https://codecov.io/github/DeepPSP/bib_lookup/branch/master/graph/badge.svg?token=H1B26Q3XWX)](https://codecov.io/github/DeepPSP/bib_lookup)
[![PyPI](https://img.shields.io/pypi/v/bib_lookup?style=flat-square)](https://pypi.org/project/bib-lookup/)
[![DOI](https://zenodo.org/badge/476130336.svg)](https://zenodo.org/badge/latestdoi/476130336)
[![downloads](https://img.shields.io/pypi/dm/bib-lookup?style=flat-square)](https://pypistats.org/packages/bib-lookup)
[![license](https://img.shields.io/github/license/DeepPSP/bib_lookup?style=flat-square)](LICENSE)
![GitHub Release Date - Published_At](https://img.shields.io/github/release-date/DeepPSP/bib_lookup)
![GitHub commits since latest release (by SemVer including pre-releases)](https://img.shields.io/github/commits-since/DeepPSP/bib_lookup/latest)
[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://bib-lookup.streamlit.app/)

A useful tool for looking up Bib entries using DOI, PubMed ID (URL), or arXiv ID (URL).

:rocket: **NEW** :rocket: **Streamlit** support! See [here](https://bib-lookup.streamlit.app/) for an app deployed on [Streamlit Community Cloud](https://share.streamlit.io/).

It is an updated version of

**NOTE** that an internet connection is required to use `bib_lookup`.

- [Installation](#installation)
- [Dependencies](#dependencies)
- [Basic Usage Examples](#basic-usage-examples)
- [Command-line Usage](#command-line-usage)
- [Output (Append) to a `.bib` File](#append-to-file)
- [arXiv to DOI](#arxiv-to-doi)
- [Bib Items Checking](#bib-items-checking)
- [Simplify a `.bib` File](#simplify-file)
- [`CitationMixin` class](#citation-mixin)
- [TODO](#todo)
- [WARNING](#warning)
- [Biblatex Cheetsheet](#biblatex-cheetsheet)
- [Citation](#citation)
- [References](#references)

## Installation

Run

```bash
python -m pip install bib-lookup
```

or install the latest version from [GitHub](https://github.com/DeepPSP/bib_lookup/) using

```bash
python -m pip install git+https://github.com/DeepPSP/bib_lookup.git
```

or clone this repository and install it locally via

```bash
git clone https://github.com/DeepPSP/bib_lookup.git
cd bib_lookup
python -m pip install .
```

:point_right: [Back to TOC](#bib_lookup)

## Dependencies

- requests
- feedparser
- pandas

:point_right: [Back to TOC](#bib_lookup)

## Basic Usage Examples

```python
>>> from bib_lookup import BibLookup
>>> bl = BibLookup(align="middle")
>>> print(bl("1707.07183"))
@article{wen2017_1707.07183v2,
   author = {Hao Wen and Chunhui Liu},
    title = {Counting Multiplicities in a Hypersurface over a Number Field},
  journal = {arXiv preprint arXiv:1707.07183v2},
     year = {2017},
    month = {7}
}
>>> print(bl("10.1109/CVPR.2016.90"))
@inproceedings{He_2016,
     author = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
      title = {Deep Residual Learning for Image Recognition},
  booktitle = {2016 {IEEE} Conference on Computer Vision and Pattern Recognition ({CVPR})},
        doi = {10.1109/cvpr.2016.90},
       year = {2016},
      month = {6},
  publisher = {{IEEE}}
}
>>> print(bl("10.23919/cinc53138.2021.9662801", align="left-middle"))
@inproceedings{Wen_2021,
  author    = {Hao Wen and Jingsu Kang},
  title     = {Hybrid Arrhythmia Detection on Varying-Dimensional Electrocardiography: Combining Deep Neural Networks and Clinical Rules},
  booktitle = {2021 Computing in Cardiology ({CinC})},
  doi       = {10.23919/cinc53138.2021.9662801},
  publisher = {{IEEE}},
  year      = {2021},
  month     = {9},
  pages     = {1–4}
}
```

:point_right: [Back to TOC](#bib_lookup)

## Command-line Usage

After installation, one can use `bib-lookup` in the command line:

```bash
bib-lookup 10.1109/CVPR.2016.90 10.23919/cinc53138.2021.9662801 --ignore-fields url doi -i path/to/input.txt -o path/to/output.bib
```

View current version:

```bash
bib-lookup --version
```

View current configuration:

```bash
bib-lookup --config show
```

Remove current configuration:

```bash
bib-lookup --config reset
```

Set specific configuration:

```bash
bib-lookup --config "timeout=2.0;print_result=true;ignore_fields=['url','pdf']"
```

or from a `json` file or `yaml` file:

```bash
bib-lookup --config /path/to/config.json
bib-lookup --config /path/to/config.yaml
```

Note that unrecognized fields will be ignored and warning messages will be printed.

The following table lists all the available configuration options:

| Option          | Type    | Default                                       | Description                                         |
|-----------------|---------|-----------------------------------------------|-----------------------------------------------------|
| `align`         | `str`   | `middle`                                      | Alignment of the bib item.                          |
| `email`         | `str`   | `None`                                        | Email address to be used in the request.            |
| `ignore_fields` | `list`  | `['url', 'pdf']`                              | Fields to be ignored in the output.                 |
| `ignore_errors` | `bool`  | `False`                                       | Whether to ignore errors.                           |
| `timeout`       | `float` | `6.0`                                         | Timeout in seconds for each request.                |
| `arxiv2doi`     | `bool`  | `True`                                        | Whether to convert arXiv ID to DOI.                 |
| `format`        | `str`   | `bibtex`                                      | Output format.                                      |
| `style`         | `str`   | `apa`                                         | Citation style. Valid only when `format` is `text`. |
| `verbose`       | `int`   | `0`                                           | Verbosity level.                                    |
| `print_result`  | `bool`  | `False`                                       | Whether to print the result.                        |
| `ordering`      | `list`  | `['title', 'author', 'journal', 'booktitle']` | Ordering of the fields.                             |

:point_right: [Back to TOC](#bib_lookup)

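To make the `key=value;key=value` configuration string more concrete, here is a minimal sketch of how such a string could be turned into a dictionary. The helper `parse_config_string` is hypothetical and is not taken from the package; how `bib-lookup` actually parses `--config` may differ.

```python
import ast
from typing import Any, Dict


def parse_config_string(config: str) -> Dict[str, Any]:
    """Parse a "key=value;key=value" string into a dict (hypothetical helper)."""
    parsed: Dict[str, Any] = {}
    for item in config.split(";"):
        if not item.strip():
            continue  # tolerate a trailing semicolon
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        raw = raw.strip()
        if raw.lower() in ("true", "false"):
            value: Any = raw.lower() == "true"  # lower-case booleans as in the CLI example
        else:
            try:
                value = ast.literal_eval(raw)  # numbers, lists, quoted strings
            except (ValueError, SyntaxError):
                value = raw  # fall back to the bare string, e.g. align=middle
        parsed[key.strip()] = value
    return parsed


config = parse_config_string("timeout=2.0;print_result=true;ignore_fields=['url','pdf']")
print(config)
```

`ast.literal_eval` is used instead of `eval` so that only Python literals are accepted from the command line.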
## Output (Append) to a `.bib` File

Each time a bib item is successfully found, it is cached. One can call the `save` method to write the cached bib items to a `.bib` file in append mode.

```python
>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> bl(["10.1109/CVPR.2016.90", "10.23919/cinc53138.2021.9662801", "DOI: 10.1142/S1005386718000305"]);
>>> len(bl)
3
>>> bl[0]
'10.1109/CVPR.2016.90'
>>> bl.save([0, 2], "path/to/some/file.bib")  # save the bib items corresponding to "10.1109/CVPR.2016.90" and "DOI: 10.1142/S1005386718000305"
>>> len(bl)
1
>>> bl.pop(0)  # remove the bib item corresponding to "10.23919/cinc53138.2021.9662801"; equivalent to `bl.pop("10.23919/cinc53138.2021.9662801")`
>>> len(bl)
0
```

:point_right: [Back to TOC](#bib_lookup)

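The append behaviour described above can be illustrated with a small standalone sketch that appends entries to a `.bib` file and skips labels the file already contains. The function `append_new_items` is hypothetical, and comparing entries by label is an assumption made for this sketch; the package's `save` method may detect existing items differently.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Matches the label in "@article{Label," (assumes the label is followed by a comma).
_LABEL = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def append_new_items(bib_path: str, items: Iterable[str]) -> List[str]:
    """Append entries whose labels are not yet in the file; return the labels written."""
    path = Path(bib_path)
    existing = set()
    if path.exists():
        existing = set(_LABEL.findall(path.read_text(encoding="utf-8")))
    written: List[str] = []
    with path.open("a", encoding="utf-8") as handle:  # append mode keeps old entries
        for item in items:
            match = _LABEL.search(item)
            if match is None or match.group(1) in existing:
                continue  # skip malformed entries and labels already present
            handle.write(item.strip() + "\n\n")
            existing.add(match.group(1))
            written.append(match.group(1))
    return written
```

Calling the function twice with the same entry writes it only once, because the second call finds the label in the file.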
## arXiv to DOI

Since 2022-02-17, new arXiv articles are automatically assigned DOIs (older ones are being processed). To get the DOI-based entry instead of the arXiv entry, set `arxiv2doi=True`:

```python
>>> from bib_lookup import BibLookup
>>> bl = BibLookup(arxiv2doi=True)  # set explicitly; see the configuration table for the default
>>> print(bl("https://arxiv.org/abs/2204.04420"))
@misc{https://doi.org/10.48550/arxiv.2204.04420,
     author = {Hao, Wen and Jingsu, Kang},
      title = {Investigating Deep Learning Benchmarks for Electrocardiography Signal Processing},
        doi = {10.48550/ARXIV.2204.04420},
   keywords = {Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences},
  publisher = {arXiv},
       year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```

while with `bl = BibLookup(arxiv2doi=False)`, one would get

```latex
@article{hao2022_2204.04420v1,
   author = {Wen Hao and Kang Jingsu},
    title = {Investigating Deep Learning Benchmarks for Electrocardiography Signal Processing},
  journal = {arXiv preprint arXiv:2204.04420v1},
     year = {2022},
    month = {4}
}
```

:point_right: [Back to TOC](#bib_lookup)

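The DOI in the example above follows the pattern `10.48550/arXiv.<id>`. The sketch below shows one way to derive such a DOI from a new-style arXiv ID or URL. The helper `arxiv_to_doi` is hypothetical and not part of the package's documented API, and a DOI built this way may not resolve for older articles that have not been assigned one yet.

```python
import re

# New-style arXiv IDs look like 2204.04420, optionally followed by a version (v1, v2, ...).
_ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?$")


def arxiv_to_doi(identifier: str) -> str:
    """Map a new-style arXiv ID or abs URL to a DOI string (hypothetical helper)."""
    match = _ARXIV_ID.search(identifier.strip().rstrip("/"))
    if match is None:
        raise ValueError(f"not a new-style arXiv identifier: {identifier!r}")
    return f"10.48550/arXiv.{match.group(1)}"  # the version suffix is dropped


print(arxiv_to_doi("https://arxiv.org/abs/2204.04420"))
```

Old-style identifiers such as `math/0601001` are deliberately rejected here to keep the pattern simple.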
## Bib Items Checking

One can use `BibLookup` to check the validity (**required fields, duplicate labels**, etc.) of bib items in a Bib file. The following example uses a [Bib file](/test/invalid_items.bib) that contains incorrect and duplicate bib items.

```python
>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> bl.check_bib_file("./test/invalid_items.bib")
Bib item "He_2016"
    starting from line 3 is not valid.
    Bib item of entry type "inproceedings" should have the following fields:
    ['author', 'title', 'booktitle', 'year']
Bib item "Wen_2018"
    starting from line 16 is not valid.
    Bib item of entry type "article" should have the following fields:
    ['author', 'title', 'journal', 'year']
Bib items "He_2016" starting from line 3
    and "He_2016" starting from line 45 is duplicate.
[3, 16, 45]
```

or from the command line

```bash
bib-lookup -c ./test/invalid_items.bib
bib-lookup --ignore-fields url doi -i ./test/sample_input.txt -o ./tmp/a.bib -c true
```

:point_right: [Back to TOC](#bib_lookup)

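The two checks named above (required fields and duplicate labels) can be sketched in a few lines. This is an illustration only, not the package's implementation: `check_bib_text` is a hypothetical function, it assumes one field per line, and it only knows the two entry types whose required fields appear in the output above. The sample entries are made up.

```python
import re
from typing import Dict, List, Tuple

REQUIRED_FIELDS: Dict[str, List[str]] = {
    "article": ["author", "title", "journal", "year"],
    "inproceedings": ["author", "title", "booktitle", "year"],
}

_ENTRY_START = re.compile(r"^@(\w+)\s*\{\s*([^,\s]+)\s*,")
_FIELD = re.compile(r"^\s*(\w+)\s*=")


def check_bib_text(text: str) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for invalid or duplicate entries."""
    entries = []  # each element: (start line, entry type, label, set of field names)
    for number, line in enumerate(text.splitlines(), start=1):
        start = _ENTRY_START.match(line)
        if start:
            entries.append((number, start.group(1).lower(), start.group(2), set()))
        elif entries:
            field = _FIELD.match(line)
            if field:
                entries[-1][3].add(field.group(1).lower())
    problems: List[Tuple[int, str]] = []
    first_seen: Dict[str, int] = {}
    for number, entry_type, label, fields in entries:
        missing = [f for f in REQUIRED_FIELDS.get(entry_type, []) if f not in fields]
        if missing:
            problems.append((number, f'"{label}" is missing {missing}'))
        if label in first_seen:
            problems.append((number, f'"{label}" duplicates line {first_seen[label]}'))
        else:
            first_seen[label] = number
    return problems


sample = """\
@inproceedings{Doe_2020,
  author = {Jane Doe},
  title = {A Made-up Title},
  year = {2020}
}
@inproceedings{Doe_2020,
  author = {Jane Doe},
  title = {A Made-up Title},
  booktitle = {A Made-up Conference},
  year = {2020}
}
"""
for line_number, message in check_bib_text(sample):
    print(line_number, message)
```

The first entry is reported for the missing `booktitle` field and the second for reusing the label of the first.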
## Simplify a `.bib` File

Sometimes one wants a clean `.bib` file without the bib items that are not cited. The static method `simplify_bib_file` generates a new `.bib` file that contains only the cited bib items from an old `.bib` file.

```python
>>> from bib_lookup import BibLookup
>>> new_bib_file_path = BibLookup.simplify_bib_file("path/to/tex/source/file", "path/to/old/bib/file")
>>> # or use the following if one has multiple source files
>>> new_bib_file_path = BibLookup.simplify_bib_file(list_of_tex_source_files_or_folders, "path/to/old/bib/file")
```

:point_right: [Back to TOC](#bib_lookup)

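The idea behind simplifying a `.bib` file can be sketched in two steps: collect the keys cited in the TeX sources, then keep only the entries with those keys. The functions `cited_keys` and `keep_cited_entries` below are hypothetical and work on strings rather than files; they are not the package's implementation, and the regular expression ignores details such as commented-out citations.

```python
import re
from typing import Iterable, Set

# Matches \cite, \citep, \textcite, ... with optional star and optional [..] arguments.
_CITE = re.compile(r"\\[a-zA-Z]*cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
_LABEL = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def cited_keys(tex_sources: Iterable[str]) -> Set[str]:
    """Collect every citation key used in the given TeX source strings."""
    keys: Set[str] = set()
    for source in tex_sources:
        for group in _CITE.findall(source):
            keys.update(key.strip() for key in group.split(",") if key.strip())
    return keys


def keep_cited_entries(bib_text: str, keys: Set[str]) -> str:
    """Return only the entries of bib_text whose labels are in keys."""
    kept = []
    for chunk in re.split(r"(?m)^(?=@)", bib_text):  # split before each line starting with "@"
        match = _LABEL.match(chunk)
        if match and match.group(1) in keys:
            kept.append(chunk.strip())
    return "\n\n".join(kept) + ("\n" if kept else "")


keys = cited_keys([r"See \cite{He_2016, Wen_2021} and \citep[p.~3]{Krizhevsky_2017}."])
print(sorted(keys))
```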
## `CitationMixin` class

One can inherit the `CitationMixin` class to have the method `get_citation` for any class, in which case one only needs to provide a `self.doi`. For example:

```python
from bib_lookup import CitationMixin


class SomeClass(CitationMixin):
    doi = "10.23919/cinc53138.2021.9662801"  # can also be a list
```

## TODO

1. [:heavy_check_mark:](#command-line-usage) ~~add CLI support~~;
2. :x: ~~use eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi for PubMed, as in \[[3](#ref3)\]~~;
3. :x: ~~try using the Google Scholar API described in \[[4](#ref4)\] (unfortunately \[[4](#ref4)\] is a paid service)~~;
4. [:heavy_check_mark:](https://bib-lookup.streamlit.app/) ~~use `Flask` to write a simple browser-based UI~~;
5. :heavy_check_mark: ~~check whether the bib item already exists in the output file, and skip saving it if so~~;
6. :heavy_check_mark: ~~since arXiv articles are now automatically assigned DOIs (ref. [this blog](https://blog.arxiv.org/2022/02/17/new-arxiv-articles-are-now-automatically-assigned-dois/)), consider converting arXiv identifiers to DOI identifiers and requesting from DOI. Currently, the request results are different; at least the entry type changes from `article` to `misc`~~;
7. make the `__call__` method asynchronous using `asyncio` and `aiohttp` or `httpx`.

:point_right: [Back to TOC](#bib_lookup)

## WARNING

Many journals have specific requirements for Bib entries. For example, the title and/or journal (and/or booktitle) should be **capitalized**. This cannot be done automatically, since

- some abbreviations in titles should be entirely in upper case, for example

  > ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

- some should be entirely in lower case,

  > mixup: Beyond Empirical Risk Minimization

- and some others should have mixed cases,

  > KeMRE: Knowledge-enhanced Medical Relation Extraction for Chinese Medicine Instructions

Users should correct this themselves **if necessary** (which is rare), and remember to enclose such fields in **double curly braces**.

For example, the lookup result for the `AlexNet` paper is

```python
>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> print(bl("https://doi.org/10.1145/3065386"))
@article{Krizhevsky_2017,
     author = {Alex Krizhevsky and Ilya Sutskever and Geoffrey E. Hinton},
      title = {{ImageNet} classification with deep convolutional neural networks},
    journal = {Communications of the {ACM}},
        doi = {10.1145/3065386},
       year = {2017},
      month = {5},
  publisher = {Association for Computing Machinery ({ACM})},
     volume = {60},
     number = {6},
      pages = {84--90}
}
```

This result (the title) should be adjusted to

```latex
@article{Krizhevsky_2017,
     author = {Alex Krizhevsky and Ilya Sutskever and Geoffrey E. Hinton},
      title = {{ImageNet Classification with Deep Convolutional Neural Networks}},
    journal = {Communications of the {ACM}},
        doi = {10.1145/3065386},
       year = {2017},
      month = {5},
  publisher = {Association for Computing Machinery ({ACM})},
     volume = {60},
     number = {6},
      pages = {84--90}
}
```

A more severe example that needs manual correction is as follows

```python
>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> print(bl("10.1093/acprof:oso/9780195058239.001.0001"))
@book{Malmivuo_1995,
     author = {Jaakko Malmivuo and Robert Plonsey},
      title = {{BioelectromagnetismPrinciples} and Applications of Bioelectric and Biomagnetic Fields},
        doi = {10.1093/acprof:oso/9780195058239.001.0001},
       year = {1995},
      month = {10},
  publisher = {Oxford University Press}
}
```

Adjust it to

```latex
@book{Malmivuo_1995,
     author = {Jaakko Malmivuo and Robert Plonsey},
      title = {{Bioelectromagnetism: Principles and Applications of Bioelectric and Biomagnetic Fields}},
        doi = {10.1093/acprof:oso/9780195058239.001.0001},
       year = {1995},
      month = {10},
  publisher = {Oxford University Press}
}
```

This shows that the data in the DOI database is **NOT** always correct.

:point_right: [Back to TOC](#bib_lookup)

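The manual title correction shown above can also be scripted. The sketch below replaces the `title` field of a single BibTeX entry with a corrected title wrapped in double curly braces. The function `set_protected_title` is hypothetical and not part of the package, and it assumes the title value is delimited by braces rather than quotes.

```python
import re


def set_protected_title(entry: str, title: str) -> str:
    """Replace the title field of one BibTeX entry with a double-braced title."""
    start = re.search(r"\btitle\s*=\s*\{", entry)  # \b keeps "booktitle" from matching
    if start is None:
        raise ValueError("entry has no brace-delimited title field")
    depth = 1
    pos = start.end()
    while pos < len(entry) and depth > 0:  # walk to the brace that closes the field
        if entry[pos] == "{":
            depth += 1
        elif entry[pos] == "}":
            depth -= 1
        pos += 1
    if depth != 0:
        raise ValueError("unbalanced braces in title field")
    # entry[pos - 1] is the closing brace of the field; keep it and everything after it.
    return entry[: start.end()] + "{" + title + "}" + entry[pos - 1 :]


entry = (
    "@article{Krizhevsky_2017,\n"
    "  title = {{ImageNet} classification with deep convolutional neural networks},\n"
    "  journal = {Communications of the {ACM}}\n"
    "}"
)
print(set_protected_title(entry, "ImageNet Classification with Deep Convolutional Neural Networks"))
```

Counting brace depth, rather than using a single regular expression, lets the helper cope with nested braces such as `{ImageNet}` inside the original title.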
## Biblatex Cheetsheet

[This file](/biblatex-cheatsheet.pdf), downloaded from \[[6](#ref6)\], provides a comprehensive overview of `bib` entries.

:point_right: [Back to TOC](#bib_lookup)

## Citation

```latex
@misc{https://doi.org/10.5281/zenodo.6435017,
     author = {WEN, Hao},
      title = {bib\_lookup: A Useful Tool for Uooking Up Bib Entries},
        doi = {10.5281/ZENODO.6435017},
        url = {https://zenodo.org/record/6435017},
  publisher = {Zenodo},
       year = {2022},
  copyright = {MIT License}
}
```

The above citation can be obtained via

```python
>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> print(bl("DOI: 10.5281/zenodo.6435017"))
```

:point_right: [Back to TOC](#bib_lookup)

## References

1. https://github.com/davidagraf/doi2bib2
2. https://arxiv.org/help/api
3. https://github.com/mfcovington/pubmed-lookup/
4. https://serpapi.com/google-scholar-cite-api
5. https://www.bibtex.com/
6. http://tug.ctan.org/info/biblatex-cheatsheet/biblatex-cheatsheet.pdf

:point_right: [Back to TOC](#bib_lookup)