Authors	@Anonymous
Contributors
Created	June 30, 2023
Last updated	July 12, 2023
Comment due date	July 12, 2023
Due date	September 22, 2023
Status	Approved
Objective	Surface OpenAlex concepts as Hubs in the ResearchHub UI, replacing the existing hubs.
Key outcomes	- The API allows retrieving the hubs associated with documents/papers, along with confidence scores.
- Hubs associated with a paper are visible in the UI.
- The backend architecture allows expanding the functionality of hubs based on the product vision (related hubs, reputation, user-defined hubs, multiple classifiers).
Approvers	@Anonymous, @Anonymous

Background

Hubs

Vision

Automated Hub Extraction — Ideas Considered

OpenAlex Concept Graph (detailed below).
Use OpenAI/an LLM to generate categories for a given paper based on its text. See example.
Scientific Disciplines (briefly outlined in the previous discussion regarding the revamping of hubs).
An in-house machine learning classification model that associates predefined hubs with a paper based on the paper's text.

Existing (Partial) Implementation

In an attempt to introduce a better hub association mechanism, an OpenAlex API integration was partly implemented as part of a work trial. The idea is to use a reliable third-party tool to (1) generate meaningful tags (i.e. high-resolution scientific fields) and (2) associate these tags with papers, as a first step towards the vision outlined above.

OpenAlex Concepts

Implementation notes

Limitations

Does not include the OpenAlex score, which is essential for determining the relevance of a tag to a given paper.
Other OpenAlex metadata is missing, which could later be used to represent a tag graph: ancestors, related_concepts, level (used to indicate the level of resolution for the concept), etc.
Incomplete:
- Concept extraction does not always succeed, so some papers are tagged while others are not.
- There is no backfill or reconciliation mechanism. Backfilling is an operation that runs over all papers in the repository to ensure that each one has tags associated with it; it is executed immediately after the feature is released. Reconciliation is similar to backfilling, but it runs periodically to ensure that all papers for which tagging failed during the day are re-processed.
Limited to OpenAlex concepts: other sources of tag extraction are not supported, because the tag_concept table has multiple OpenAlex-specific columns. For example, if a future iteration used OpenAI/ChatGPT or Scientific Disciplines to assign tags to a paper, there would be no clean way to do so without modifying this table.
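
To make the limitations above more concrete, here is a minimal Python sketch of a source-agnostic tag association. It keeps the score, stores source-specific fields in a generic metadata bag, and can list papers that still need tagging (the input a backfill or reconciliation job would need). All names here (PaperTag, tags_from_openalex, relevant_tags, papers_missing_tags, the "openalex" source label) are hypothetical and do not reflect the actual tag_concept schema, which is not shown in this document. The concept dictionaries are assumed to carry id, display_name, level and score keys, as OpenAlex concept entries on a work are understood to do.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class PaperTag:
    """One tag attached to one paper, independent of the classifier that produced it."""
    paper_id: int
    tag_name: str
    source: str    # hypothetical labels, e.g. "openalex", "llm", "disciplines"
    score: float   # relevance/confidence reported by the source
    # Source-specific extras (e.g. level, ancestors) live here, not in dedicated columns.
    metadata: Dict[str, Any] = field(default_factory=dict)


def tags_from_openalex(paper_id: int, concepts: Iterable[Dict[str, Any]]) -> List[PaperTag]:
    """Convert OpenAlex-style concept dicts (assumed shape) into PaperTag records."""
    return [
        PaperTag(
            paper_id=paper_id,
            tag_name=concept["display_name"],
            source="openalex",
            score=float(concept.get("score", 0.0)),
            metadata={"openalex_id": concept.get("id"), "level": concept.get("level")},
        )
        for concept in concepts
    ]


def relevant_tags(tags: Iterable[PaperTag], min_score: float) -> List[PaperTag]:
    """Keep tags at or above min_score, most relevant first."""
    kept = [tag for tag in tags if tag.score >= min_score]
    return sorted(kept, key=lambda tag: tag.score, reverse=True)


def papers_missing_tags(all_paper_ids: Iterable[int], tags: Iterable[PaperTag]) -> List[int]:
    """Return the papers with no tags at all: the work list for backfill or reconciliation."""
    tagged = {tag.paper_id for tag in tags}
    return sorted(pid for pid in all_paper_ids if pid not in tagged)
```

Under this assumed design, a backfill would call papers_missing_tags once over the whole repository right after release, while reconciliation would call it periodically over recently added papers. Adding a second classifier would only require a new converter next to tags_from_openalex, with no change to the record shape.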