| Authors | @Anonymous |
| Contributors | |
| Created | June 30, 2023 |
| Last updated | July 12, 2023 |
| Comment due date | July 12, 2023 |
| Due date | September 22, 2023 |
| Status | Approved |
| Objective | Surface OpenAlex concepts as Hubs in the ResearchHub UI, replacing the existing hubs. |
| Key outcomes | (1) The API allows retrieving hubs associated with documents/papers, along with confidence scores; (2) hubs associated with a paper are visible in the UI; (3) the backend architecture allows expanding the functionality of hubs based on the product vision (related hubs, reputation, user-defined hubs, multiple classifiers). |
| Approvers | @Anonymous, @Anonymous |
Background
See also the prior discussion from Q4 2022: https://docs.google.com/document/d/1UhVrgjLwwx73hlivOJAu3UDMv3P-EQtF2P4lpb4pfrM/edit?usp=sharing
Hubs
Vision
Automated Hub Extraction — Ideas Considered
- OpenAlex Concept Graph (detailed below).
- An LLM (e.g., OpenAI) that generates categories for a given paper based on its text. See example.
- Scientific Disciplines (briefly outlined in the previous discussion regarding the revamping of hubs).
- An in-house machine learning classification model that associates predefined hubs with a paper based on its text.
Existing (Partial) Implementation
To introduce a better hub association mechanism, an OpenAlex API integration was partly implemented as part of a work trial. The idea is to use a reliable third-party tool to (1) generate meaningful tags (i.e., high-resolution scientific fields) and (2) associate these tags with papers, as a first step towards the vision outlined above.
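As a rough illustration of step (2), the sketch below turns the concepts attached to an OpenAlex-style work record into tags. It assumes each concept entry carries `id`, `display_name`, `level`, and `score` fields, as in OpenAlex's public work schema; the `ConceptTag` class, the `extract_concept_tags` function, and the sample record are hypothetical and made up for illustration. This is not the code from the work trial.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ConceptTag:
    """Hypothetical in-memory form of one OpenAlex concept attached to a paper."""
    openalex_id: str
    name: str
    level: int    # resolution of the concept; lower levels are broader fields
    score: float  # relevance of the concept to this particular work


def extract_concept_tags(work: dict[str, Any], min_score: float = 0.0) -> list[ConceptTag]:
    """Return the work's concepts at or above min_score, most relevant first."""
    tags = [
        ConceptTag(
            openalex_id=c["id"],
            name=c["display_name"],
            level=int(c["level"]),
            score=float(c["score"]),
        )
        # A record with no concepts (missing key or null) yields no tags.
        for c in work.get("concepts") or []
    ]
    kept = [t for t in tags if t.score >= min_score]
    return sorted(kept, key=lambda t: t.score, reverse=True)


# Made-up record for illustration only; not real OpenAlex output.
sample_work = {
    "id": "https://openalex.org/W0000000000",
    "concepts": [
        {"id": "https://openalex.org/C0000000001", "display_name": "Biology", "level": 0, "score": 0.41},
        {"id": "https://openalex.org/C0000000002", "display_name": "Genetics", "level": 1, "score": 0.77},
        {"id": "https://openalex.org/C0000000003", "display_name": "Chemistry", "level": 0, "score": 0.12},
    ],
}

tags = extract_concept_tags(sample_work, min_score=0.3)
print([(t.name, t.score) for t in tags])
# prints [('Genetics', 0.77), ('Biology', 0.41)]
```

Keeping the score and level on each tag lets a caller decide where to cut off low-relevance concepts instead of fixing that choice at extraction time.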
OpenAlex Concepts
Implementation notes
Limitations
- Does not store the OpenAlex score, which is essential for determining the relevance of a tag to a given paper.
- Other OpenAlex metadata is missing, which could later be used to represent a tag graph: ancestors, related_concepts, level (used to indicate the level of resolution for the concept), etc.
- Incomplete:
  - Concept extraction does not always succeed, so some papers are tagged while others are not.
  - There is no backfill or reconciliation mechanism. Backfilling is an operation that runs on all papers in the repository, ensures each of them has tags associated with it, and is executed immediately after the feature is released. Reconciliation is similar to backfilling, but it runs periodically to ensure that all papers for which tagging failed during the day are re-processed.
- Limited to OpenAlex concepts: other sources of tag extraction are not supported, because the tag_concept table has multiple OpenAlex-specific columns. For example, if a future iteration were to use OpenAI/ChatGPT or Scientific Disciplines to assign tags to a paper, there would be no clean way to do so without modifying this table.
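One possible way to address the last two limitations is sketched below, under stated assumptions: an association record that names its source and keeps source-specific fields in a generic metadata mapping, plus a helper that finds untagged papers so the same routine can serve both a one-off backfill and a periodic reconciliation. All names here (`HubAssociation`, `papers_missing_hubs`, `tag_papers`, the fake classifier) are hypothetical; this is not the existing tag_concept design or a decided schema.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, Optional


@dataclass
class HubAssociation:
    """Hypothetical source-agnostic link between a paper and a hub."""
    paper_id: int
    hub_name: str
    source: str                    # e.g. "openalex", "llm", "disciplines"
    score: Optional[float] = None  # confidence, when the source provides one
    metadata: dict[str, Any] = field(default_factory=dict)  # source-specific extras


# A classifier maps a paper ID to associations and may raise on failure.
Classifier = Callable[[int], list[HubAssociation]]


def papers_missing_hubs(paper_ids: Iterable[int], existing: Iterable[HubAssociation]) -> list[int]:
    """Return the paper IDs that have no association yet, preserving input order."""
    tagged = {a.paper_id for a in existing}
    return [pid for pid in paper_ids if pid not in tagged]


def tag_papers(paper_ids: Iterable[int], classify: Classifier) -> tuple[list[HubAssociation], list[int]]:
    """Run the classifier on each paper; collect results and the IDs that failed."""
    created: list[HubAssociation] = []
    failed: list[int] = []
    for pid in paper_ids:
        try:
            created.extend(classify(pid))
        except Exception:
            # Failed papers are left for the next reconciliation run to retry.
            failed.append(pid)
    return created, failed


def fake_openalex_classifier(paper_id: int) -> list[HubAssociation]:
    """Stand-in classifier for illustration; fails for paper 3 to mimic an extraction error."""
    if paper_id == 3:
        raise RuntimeError("concept extraction failed")
    return [HubAssociation(paper_id, "Genetics", source="openalex", score=0.77, metadata={"level": 1})]


existing = [HubAssociation(1, "Biology", source="openalex", score=0.41)]
todo = papers_missing_hubs([1, 2, 3], existing)
created, failed = tag_papers(todo, fake_openalex_classifier)
print(todo, [a.paper_id for a in created], failed)
# prints [2, 3] [2] [3]
```

In this sketch a backfill would pass every paper ID in the repository to `papers_missing_hubs`, while a reconciliation job would pass only recently processed papers; a second source, such as an LLM, would plug in as another `Classifier` without changing the record shape.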