paper-source-trace

Prerequisites

  • Linux
  • Python 3.9
  • PyTorch 1.10.0+cu111

Getting Started

Installation

Clone this repo.

git clone https://github.com/THUDM/paper-source-trace.git
cd paper-source-trace

Install the dependencies with:

pip install -r requirements.txt

PST Dataset

The dataset can be downloaded from BaiduPan (password: bft3), Aliyun, or DropBox. The paper XML files were generated from the paper PDFs using the Grobid APIs.
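Since the paper XML files come from Grobid, they can be read with a standard XML parser. The snippet below is a minimal sketch, assuming the files follow Grobid's usual TEI layout (TEI namespace, references stored as `biblStruct` entries under `listBibl`). The sample document and the helper name `list_reference_titles` are illustrative and not part of this repo; real files contain far more structure.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by Grobid output (assumed to apply to the PST XML files).
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hand-written, heavily simplified stand-in for one Grobid TEI file.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><back><div type="references"><listBibl>
    <biblStruct xml:id="b0">
      <analytic><title level="a" type="main">First cited paper</title></analytic>
    </biblStruct>
    <biblStruct xml:id="b1">
      <monogr><title level="m">A cited book</title></monogr>
    </biblStruct>
  </listBibl></div></back></text>
</TEI>"""


def list_reference_titles(tei_xml: str) -> dict:
    """Map each reference id (e.g. 'b0') to the first title found in its entry."""
    root = ET.fromstring(tei_xml)
    titles = {}
    for bibl in root.iterfind(".//tei:listBibl/tei:biblStruct", TEI_NS):
        ref_id = bibl.get(XML_ID)
        title_el = bibl.find(".//tei:title", TEI_NS)
        titles[ref_id] = title_el.text if title_el is not None else None
    return titles


if __name__ == "__main__":
    print(list_reference_titles(SAMPLE_TEI))
```

To try it on a downloaded file, read the file contents into a string and pass that string to `list_reference_titles`.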

Run Baselines for KDD Cup 2024

First, download the DBLP dataset from AMiner. Put the unzipped PST directory into data/ and the unzipped DBLP dataset into data/PST/.

cd $project_path
export CUDA_VISIBLE_DEVICES='?'  # replace ? with the GPU ID(s) to use, e.g. 0 or 0,1
export PYTHONPATH="`pwd`:$PYTHONPATH"

# Method 1: Random Forest
python rf/process_kddcup_data.py
python rf/model_rf.py  # output at out/kddcup/rf/

# Method 2: Network Embedding
python net_emb.py  # output at out/kddcup/prone/

# Method 3: SciBERT
python bert.py  # output at out/kddcup/scibert/

Results on Validation Set

Method         MAP
Random Forest  0.21420
ProNE          0.21668
SciBERT        0.29489
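MAP in the table above stands for mean average precision. As a rough illustration of how such a score is computed, here is a minimal sketch using the standard definition: for each paper, the references are ranked by predicted score, precision is averaged over the positions of the true source papers, and the per-paper values are then averaged. The function names and the input layout (one list of 0/1 labels and one list of scores per paper) are assumptions for this example; the repo's own evaluation code may differ in details such as tie handling.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP for one paper; labels[i] is 1 if reference i is a true source paper."""
    # Rank reference indices by predicted score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    # A paper with no positive labels contributes 0 here (an assumption).
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    all_labels: Sequence[Sequence[int]], all_scores: Sequence[Sequence[float]]
) -> float:
    """Mean of the per-paper average precision values."""
    aps = [average_precision(l, s) for l, s in zip(all_labels, all_scores)]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Two toy papers with made-up labels and scores.
    labels = [[1, 0, 1], [0, 1]]
    scores = [[0.9, 0.8, 0.7], [0.6, 0.4]]
    print(round(mean_average_precision(labels, scores), 5))
```

In the toy input, the first paper has source papers at ranks 1 and 3 and the second has its only source paper at rank 2, so a ranking that moves true sources toward the top raises the score.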

Citation

If you find this repo useful in your research, please cite the following papers:

@article{zhang2024pst,
  title={PST-Bench: Tracing and Benchmarking the Source of Publications},
  author={Fanjin Zhang and Kun Cao and Yukuo Cen and Jifan Yu and Da Yin and Jie Tang},
  journal={arXiv preprint arXiv:2402.16009},
  year={2024}
}

@article{zhang2024oag,
  title={OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining},
  author={Fanjin Zhang and Shijie Shi and Yifan Zhu and Bo Chen and Yukuo Cen and Jifan Yu and Yelin Chen and Lulu Wang and Qingfei Zhao and Yuqing Cheng and Tianyi Han and Yuwei An and Dan Zhang and Weng Lam Tam and Kun Cao and Yunhe Pang and Xinyu Guan and Huihui Yuan and Jian Song and Xiaoyan Li and Yuxiao Dong and Jie Tang},
  journal={arXiv preprint arXiv:2402.15810},
  year={2024}
}

Paper Sharing Group

Hello everyone,

We've created an online WeChat paper-sharing group in which each member is required to share two computer science papers every week. There are rewards for members who share as required and penalties for those who do not. You are free to join or leave at any time, and everyone is welcome.

