This tool locates gene and protein names in biomedical literature. The core of the system is a dictionary generated by semi-supervised learning from a large amount of unlabeled biomedical text [1]. Two approaches are provided: (a) maximum matching against the dictionary, and (b) a combination of the dictionary with a conditional random field (CRF) model. You can test it with your own sentences on the demo page.
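
To illustrate approach (a), here is a minimal sketch of greedy forward maximum matching over a tokenized sentence. It is not the system's actual implementation. The function names (build_dictionary, maximum_match, spans_to_bio) and the three-entry toy dictionary are hypothetical, and lowercasing, whitespace tokenization, and left-to-right matching are all assumptions. The last helper turns the matches into B/I/O tags, which is one plausible way dictionary evidence could be passed to a CRF as a feature in approach (b).

```python
from typing import Iterable, List, Set, Tuple

Entry = Tuple[str, ...]


def build_dictionary(names: Iterable[str]) -> Tuple[Set[Entry], int]:
    """Store each name as a tuple of lowercased tokens; also return the longest entry length."""
    entries = {tuple(name.lower().split()) for name in names if name.strip()}
    max_len = max((len(entry) for entry in entries), default=0)
    return entries, max_len


def maximum_match(tokens: List[str], entries: Set[Entry], max_len: int) -> List[Tuple[int, int]]:
    """Greedy forward maximum match: at each position, take the longest dictionary entry.

    Returns half-open (start, end) token spans.
    """
    lowered = [token.lower() for token in tokens]
    spans = []
    i = 0
    while i < len(lowered):
        matched = False
        # Try the longest candidate first, then shrink the window.
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                spans.append((i, i + n))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return spans


def spans_to_bio(n_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Convert matched spans to B/I/O tags (assumed here as a possible CRF feature)."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B"
        for k in range(start + 1, end):
            tags[k] = "I"
    return tags


# Toy dictionary for illustration only; not the downloadable dictionary.
entries, max_len = build_dictionary(
    ["tumor necrosis factor", "tumor necrosis factor alpha", "p53"]
)
tokens = "Expression of tumor necrosis factor alpha and p53 was measured".split()
spans = maximum_match(tokens, entries, max_len)
tags = spans_to_bio(len(tokens), spans)
```

Because the longest candidate is tried first, "tumor necrosis factor alpha" is matched as a single four-token name rather than stopping at the shorter entry "tumor necrosis factor".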

Click here to download the dictionary used in the system.





[1] Yanpeng Li, Hongfei Lin and Zhihao Yang. Incorporating Rich Background Knowledge for Gene Named Entity Classification and Recognition. BMC Bioinformatics, 2009, 10:223.

This page is maintained by Yanpeng Li.

Department of Computer Science and Engineering, Dalian University of Technology, Dalian, China.