We have studied the automatic extraction of DNA methylation events from literature through annotation following the BioNLP’09 shared task event representation and the use of a retrainable event extraction system. We created a corpus of 200 publication abstracts selected to include a representative sample of DNA methylation statements from all of PubMed, and manually annotated it for nearly 3000 mentions of genes and gene products, 500 DNA domain or region mentions, and 1500 DNA methylation and demethylation events. Evaluation using the EventMine system showed that DNA methylation events can be extracted at 78% precision and 76% recall by retraining a previously introduced event extraction system with this corpus. The learning curve suggested that the corpus size is sufficient and that future efforts in DNA methylation event extraction should focus on the development of extraction methods. One natural direction for future work is to apply event extraction systems trained on the newly introduced data to abstracts available in PubMed and full texts available at PMC, creating a detailed, up-to-date repository of DNA methylation events at full literature scale. Such an effort would require gene name normalization and event extraction at PubMed scale. While substantial challenges remain for accurate normalization and event extraction at this scale, both have recently been shown to be technically feasible using methods competitive with the state of the art [14,48]. Further combining the extracted events with cancer mention detection could provide a valuable resource for epigenetics research. The newly annotated corpus, the first resource annotated for DNA methylation using the BioNLP shared task event representation, is freely available for research use from the GENIA project homepage [49].
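To make the event representation and the reported scores more concrete, here is a minimal Python sketch. It parses annotation lines in the BioNLP shared task standoff style, with text-bound `T` lines and event `E` lines, into comparable event tuples, and then scores predicted events against gold ones by exact match. Several things here are assumptions for illustration and are not taken from the corpus: the type names (`Gene_or_gene_product`, `DNA_domain_or_region`, `Methylation`), the example sentence, and the choice to match on argument text. The official shared task evaluation tools apply more elaborate matching criteria, and this sketch does not handle nested events, that is, arguments that refer to other events. The final line combines the reported 78% precision and 76% recall into an F1 score. Because it uses the rounded figures, the result is approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An event: its type plus sorted (role, argument text) pairs (hashable)."""
    etype: str
    args: tuple


def parse_standoff(doc: str) -> set:
    """Parse standoff-style lines into a set of Events.

    Text-bound lines: T<id>\t<Type> <start> <end>\t<text>
    Event lines:      E<id>\t<Type>:<trigger> <Role>:<arg> ...
    Assumption: event arguments refer only to T ids (no nested events).
    """
    spans, raw_events = {}, []
    for line in doc.strip().splitlines():
        fields = line.split("\t")
        if fields[0].startswith("T"):
            spans[fields[0]] = fields[2]           # id -> covered text
        elif fields[0].startswith("E"):
            raw_events.append(fields[1].split())   # resolve after all T lines are read
    events = set()
    for parts in raw_events:
        etype, _trigger = parts[0].split(":")
        pairs = (p.split(":") for p in parts[1:])
        args = tuple(sorted((role, spans[ref]) for role, ref in pairs))
        events.add(Event(etype, args))
    return events


def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def evaluate(gold: set, predicted: set) -> tuple:
    """Exact-match precision, recall and F1 over sets of events."""
    tp = len(gold & predicted)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, f_score(p, r)


# Hypothetical example: "p16 promoter is methylated"
entities = [
    "T1\tGene_or_gene_product 0 3\tp16",
    "T2\tDNA_domain_or_region 4 12\tpromoter",
    "T3\tMethylation 16 26\tmethylated",
]
gold = parse_standoff("\n".join(entities + ["E1\tMethylation:T3 Theme:T1 Site:T2"]))
pred = parse_standoff("\n".join(entities + [
    "E1\tMethylation:T3 Theme:T1 Site:T2",  # correct event
    "E2\tMethylation:T3 Theme:T1",          # Site missed: counts as a false positive
]))

print(evaluate(gold, pred))           # → (0.5, 1.0, 0.6666666666666666)
print(round(f_score(0.78, 0.76), 3))  # → 0.77
```

Matching on the full argument tuple means that a partially correct event, such as one that omits the Site, counts as both a false positive and a missed gold event.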
DNA methylation event extraction following the model developed in this study is included as part of the Epigenetics and Post-translational Modification task of the BioNLP Shared Task 2011 [17,50].
Acknowledgments
This study is an extension of research first presented at SMBM 2010, Hinxton, Cambridge, U.K. We would like to thank Maté Ongenaert and the other creators of PubMeth for their generosity in allowing the release of resources building on their work, and the anonymous reviewers for their many insightful comments. This work was supported by a Grant-in-Aid for Specially Promoted Research (MEXT, Japan).
This article has been published as part of Journal of Biomedical Semantics Volume 2 Supplement 5, 2011: Proceedings of the Fourth International Symposium on Semantic Mining in Biomedicine (SMBM). The full contents of the supplement are available online at http://www.jbiomedsem.com/supplements/2/S5.
Author details
1 Department of Computer Science, University of Tokyo, Tokyo, Japan.
2 School of Computer Science, University of Manchester, Manchester, UK.
3 National Centre for Text Mining, University of Manchester, Manchester, UK.
Authors’ contributions
TO and SP conceived of and designed the study and drafted the manuscript. TO coordinated the annotation effort. MM performed the event extraction experiments and drafted their description. JT participated in the study design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Published: 6 October
References
1.