BacMedia is part of the DiASPora project, in which we aim to enrich biodiversity data using computational methods.
In WP 4 of this project, we are going to use artificial intelligence to predict the cultivation conditions for unculturable bacteria. To do this, we first need to mobilize data on known media. However, these data were hidden in manually curated PDF and Word documents. Because of variable formatting, misspellings, and the lack of a defined vocabulary, those data are not accessible to machines.
BacMedia is an attempt to transform those poorly structured data into a relational database. To this end, we used a regular-expression approach to mine the PDFs and develop the database. We then built the BacMedia web interface to share these developments with the community.
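To illustrate the general idea of regular-expression mining into a relational store, here is a minimal sketch. It is not the actual BacMedia pipeline: the line format ("name amount unit"), the table layout, and the names `INGREDIENT_RE`, `parse_ingredient`, and `store_medium` are all assumptions made for this example.

```python
import re
import sqlite3

# Assumed line format (hypothetical): "<ingredient name> <amount> <unit>",
# e.g. "Yeast extract 5.0 g". Real media documents are far less regular.
INGREDIENT_RE = re.compile(
    r"^\s*(?P<name>.+?)\s+(?P<amount>\d+(?:[.,]\d+)?)\s*(?P<unit>mg|ml|g|l)\s*$",
    re.IGNORECASE,
)


def parse_ingredient(line):
    """Return (name, amount, unit) for a matching line, or None otherwise."""
    match = INGREDIENT_RE.match(line)
    if match is None:
        return None
    # Accept a decimal comma as well as a decimal point.
    amount = float(match.group("amount").replace(",", "."))
    return match.group("name"), amount, match.group("unit").lower()


def store_medium(conn, medium_name, lines):
    """Insert one medium and its parsed ingredients; return unparsed lines."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS medium "
        "(id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ingredient "
        "(medium_id INTEGER REFERENCES medium(id), "
        "name TEXT, amount REAL, unit TEXT)"
    )
    cursor = conn.execute("INSERT INTO medium (name) VALUES (?)", (medium_name,))
    medium_id = cursor.lastrowid
    skipped = []
    for line in lines:
        parsed = parse_ingredient(line)
        if parsed is None:
            # Lines the pattern cannot handle are kept for manual review.
            skipped.append(line)
            continue
        conn.execute(
            "INSERT INTO ingredient (medium_id, name, amount, unit) "
            "VALUES (?, ?, ?, ?)",
            (medium_id, *parsed),
        )
    conn.commit()
    return skipped


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    leftover = store_medium(
        connection,
        "Example medium",
        ["Yeast extract 5.0 g", "NaCl 10,5 g", "Adjust pH to 7.0"],
    )
    print(leftover)
```

In this sketch, lines that do not match the pattern are returned rather than dropped, so they can be handed to a curator instead of being lost.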
Updating the Media database and adding more features.
Finishing DevMedia, an internal curator interface for the media data.
Polishing the BacMedia website, improving documentation, and preparing for deployment
Finished manual curation of all extracted media data
Building the first version of BacMedia
Extraction of the media data from Word documents
Start of this project