
About BacMedia

BacMedia is part of the DiASPora project, which aims to enrich biodiversity data using computational methods.

The DiASPora project

In WP 4 of this project, we are going to use artificial intelligence to predict the cultivation conditions for unculturable bacteria. To do this, we first need to mobilize data on known media. These data, however, were hidden in manually curated PDF and Word documents. Because of variable formatting, misspellings, and the lack of a defined vocabulary, they were not machine-readable.

BacMedia is an attempt to transform these poorly structured data into a relational database. We used a regular-expression approach to mine the PDFs and build the database, and then built the BacMedia web interface to share these developments with the community.
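
As a rough illustration of the regular-expression approach, the sketch below parses ingredient lines of the form "name amount unit" and stores them in a small relational schema. The line format, the pattern, the table layout, and the helper names `parse_ingredient` and `load_medium` are all assumptions made for this example; the patterns and schema actually used for BacMedia are not described on this page.

```python
import re
import sqlite3

# Hypothetical line format: "<name> <amount> <unit>", e.g. "Yeast extract 5.0 g".
# Both "." and "," are accepted as decimal separators to tolerate variable formatting.
INGREDIENT_RE = re.compile(
    r"^\s*(?P<name>.+?)\s+(?P<amount>\d+(?:[.,]\d+)?)\s*(?P<unit>mg|ml|g|l)\s*$",
    re.IGNORECASE,
)


def parse_ingredient(line):
    """Return (name, amount, unit) for a matching line, or None otherwise."""
    match = INGREDIENT_RE.match(line)
    if match is None:
        return None
    amount = float(match.group("amount").replace(",", "."))
    return match.group("name").strip(), amount, match.group("unit").lower()


def load_medium(conn, medium_name, lines):
    """Insert one medium and its parsable ingredients; return the skipped lines."""
    # Assumed two-table layout: one row per medium, one row per ingredient.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS medium (id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ingredient ("
        "medium_id INTEGER REFERENCES medium(id), name TEXT, amount REAL, unit TEXT)"
    )
    cursor = conn.execute("INSERT INTO medium (name) VALUES (?)", (medium_name,))
    medium_id = cursor.lastrowid
    skipped = []
    for line in lines:
        parsed = parse_ingredient(line)
        if parsed is None:
            # Lines the pattern cannot handle are kept for manual curation.
            skipped.append(line)
            continue
        conn.execute(
            "INSERT INTO ingredient VALUES (?, ?, ?, ?)", (medium_id, *parsed)
        )
    return skipped
```

Lines that do not match are returned rather than dropped, which mirrors the idea that extraction is followed by a manual curation step.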


  • September 2021

    Updating the Media database and adding more features.

  • July 2021

    Finishing DevMedia, an internal curator interface for the media data.

  • March 2021

    Polishing the BacMedia website, improving the documentation, and preparing for deployment.

  • February 2021

    Finished manual curation of all extracted media data.

  • January 2021

    Building the first version of BacMedia.

  • November 2020

    Extraction of the media data from Word documents.

  • October 2020

    Start of this project.