This work aims to define, model and evaluate the integrated use of collaborative software and machine learning for building high-quality knowledge resources. A possible scenario is Molecular Biology, where high-throughput data production is overwhelming the traditional centralised annotation of data by paid experts. Many biological resources have moved to collaborative software platforms, predominantly wikis, in an effort to involve the wider community and replicate the success of Wikipedia. However, it has been shown that even a widespread community effort would find it difficult, if not impossible, to scale annotation up to the current rate of data growth in Biology [1].