This project was selected under the "Génomique microbienne" call for proposals of the .

It is funded for a period of 3 years (2009-2011).

*The application of whole-genome sequencing to microbial communities represents a major development in metagenomics. The potential outputs of high-throughput sequencing technologies applied to metagenomics studies are highly promising. However, the costs of such studies are high, so research organizations must question the feasibility of a study and the probability of attaining its aims, and must optimize the allocation of funds. The design of such experiments is therefore a cornerstone for building rational metagenomics projects. Success depends on the existence of appropriate database support and management tools, the feasibility and speed of the pattern-matching step, the probability of detecting relatively rare species, and the statistical power of the experiment to detect differences between conditions. Moreover, the statistical analysis is made more difficult by the large p, small n paradigm: most statistical methods have been developed for data in which the number of individuals or replicates, n, is much larger than the number of variables, p. High-throughput technologies, such as those used in metagenomics studies, produce data with p (the number of genes or genomic fragments) much larger than n (the number of samples), so that the usual statistical methodologies do not apply. All of these critical steps depend on the number of samples, which is a key point in the design and cost of the experiment. The rationale of CBME is to provide basic elements and tools for designing appropriate metagenomics experiments and to produce new computational biology and statistical methods to analyse the data produced by metagenomics studies. This project is general and not devoted to a specific metagenomics experiment. However, it will benefit from the participation of project members in specific ongoing metagenomics studies in different ecosystems.*
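
As a rough illustration of the design question raised in the abstract (the probability of catching a relatively rare species), the sketch below assumes that sequencing reads are drawn independently and that a species contributes a fixed fraction of the reads. Under that simplifying assumption, which is not taken from the project itself, the chance of seeing the species at least once in N reads is 1 - (1 - f)^N. The function names `detection_probability` and `reads_needed` are hypothetical and chosen only for this example.

```python
import math


def detection_probability(rel_abundance: float, n_reads: int) -> float:
    """Probability that at least one of n_reads comes from a species
    whose relative abundance is rel_abundance.

    Assumption (illustrative only): reads are sampled independently,
    each hitting the species with probability rel_abundance.
    """
    if not 0.0 <= rel_abundance <= 1.0:
        raise ValueError("rel_abundance must lie in [0, 1]")
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    return 1.0 - (1.0 - rel_abundance) ** n_reads


def reads_needed(rel_abundance: float, target_prob: float) -> int:
    """Smallest number of reads for which the detection probability
    reaches target_prob, under the same independence assumption."""
    if not 0.0 < rel_abundance < 1.0:
        raise ValueError("rel_abundance must lie strictly between 0 and 1")
    if not 0.0 < target_prob < 1.0:
        raise ValueError("target_prob must lie strictly between 0 and 1")
    # Solve 1 - (1 - f)^N >= q for N; log1p keeps precision for small f.
    return math.ceil(math.log1p(-target_prob) / math.log1p(-rel_abundance))


if __name__ == "__main__":
    f = 0.001  # hypothetical relative abundance of a rare species
    n = reads_needed(f, 0.95)
    print(f"reads needed for 95% detection at abundance {f}: {n}")
    print(f"detection probability with {n} reads: {detection_probability(f, n):.4f}")
```

Real metagenomic data violate the independence assumption in several ways (uneven genome sizes, amplification bias, read mapping errors), so these numbers are best read as an order-of-magnitude guide for planning sequencing depth rather than as a power calculation.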

- UMR 518 "Mathématiques et Informatique Appliquées", AgroParisTech/INRA, Paris.
- UR1077 "Mathématique, Informatique et Génome", INRA, Jouy-en-Josas.
- UR341 "Mathématiques et Informatique Appliquées", INRA, Jouy-en-Josas.

These 3 laboratories are involved in the "Statistics for Systems Biology" group.