Nov 07, 2009 10:10:19 PST
Program Project Database and Data Integration

Dr. Jeanette Papp

The very large volumes of data that will be generated and analyzed in the Transplant Genomics Project pose a singular challenge in data management and integration. Both gene expression microarray and proteomics experiments generate very large raw data files, and even after transformation they yield large amounts of data to manage. The SNP project is expected to generate several hundred million SNP genotypes. Issues to be addressed include collection, cleaning, management, integration, security, archiving, presentation, and dissemination of data.

Working with scientists in the UCLA Department of Human Genetics, Dr. Papp has created an integrated genetic database system – IGDB – which stores and analyzes many types of genetic data. This system will be extended to accommodate the data generated by the Transplant Genomics Collaborative Group (TGCG). This centralized data management system will streamline data delivery, integration, and storage. PIs and data analysts on the Projects will have immediate access to data through the IGDB system.

Disseminating data and information from internal research to the scientific community is a central mission of the Transplant Genomics Collaborative Group. Non-confidential data generated by the TGCG, along with research project metadata, will be made available to the scientific community in a straightforward and timely manner through a Web-based front end. The same site will also provide information on experimental protocols, methodology, and definitions of variables. In addition, relevant public data from outside sources will be brought into the database and made available in conjunction with the locally generated data.

Critical patient and clinical data will also be centrally available to TGCG members. However, all confidential data of this type will be stored in a separate, highly secure database that is not accessible from the public internet. To maintain the integrity of all stored data, transaction logs are backed up twice daily, and a complete backup of the database is made once daily, to both disk and tape. Backup tapes are stored in a fire- and impact-resistant combination safe, and copies are also stored off-site.