|
Program Project Database and Data Integration
Dr. Jeanette Papp
The very large volumes of data that will be generated and analyzed in
the Transplant Genomics Project pose a singular challenge in data
management and integration. Both gene expression microarray and
proteomics experiments generate very large raw data files, and even
after transformation these yield large amounts of data to manage. The
SNP project is anticipated to generate several hundred million SNP
genotypes.
Relevant issues to be addressed include collection, cleaning,
management, integration, security, archiving, presentation, and
dissemination of data.
Working with scientists in the UCLA Department of Human Genetics, Dr.
Papp has created an integrated genetic database system – IGDB – which
stores and analyzes many types of genetic data. This system will be
extended to accommodate the data generated by the Transplant Genomics
Collaborative Group. This centralized data management system will
streamline data delivery, integration, and storage. PIs and data
analysts on the Projects will have immediate access to data through the
IGDB system.
Disseminating data and information from internal research to the
scientific community is a central mission of the Transplant Genomics
Collaborative Group. Non-confidential data generated by the TGCG, along
with research project metadata, will be made available to the
scientific community in a straightforward and timely manner through a
Web-based front end. The same site will also post information on
experimental protocols, methodology, and definitions of
variables. In addition, relevant public data from outside sources will
be brought into the database and made available in conjunction with the
locally generated data.
Critical patient and clinical data will also be centrally available to
TGCG members. However, all confidential data of this type will be
stored in a separate, highly secure database that is not accessible
from the public internet. To maintain the integrity of all stored data,
transaction logs are backed up twice daily, and a complete backup of
the database is made once daily to both disk and tape media. Backup
tapes are stored in a fire- and impact-resistant combination safe, and
copies are also stored off-site.
|