A huge diversity of biological databases is available via the Internet, but many of these databases have been developed in an ad hoc manner rather than in accordance with any data management principles. In addition, in the area of disordered protein databases, many of the databases have not been made publicly available. This poses challenges to researchers, since reliable protein databases are required in order to test and measure the accuracy of protein structure prediction software. In this paper, we describe our work developing a disordered protein database using data from the protein secondary structure database DSSPcont. In particular, we discuss the way in which we have addressed the issues of data cleaning, query processing and interoperability. This research is a pilot study in managing biological data.
Cite as: Stewart, A.D. and Zhang, X. (2007). Building a Disordered Protein Database: A Case Study in Managing Biological Data. In Proc. Eighteenth Australasian Database Conference (ADC 2007), Ballarat, Australia. CRPIT, 63. Bailey, J. and Fekete, A., Eds. ACS. 151-159.