Single Document Semantic Spaces

Villalon, J. and Calvo, R. A.

    Latent Semantic Analysis (LSA) has been successfully used in a number of information retrieval, document visualization and summarization applications. LSA semantic spaces are normally created from large corpora that reflect an assumed background knowledge. However, the right size and coverage of the background knowledge for each application are still open research questions. Moreover, LSA’s computational cost is directly related to the size of the corpus, making the technique infeasible in many cases. This paper introduces a technique for creating semantic spaces using a single document and no background knowledge, which cuts computational cost and is domain independent. The reliability of single document semantic spaces was evaluated on a collection of student essays. Several semantic spaces generated from large corpora and from single documents were used to compare how essays are represented. The distance between consecutive sentences in the essays changes between semantic spaces, but the rank of the distances is preserved. The results show that high correlations (0.7) of ranked distances between sentences can be achieved across the different spaces for the weight schemes evaluated. This has important implications for the applications discussed.
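    The idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the two weight schemes ("tf" and "log"), the number of dimensions k, the use of cosine distance, and the tie-free Spearman correlation are all assumptions chosen for illustration, and every function name is hypothetical. The sketch builds a term-by-sentence matrix from one document, reduces it with a truncated SVD, measures the distance between consecutive sentences, and compares the rank of those distances across two spaces.

```python
import re
import numpy as np

def term_sentence_matrix(sentences, weight="tf"):
    """Term-by-sentence counts for ONE document (rows: terms, columns: sentences)."""
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokens for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    m = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokens):
        for t in toks:
            m[index[t], j] += 1.0
    if weight == "log":  # assumed alternative weight scheme
        m = np.log1p(m)
    return m

def sentence_vectors(m, k):
    """Project sentences into a k-dimensional space with a truncated SVD."""
    _, s, vt = np.linalg.svd(m, full_matrices=False)
    return vt[:k].T * s[:k]  # one row per sentence

def consecutive_distances(vecs):
    """Cosine distance between each sentence and the next one."""
    a, b = vecs[:-1], vecs[1:]
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    cos = np.sum(a * b, axis=1) / np.where(norms == 0, 1.0, norms)
    return 1.0 - cos

def spearman(x, y):
    """Rank correlation; ties are broken by position (no tie correction)."""
    rx = np.argsort(np.argsort(x, kind="stable"), kind="stable")
    ry = np.argsort(np.argsort(y, kind="stable"), kind="stable")
    return float(np.corrcoef(rx, ry)[0, 1])

# Toy document, invented for this example only.
essay = [
    "Semantic spaces represent words as vectors.",
    "Vectors for words come from a matrix of counts.",
    "The matrix is reduced with a singular value decomposition.",
    "Student essays contain many sentences.",
    "Sentences in essays can be compared by distance.",
]
d_tf = consecutive_distances(sentence_vectors(term_sentence_matrix(essay, "tf"), 2))
d_log = consecutive_distances(sentence_vectors(term_sentence_matrix(essay, "log"), 2))
rho = spearman(d_tf, d_log)
print("rank correlation between the two spaces:", rho)
```

    In a comparison closer to the one described in the abstract, one of the two spaces would be built from a large background corpus instead of the document itself, and the correlation would be computed per essay.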
Cite as: Villalon, J. and Calvo, R. A. (2009). Single Document Semantic Spaces. In Proc. Eighth Australasian Data Mining Conference (AusDM'09), Melbourne, Australia. CRPIT, 101. Kennedy, P. J., Ong, K. and Christen, P., Eds. ACS. 175-182.