Tracking the Changes of Dynamic Web Pages in the Existence of URL Rewriting

Yeh, P.-J., Li, J.-T. and Yuan, S.-M.

    Crawlers in a knowledge management system need to collect and archive documents from websites, and also to track how those documents change. However, URL rewriting mechanisms raise a page-tracking problem: the URLs of a pair of dynamic page instances obtained during different sessions are no longer the same. This paper proposes a series of algorithms, built in a bottom-up manner, that find the corresponding pairs of dynamic page instances and then judge their change status. Experiments showed that the performance was very good and that the outcome was 100% accurate.
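
The tracking problem described in the abstract can be illustrated with a small sketch. It is not the paper's method, which the abstract does not detail. It only shows why two crawls of the same dynamic page can yield different URLs, and how a naive normalization might pair them up. The token names in SESSION_KEYS, the helper strip_session_tokens, and the example.com URLs are all hypothetical and chosen for illustration.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session-token names; a real site may use different ones.
SESSION_KEYS = {"jsessionid", "phpsessid", "sid", "sessionid"}

# Matches path parameters such as ";jsessionid=A1B2" inside the URL path.
_PATH_TOKEN = re.compile(
    r";(?:%s)=[^/;?#]*" % "|".join(sorted(SESSION_KEYS)),
    re.IGNORECASE,
)


def strip_session_tokens(url: str) -> str:
    """Return the URL with known session tokens removed (naive heuristic)."""
    parts = urlsplit(url)
    # Remove session tokens carried as path parameters.
    path = _PATH_TOKEN.sub("", parts.path)
    # Remove session tokens carried as query parameters, keeping the rest in order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_KEYS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(kept), parts.fragment)
    )


# The same page fetched in two different sessions (made-up URLs).
first_crawl = "http://example.com/item.jsp;jsessionid=A1B2?id=7"
second_crawl = "http://example.com/item.jsp;jsessionid=Z9Y8?id=7"

# The raw URLs differ, so URL equality alone cannot pair the two instances.
urls_differ = first_crawl != second_crawl
# After normalization the two instances map to the same key.
same_page = strip_session_tokens(first_crawl) == strip_session_tokens(second_crawl)
```

A heuristic like this works only when the rewriting scheme and token names are known in advance; it fails as soon as a site embeds session data under an unfamiliar name or in an unfamiliar position.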
Cite as: Yeh, P.-J., Li, J.-T. and Yuan, S.-M. (2006). Tracking the Changes of Dynamic Web Pages in the Existence of URL Rewriting. In Proc. Fifth Australasian Data Mining Conference (AusDM2006), Sydney, Australia. CRPIT, 61. Peter, C., Kennedy, P. J., Li, J., Simoff, S. J. and Williams, G. J., Eds. ACS. 169-176.