Selectivity Estimation by Batch-Query based Histogram and Parametric Method

Luo, J., Zhou, X., Zhang, Y., Shen, H.T. and Li, J.

    Histograms are used extensively for selectivity estimation and approximate query processing. Workloadaware dynamic histograms can self-tune itself based on query feedback without scanning or sampling the underlaying datasets in a systematic and comprehensive way. Dynamic histograms allocate more buckets not only for the areas with most skewed data distribution but also according to users' interest. However,it takes long time to 'warm-up' (i.e., a large number of queries need to be processed before the histogram can provide a satisfactory coverage and accuracy). Thus, it is less effective to adapt with workload pattern changes. In this paper, we propose a novel online query scheduling algorithm which can significantly reduce the warm-up time for dynamic histograms. A parametric method is proposed to remedy the problem of inaccurate query selectivity estimation for the areas with poor histogram coverage. Experimental results demonstrate a significant effectiveness and accuracy improvement of our approach.
Cite as: Luo, J., Zhou, X., Zhang, Y., Shen, H.T. and Li, J. (2007). Selectivity Estimation by Batch-Query based Histogram and Parametric Method. In Proc. Eighteenth Australasian Database Conference (ADC 2007), Ballarat, Australia. CRPIT, 63. Bailey, J. and Fekete, A., Eds. ACS. 93-102.
pdf (from crpit.com) pdf (local if available) BibTeX EndNote GS