Eliminating the Biggest Bottleneck in Web Usage Mining: The Preprocessing Phase

Article & Journal: An innovative data collection method to eliminate the preprocessing phase in web usage mining. Published in: Engineering Science and Technology, an International Journal (JESTECH)

Study Summary: Web Usage Mining (WUM) projects can spend up to 80% of their time cleaning messy server logs. We developed a novel server-side data collection method that captures structured, noise-free data directly at the source, which makes the laborious traditional preprocessing phase unnecessary. The method was tested on a large-scale university portal, where it showed that analysis-ready data can be obtained at collection time, without external trackers.

Behind the Research: For years, researchers have struggled with "dirty data" in web logs. Incomplete sessions and crawler traffic ruin analysis accuracy. We asked a simple question: Why clean the data later when we can collect it cleanly in the first place? By shifting the logic to the collection layer, we ensured that every record in the database represents a real human action. This shift not only saves time but also drastically improves the reliability of any subsequent data mining task.
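To make the idea of filtering at the collection layer concrete, here is a minimal Python sketch. It is not the implementation from the paper: the table layout, the function names (`open_store`, `record_view`), and the user-agent check are all hypothetical, chosen only to illustrate writing a structured record at request time and declining to store traffic that has no session or looks like a crawler.

```python
import sqlite3
import time

# Hypothetical crawler markers (an assumption for this sketch).
# A substring check on the user agent is a crude signal at best.
BOT_MARKERS = ("bot", "crawler", "spider")


def open_store(path=":memory:"):
    """Open a SQLite database and create the page-view table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views ("
        "session_id TEXT NOT NULL, "
        "user_id TEXT, "
        "url TEXT NOT NULL, "
        "ts REAL NOT NULL)"
    )
    return conn


def record_view(conn, session_id, user_id, url, user_agent, ts=None):
    """Store one structured page view; return True if it was stored."""
    # Requests without a session cannot be attributed to a visit, so skip them.
    if not session_id:
        return False
    # Skip requests with a missing or crawler-like user agent.
    agent = (user_agent or "").lower()
    if not agent or any(marker in agent for marker in BOT_MARKERS):
        return False
    conn.execute(
        "INSERT INTO page_views VALUES (?, ?, ?, ?)",
        (session_id, user_id, url, time.time() if ts is None else ts),
    )
    conn.commit()
    return True
```

In a sketch like this, the web application would call `record_view` from its own request handling code, so each row already carries a session identifier and a timestamp and no log parsing is needed afterwards. How the published method actually decides that a record reflects a real human action is described in the article itself.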

Citation & DOI: Canay, O., & Kocabıçak, Ü. (2023). An innovative data collection method to eliminate the preprocessing phase in web usage mining. Engineering Science and Technology, an International Journal, 40, 101360. DOI: 10.1016/j.jestch.2023.101360
