Eliminating the Biggest Bottleneck in Web Usage Mining: The Preprocessing Phase
Article & Journal: An innovative data collection method to eliminate the preprocessing phase in web usage mining. Published in: Engineering Science and Technology, an International Journal (JESTECH)
Study Summary: Web Usage Mining (WUM) projects often spend up to 80% of their
time cleaning messy server logs. We developed a novel server-side data
collection method that captures structured, noise-free data directly at the
source. This approach renders the traditional, laborious preprocessing phase
obsolete. We tested the method on a large-scale university portal and showed
that analysis-ready data can be obtained immediately, without external
trackers.
Behind the Research: For years, researchers have struggled with "dirty data"
in web logs. Incomplete sessions and crawler traffic undermine the accuracy of
any analysis. We asked a simple question: why clean the data later when we can
collect it cleanly in the first place? By shifting the filtering logic to the
collection layer, we ensured that every record in the database represents a
real human action. This shift not only saves time but also substantially
improves the reliability of any subsequent data mining task.
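To make the idea of collecting clean data at the source more concrete, here is a minimal Python sketch. It is not the implementation from the paper: the table layout, the function names (`open_store`, `is_probable_bot`, `record_page_view`), and the user-agent markers are all assumptions chosen for illustration. It only shows the general pattern of filtering non-human traffic and writing a structured record at request time, so that no log cleaning is needed afterwards.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical crawler markers; a real deployment would need a maintained list
# and probably stronger signals than the user-agent string alone.
BOT_MARKERS = ("bot", "crawler", "spider")


def open_store(path=":memory:"):
    """Open the database and create the (assumed) page-view table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views ("
        "session_id TEXT NOT NULL, "
        "user_id TEXT, "
        "page TEXT NOT NULL, "
        "visited_at TEXT NOT NULL)"
    )
    return conn


def is_probable_bot(user_agent):
    """Treat empty or crawler-like user agents as non-human traffic."""
    ua = (user_agent or "").lower()
    return ua == "" or any(marker in ua for marker in BOT_MARKERS)


def record_page_view(conn, session_id, user_id, page, user_agent, now=None):
    """Store one structured record per human page view; skip probable bots.

    Returns True if a row was written, False if the request was filtered out.
    """
    if is_probable_bot(user_agent):
        return False
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    conn.execute(
        "INSERT INTO page_views VALUES (?, ?, ?, ?)",
        (session_id, user_id, page, timestamp),
    )
    conn.commit()
    return True
```

Called from the application's request handler, `record_page_view` already knows the session and user, so the stored rows need no session reconstruction or crawler removal before mining.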
Citation & DOI: Canay, O., & Kocabıçak, Ü. (2023). An innovative data
collection method to eliminate the preprocessing phase in web usage mining.
Engineering Science and Technology, an International Journal, 40, 101360.
DOI: 10.1016/j.jestch.2023.101360