RESEARCH OF MACHINE LEARNING METHODS FOR SEARCH INFORMATION

T.A. HRYHOROVA; V.P. LYASHENKO; O.O. MOSKALENKO

doi:10.32782/KNTU2618-0340/2021.4.2.2.7

Authors

T.A. HRYHOROVA Kremenchug National University named after Mykhailo Ostrogradsky https://orcid.org/0000-0002-4371-8624
V.P. LYASHENKO Kremenchug National University named after Mykhailo Ostrogradsky https://orcid.org/0000-0002-4538-631X
O.O. MOSKALENKO Kremenchug National University named after Mykhailo Ostrogradsky

Abstract

The advantage of using machine learning in search is that the search engine can learn and therefore return more personalized answers instead of generic results. Well-known search engines have used such algorithms for a long time and keep improving them. This work uses examples to study the machine learning methods and algorithms applied to information search, together with their advantages and disadvantages. Three groups of methods were chosen: collaborative filtering, clustering, and association rule mining. Two main approaches to collaborative filtering are considered: correlation models and latent models. The correlation models are user similarity filtering (user-based filtering) and link similarity filtering (item-based filtering), and worked examples show how these algorithms operate. Link similarity filtering predicts the rating of one link from the ratings of other links, using either regression analysis or a simplified predictor known as the Slope One algorithm. Euclidean distance, the cosine coefficient, and the Pearson correlation coefficient are considered as metrics for computing the user similarity coefficient in the user-based filtering model. As latent models, the following clustering algorithms are considered: biclustering, DBSCAN (density-based spatial clustering in the presence of noise), and fuzzy c-means. All of these algorithms form clusters of data according to a given criterion. Association rule mining is illustrated with the Apriori algorithm, whose rules are generated from all frequent itemsets found in the search query database that satisfy the specified support threshold. To apply this algorithm, the data were converted to binary form and to the corresponding data structure.
It is concluded that each of these methods has its drawbacks, and that only a combination of them achieves the desired improvement in search quality for the tasks set by the customer.
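The user-based model and its three similarity metrics can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `ratings` data, the user and link names, and the helper functions are hypothetical; similarity is computed only over links that both users rated; the Euclidean distance d is turned into a similarity as 1 / (1 + d), which is one common convention among several; and the prediction is a similarity-weighted average over users with positive similarity.

```python
from math import sqrt

# Hypothetical user -> {link: rating} data, for illustration only.
ratings = {
    "alice": {"link1": 5.0, "link2": 3.0, "link3": 4.0},
    "bob": {"link1": 4.0, "link2": 2.0, "link3": 5.0, "link4": 4.0},
    "carol": {"link1": 1.0, "link2": 5.0, "link4": 2.0},
}


def common_items(a, b):
    """Links rated by both users."""
    return [item for item in a if item in b]


def euclidean_similarity(a, b):
    """Euclidean distance over co-rated links, mapped to (0, 1]; 1 means identical ratings."""
    items = common_items(a, b)
    if not items:
        return 0.0
    distance = sqrt(sum((a[i] - b[i]) ** 2 for i in items))
    return 1.0 / (1.0 + distance)


def cosine_similarity(a, b):
    """Cosine of the angle between the two users' co-rated rating vectors."""
    items = common_items(a, b)
    norm_a = sqrt(sum(a[i] ** 2 for i in items))
    norm_b = sqrt(sum(b[i] ** 2 for i in items))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(a[i] * b[i] for i in items) / (norm_a * norm_b)


def pearson_similarity(a, b):
    """Pearson correlation of the co-rated ratings (0.0 if undefined)."""
    items = common_items(a, b)
    n = len(items)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in items) / n
    mean_b = sum(b[i] for i in items) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in items)
    var_a = sum((a[i] - mean_a) ** 2 for i in items)
    var_b = sum((b[i] - mean_b) ** 2 for i in items)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict_user_based(user, item, data, sim=pearson_similarity):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in data.items():
        if other == user or item not in other_ratings:
            continue
        s = sim(data[user], other_ratings)
        if s <= 0:
            continue  # skip users who are not positively similar
        num += s * other_ratings[item]
        den += s
    return num / den if den else None


print(predict_user_based("alice", "link4", ratings))  # 4.0
```

With the Pearson coefficient, alice's predicted rating for `link4` is 4.0: carol's ratings are perfectly anti-correlated with alice's (coefficient -1) and are skipped, so only bob's rating contributes. Passing `sim=cosine_similarity` or `sim=euclidean_similarity` swaps in the other metrics.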
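The Slope One predictor mentioned for link similarity filtering can be sketched as follows. This assumes the weighted Slope One variant: for every pair of links it stores the average rating difference over users who rated both, and it predicts a target rating by shifting the user's known ratings by those differences, weighting each term by the number of users behind it. The data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical user -> {link: rating} data, for illustration only.
ratings = {
    "alice": {"link1": 5.0, "link2": 3.0, "link3": 2.0},
    "bob": {"link1": 3.0, "link2": 4.0},
    "carol": {"link2": 2.0, "link3": 5.0},
}


def slope_one_deviations(data):
    """dev[j][i] = mean of (r_j - r_i) over users who rated both links; count[j][i] = how many."""
    diff = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for user_ratings in data.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i != j:
                    diff[j][i] += r_j - r_i
                    count[j][i] += 1
    dev = {j: {i: diff[j][i] / count[j][i] for i in diff[j]} for j in diff}
    return dev, count


def predict_slope_one(user_ratings, target, dev, count):
    """Weighted Slope One: average of r_i + dev[target][i], weighted by count[target][i]."""
    num = den = 0.0
    for i, r_i in user_ratings.items():
        if i in dev.get(target, {}):
            weight = count[target][i]
            num += (r_i + dev[target][i]) * weight
            den += weight
    return num / den if den else None


dev, count = slope_one_deviations(ratings)
print(round(predict_slope_one(ratings["bob"], "link3", dev, count), 2))  # 3.33
```

Here bob's predicted rating for `link3` is (0 × 1 + 5 × 2) / 3 ≈ 3.33: via `link1` the shifted rating is 3 - 3 = 0 (one co-rating user), and via `link2` it is 4 + 1 = 5 (two co-rating users). Only averages of differences are needed, which is what makes Slope One simpler than a regression-based predictor.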
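A textbook version of DBSCAN, written with the standard library only (Python 3.8+ for `math.dist`), illustrates how density-based clustering separates dense groups from noise. The points stand in for hypothetical 2-D feature vectors of search queries; `eps` (neighborhood radius) and `min_pts` (minimum neighborhood size, counting the point itself) are the algorithm's usual parameters, and the values used here are arbitrary.

```python
from math import dist

NOISE = -1


def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (including idx itself)."""
    return [j for j, q in enumerate(points) if dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE (-1) for noise."""
    labels = [None] * len(points)  # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels


# Hypothetical 2-D feature vectors (e.g. derived from search queries).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The result is two clusters and one noise point, and the number of clusters is not fixed in advance. The neighborhood search here is a linear scan, so the sketch is O(n²); a spatial index would be used for large data.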
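Fuzzy c-means assigns every point a degree of membership in each cluster instead of a single label. The sketch below alternates the two standard updates (cluster centers as membership-weighted means, then memberships from relative distances) with fuzzifier m = 2; the random initialization, seed, tolerance, and data are illustrative assumptions rather than settings from the study.

```python
import random
from math import dist


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Return (centers, memberships); memberships[i][j] is the degree of point i in cluster j."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][axis] for i in range(n)) / w_sum for axis in range(dim)
            ))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        max_change = 0.0
        for i in range(n):
            d = [dist(points[i], centers[j]) for j in range(c)]
            zeros = d.count(0.0)
            if zeros:
                # The point sits exactly on one or more centers: share membership among them.
                new_row = [1.0 / zeros if dj == 0.0 else 0.0 for dj in d]
            else:
                new_row = [
                    1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                    for j in range(c)
                ]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u


# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(points, c=2)
hard_labels = [row.index(max(row)) for row in memberships]
print(centers)
print(hard_labels)
```

Taking the largest membership in each row recovers a hard partition when one is needed, while points lying between groups keep split memberships; this soft assignment is the main difference from k-means.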
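The Apriori step described above, with the query data in binary form, can be sketched like this. Each row is a hypothetical search query and each column a hypothetical term (1 = the term occurs in the query); frequent itemsets are grown level by level with the usual join and prune steps, and a rule X → Y is kept when its confidence support(X ∪ Y) / support(X) reaches a threshold. The thresholds and data are assumptions for illustration.

```python
from itertools import combinations


def apriori(rows, columns, min_support):
    """Return {frozenset(itemset): support} for all itemsets with support >= min_support."""
    # Each binary row (1 = term present in the query) becomes a set of terms.
    transactions = [frozenset(col for col, bit in zip(columns, row) if bit) for row in rows]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    frequent = {}
    current = {frozenset([col]) for col in columns if support(frozenset([col])) >= min_support}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            cand for cand in candidates
            if all(frozenset(sub) in current for sub in combinations(cand, k - 1))
        }
        current = {cand for cand in candidates if support(cand) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Rules X -> Y with confidence support(X | Y) / support(X) >= min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for left in combinations(itemset, r):
                antecedent = frozenset(left)
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, supp, confidence))
    return rules


# Hypothetical binary query-by-term matrix.
columns = ["python", "tutorial", "download", "docs"]
rows = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
frequent = apriori(rows, columns, min_support=0.4)
for antecedent, consequent, supp, conf in association_rules(frequent, min_confidence=0.9):
    print(sorted(antecedent), "->", sorted(consequent), f"support={supp:.2f} confidence={conf:.2f}")
```

With these data all four single terms, three pairs, and the triple {python, tutorial, docs} are frequent (8 itemsets), and six rules reach confidence 0.9, for example {tutorial} → {python} with confidence 1.0. Because every subset of a frequent itemset is itself frequent, the antecedent's support is always available in `frequent`.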
