804 Hits in 5.6 sec

Web Page Classification* [chapter]

B. Choi, Z. Yao
2005 Studies in Fuzziness and Soft Computing  
The top k training documents are k-nearest neighbors of the new document and the k-nearest neighbors are used to predict the categories of the new document.  ...  Used for feature selection, the odds ratio of feature f and category c i captures the difference between the distribution of feature f on its positive class c i and the distribution of feature f on its  ... 
doi:10.1007/11362197_9 fatcat:4nouetuxezb6ri2thtfcbnqadq

Cross-collection Dataset of Public Domain Portuguese-language Works

Mariana O. Silva, Clarisse Scofield, Luiza De Melo-Gomes, Mirella M. Moro
2022 Journal of Information and Data Management  
After introducing its building process and content, we present an exploratory data analysis with a quantitative description of its main features.  ...  Many datasets are published in English to get more engagement, popularity and reach within a research community. Indeed, most sciences are language-agnostic and thrive on publicly available data.  ...  The library uses Levenshtein Distance to calculate the differences between two strings. With a partial ratio set at 75%, the fuzzy string matching process generates an incomplete result.  ... 
doi:10.5753/jidm.2022.2349 fatcat:hbfbv5c46feptgkqq4tufaicb4

Machine Learning for Web Page Classification: A Survey

Safae Lassri, El Habib Benlahmar, Abderrahim Tragha
2019 International Journal of Information Science and Technology  
from ScienceDirect and Springer websites, we review the different machine learning algorithms used to categorize web pages.  ...  Web page classification has many applications, among them the construction of web directories and the building of focused crawlers.  ...  This is necessary to determine which samples are the nearest neighbors. Distance measures such as Euclidean distance are commonly used.  ... 
doaj:483a4b9f259046a29c57adc3021a50d0 fatcat:hdznsdeotnhwpgpuigi7iovhja

STOP WORD DETECTION AS A BINARY CLASSIFICATION PROBLEM

Senem Kumova Metin, Bahar Karaoğlan
2017 Anadolu University Journal of Science and Technology. A : Applied Sciences and Engineering  
In a wide group of languages, the stop words, which have only grammatical roles and do not contribute to information content, may be simply exposed by their relatively higher occurrence frequencies.  ...  The experiments are conducted on corpora of an agglutinative language, Turkish, in which the amount of inflection is high and a non-agglutinative language, English, in which the inflection is lower for  ...  k-Nearest Neighbor (k-NN) k-Nearest Neighbor algorithm is a non-parametric lazy learning algorithm, originally proposed in [23], in which when an instance in testing set (whose class is unknown) is to  ... 
doi:10.18038/aubtda.322136 fatcat:g7mkskb4vja7dkjx3y5shxkkde

Open Stylometric System WebSty: Integrated Language Processing, Analysis and Visualisation

Maciej Piasecki, Tomasz Walkowiak, Maciej Eder
2018 Computational Methods in Science and Technology  
WebSty does not require local installation by users, can be used via any web browser, offers rich set-up, and runs on a computing cluster.  ...  The techniques used for feature weighting and text similarity measuring are also concisely overviewed.  ...  Acknowledgements Works funded by the Polish Ministry of Science and Higher Education within CLARIN-PL Research Infrastructure.  ... 
doi:10.12921/cmst.2018.0000007 fatcat:pditv66ns5emzj6xbs4fi46ssu

Genre identification for office document search and browsing

Francine Chen, Andreas Girgensohn, Matthew Cooper, Yijuan Lu, Gerry Filby
2011 International Journal on Document Analysis and Recognition  
Experiments were conducted on the open-set identification of four coarse office document genres: technical paper, photo, slide, and table.  ...  These include selecting features that characterize genre-related information in office documents, examining the utility of text-based features and image-based features, and proposing a simple ensemble method  ...  Conclusions and future directions are presented in Sect. 7. Related work Many sets of genre categories have been proposed for text genre identification and web genre identification.  ... 
doi:10.1007/s10032-011-0163-7 fatcat:ogby7vevq5h2lf5kwg55emlkb4

Neural Network Based Indian Folk Dance Song Classification Using MFCC and LPC

Malay Bhatt, Tejas Patalia
2017 International Journal of Intelligent Engineering and Systems  
The performances of chosen classifiers, K-Nearest Neighbor, Naïve Bayes and Neural Networks, are compared.  ...  Mel-frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) coefficients, generates high dimensional feature vector, reduces dimensionality using Principal Component Analysis (PCA) and  ...  K-Nearest neighbour (kNN) classification [25] The k-nearest neighbor method originated in the 1950s.  ... 
doi:10.22266/ijies2017.0630.19 fatcat:2ahmaswhojbwxdgivnpxn42uj4

One-class classification: taxonomy of study and review of techniques

Shehroz S. Khan, Michael G. Madden
2014 Knowledge engineering review (Print)  
We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.  ...  In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application  ...  Instead of 1-NN, the distance to the kth nearest neighbor or the average of the k distances to the first k neighbors can also be used.  ... 
doi:10.1017/s026988891300043x fatcat:djdcvpij7jhs7gygtrtihq3dia

A scale-free distribution of false positives for a large class of audio similarity measures

Jean-Julien Aucouturier, Francois Pachet
2008 Pattern Recognition  
This approach is the most predominant paradigm to extract high-level descriptions from music signals, such as their instrument, genre or mood, and can also be used to compute direct timbre similarity between  ...  We introduce 2 measures of "hubness", the number of n-occurrences and the mean neighbor angle.  ...  Acknowledgment The authors would like to thank Anthony Beurivé for helping with the implementation of signal processing algorithms and database metadata management.  ... 
doi:10.1016/j.patcog.2007.04.012 fatcat:d5fkk4u2qrakbcfbjk3q4guani

Content Facets For Individual Information Needs In Media

Elisabeth Lex, Stefanie Lindstaedt, Michael Granitzer, Harald Kosch
2018 Zenodo  
Third, the best features have successfully been used to classify traditional and social media content in both types of content facets.  ...  Several proposed content facets have successfully been implemented in APA Labs, a Web-based framework for faceted search in traditional and social media.  ...  To determine the distance between a test vector and its nearest neighbors, several distance measures have been proposed.  ... 
doi:10.5281/zenodo.1195993 fatcat:ce3ljnthjfhkpir3y4atnnlicy

Content Facets For Individual Information Needs In Media

Elisabeth Lex, Stefanie Lindstaedt, Michael Granitzer, Harald Kosch
2018 Zenodo  
Third, the best features have successfully been used to classify traditional and social media content in both types of content facets.  ...  Several proposed content facets have successfully been implemented in APA Labs, a Web-based framework for faceted search in traditional and social media.  ...  To determine the distance between a test vector and its nearest neighbors, several distance measures have been proposed.  ... 
doi:10.5281/zenodo.1196397 fatcat:udr3736ejbek5lzl34tu4g4ppq

Automatic Text Categorization in Terms of Genre and Author

Efstathios Stamatatos, Nikos Fakotakis, George Kokkinakis
2000 Computational Linguistics  
We present a set of small-scale but reasonable experiments in text genre detection, author identification as well as author verification tasks and show that the performance of the proposed method is better  ...  In this paper we present an approach to text categorization in terms of genre and author for Modern Greek.  ...  Acknowledgement We would like to thank the anonymous CL reviewers for their valuable and insightful comments. Their suggestions have greatly improved an earlier draft of this paper.  ... 
doi:10.1162/089120100750105920 fatcat:ksreq6s6w5ewrgawtxvrbl5tje

MIRages: an account of music audio extractors, semantic description and context-awareness, in the three ages of MIR

Perfecto Herrera Boyer, Xavier Serra, Emilia Gómez
2018 Zenodo  
In the age of feature extractors, we present work on features to describe sounds and music, especially timbre and tonal aspects.  ...  Automatic classification of drum sounds: a comparison of feature selection methods and classification techniques. In C. Anagnostopoulou et al. (Eds), "Music and Artificial Intelligence".  ...  Project PHAROS IST-2006-045035. 21 A demo of such a music recommender/visualization system working on the proposed principles, but taking listening statistics instead of explicitly given preference set  ... 
doi:10.5281/zenodo.2278110 fatcat:uturvyw2gnfzdgtelvtxot3etq

MIRages: an account of music audio extractors, semantic description and context-awareness, in the three ages of MIR

Perfecto Herrera Boyer, Xavier Serra, Emilia Gómez
2018 Zenodo  
In the age of feature extractors, we present work on features to describe sounds and music, especially timbre and tonal aspects.  ...  Automatic classification of drum sounds: a comparison of feature selection methods and classification techniques. In C. Anagnostopoulou et al. (Eds), "Music and Artificial Intelligence".  ...  Project PHAROS IST-2006-045035. 21 A demo of such a music recommender/visualization system working on the proposed principles, but taking listening statistics instead of explicitly given preference set  ... 
doi:10.5281/zenodo.1882316 fatcat:6yhrlcyexrgyhhwayeau2gu7f4

Picasso - to sing, you must close your eyes and draw

Aleksandar Stupar, Sebastian Michel
2011 Proceedings of the 34th international ACM SIGIR conference on Research and development in Information - SIGIR '11  
PICASSO makes use of genuine samples obtained from first-class contemporary movies.  ...  We have created a large training set consisting of over 40,000 image/soundtrack samples obtained from 28 movies and evaluated the suitability of PICASSO by means of a user study.  ...  Selecting the K-Nearest Neighbors (KNN) means that in a multi-dimensional feature space, for a given feature vector the K nearest feature vectors are selected (cf., [24] for an overview).  ... 
doi:10.1145/2009916.2010012 dblp:conf/sigir/StuparM11 fatcat:2ol5plw6p5fy7e7pds6vdwmnm4