Analysis and Clustering of Movie Genres

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 10, OCTOBER 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING WWW.JOURNALOFCOMPUTING.ORG


Hasan Bulut and Serdar Korukoglu

Abstract: Most movies blend one genre with other genres; that is, movie directors combine elements from different genres. A movie may blend the love-oriented plot of the romance genre with elements of the Western or Science Fiction genres; hence a movie may belong to several genres. A movie is also associated with keywords that describe its contents. These keywords are usually used during search to retrieve movies that match a user's interest. In this paper, we establish genre keyword sets from movie keywords and use these keyword sets to analyze the proximity of genres to each other. We use movie data from the Internet Movie Database (IMDB). Genres are classified using a hierarchical clustering algorithm and principal component factor analysis (PCFA). The study shows which genres are most often used together in a movie, and we show that the results of the two analyses support each other.

Index Terms: Hierarchical clustering, Internet Movie Database, movie genres, principal component factor analysis


1 INTRODUCTION

Movies are part of almost everybody's life, fulfilling entertainment needs, and hence constitute a large portion of the entertainment industry. Several websites host movie metadata and let users search for movies of interest. The Internet Movie Database (IMDB) is a popular site cataloging almost every movie ever made. It is an excellent online database for detailed information about movies, TV series and videos, containing information such as genre, keywords, year, language, user ratings and many other features of these titles. This huge amount of data calls for analysis by researchers. Movies are classified into a number of genres to help users direct their search to specific categories. However, most movies blend one genre with others; that is, movie directors combine elements from different genres. A movie may blend the love-oriented plot of the romance genre with elements of the Western or Science Fiction genres. Hence a movie may belong to several genres.

[1] and [2] classify movies into four genres, Comedy, Action, Drama and Horror, on the basis of computable visual cues. [3] presents a method of movie genre categorization based on scene classification of movie trailers; similar to [1], it classifies movies into the four categories Comedy, Action, Drama and Horror. [4] characterizes the measurable traits of the musical scores used in Comedy, Action, Drama and Horror movies to determine which feature categories carry the most valuable information for distinguishing these genres in a broad sense. [5] gives a global overview of the entire movie and actor space, visualizing all movies as well as major co-actor relationships. [6] presents a case study on the visualization and analysis of large, complex temporal multivariate networks derived from IMDB. [7] uses data mining techniques to analyze the factors contributing to the rating of a movie. [8] predicts movie grosses using regression and k-nearest neighbor models on IMDB data and news data. [9] combines recommender systems with information search tools for better search and browsing; it uses a collaborative filtering algorithm to generate personal item authorities for each user and combines them with item proximities. [10] develops models and algorithms for predicting the helpfulness of reviews, aiming to find the most helpful reviews among a large number of low-quality ones.

In this paper, we present a novel approach that analyzes the proximity of movie genres to each other and discovers the most related genre pairs and triples. To achieve this, we establish genre keyword sets from movie keywords; different genres may share common keywords. We classify genres using a hierarchical clustering algorithm and compare the result with a principal component factor analysis of the genres. The study shows which genres are most often used together, and we show that the results of the two analyses are close to each other.

The remainder of the paper is organized as follows. Section 2 introduces the data used from the Internet Movie Database (IMDB). Section 3 explains how genre keyword sets are constructed from movie keyword sets. Section 4 first presents the keyword distributions of genres; then the hierarchical clustering method is applied to the movie data and the closest genre pairs and triples are discovered. Principal component factor analysis is then performed on the same data and the two analyses are compared. Finally, in Section 5, we conclude that the results of the two analyses support each other.

H. Bulut is with the Computer Engineering Department, Ege University, Bornova, Izmir 35100, Turkey.
S. Korukoglu is with the Computer Engineering Department, Ege University, Bornova, Izmir 35100, Turkey.


2 THE INTERNET MOVIE DATABASE (IMDB)

The Internet Movie Database (IMDB) is an excellent online database for detailed information about movies, TV series and videos. It contains information such as genre, keywords, year, language, user ratings and many other features of these titles. Undoubtedly, the vast amount of IMDB data contains much valuable information that deserves to be researched. The IMDB data is available as a number of inconsistently structured text files, which are laid out to be human-readable rather than machine-readable. This format makes it difficult to use the source data directly for information extraction; hence, the raw data needs to be preprocessed or transformed into a more suitable format. The IMDB data consists of 49 separate text files. The common factor linking the information in these files is the title of the movie. The production year, in parentheses, is appended to the title to distinguish different versions of the title. Some titles also include TV, V or VG in parentheses to indicate that the title is a TV series, video or video game, respectively, e.g., Sand (2001) and Sand (2001) (V). If there are multiple movies with the same title in the same year, Roman numerals are appended to the year, e.g., Sand (2010/I) and Sand (2010/II). Each file provides information on a separate feature; for example, genres are given in the genres.list file and keywords in the keywords.list file. In these files, each line holds one pair: genres.list gives the genre of a title in <title>|<genre> format, keywords.list gives its keywords in <title>|<keyword> format, and so on. If a title has more than one genre or keyword, the title is repeated for each genre or keyword, e.g., <titleA>|<genre1>, <titleA>|<genre2>, etc. However, the files may contain some text at the beginning, and the data may sometimes contain errors.
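The line layout and title conventions just described can be parsed with a short sketch like the one below. It is a minimal illustration, not the authors' preprocessing code: the regular expression, the function names `parse_title` and `load_pairs`, and the choice to split each line on its last `|` are assumptions, and the real files may separate fields with other whitespace.

```python
import re
from collections import defaultdict

# Assumed pattern for titles such as "Sand (2001)", "Sand (2001) (V)" or
# "Sand (2010/II)": name, production year, optional Roman numeral, optional kind.
TITLE_RE = re.compile(
    r"^(?P<name>.+?) \((?P<year>\d{4})(?:/(?P<roman>[IVXLC]+))?\)"
    r"(?: \((?P<kind>TV|VG|V)\))?$"
)


def parse_title(title):
    """Split a title key into (name, year, roman, kind); None if it does not match."""
    m = TITLE_RE.match(title.strip())
    if m is None:
        return None
    return m.group("name"), int(m.group("year")), m.group("roman"), m.group("kind")


def load_pairs(lines):
    """Map each title to the set of values (genres, keywords, ...) listed for it."""
    table = defaultdict(set)
    for line in lines:
        if "|" not in line:  # skip preamble text and blank lines
            continue
        title, value = line.rsplit("|", 1)
        title, value = title.strip(), value.strip()  # tolerate irregular spacing
        if title and value:  # skip lines with a missing value
            table[title].add(value)
    return dict(table)
```

Because a title is repeated on one line per genre or keyword, collecting values into a set per title rebuilds the one-to-many relation; lines without a separator and lines with an empty value are simply skipped.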
The spacing between titles and the related information is also inconsistent, and not all values are available on every line. This non-standardized structure requires each file to be parsed differently. For our research purposes, we used the following files: movies.list, genres.list, keywords.list and language.list. As their names indicate, each line of movies.list contains a <title>|<year> pair, genres.list a <title>|<genre> pair, keywords.list a <title>|<keyword> pair and language.list a <title>|<language> pair. After these files are processed, titles are linked to their genres, keywords and language.

3 CONSTRUCTING GENRE KEYWORD SETS

Movie data contains a number of keywords that describe each movie, and a movie may belong to several genres. Therefore, the keywords of a movie are added to the keyword sets of all genres specified for that movie. For instance, consider the following movies with their keyword and genre sets:
Movie m1: Keywords_m1 = {k1, k2, k3, k4, k5}, Genres_m1 = {g1, g2, g3}
Movie m2: Keywords_m2 = {k2, k3, k7}, Genres_m2 = {g1, g2}
Movie m3: Keywords_m3 = {k1, k6, k8}, Genres_m3 = {g2, g3}
For the above example, we establish the genre keyword sets as follows.
Step 1: Combine all keywords from the movies that include the genre in their genre list, keeping repeated keywords.
Keywords_g1 = {k1, k2, k2, k3, k3, k4, k5, k7}
Keywords_g2 = {k1, k1, k2, k2, k3, k3, k4, k5, k6, k7, k8}
Keywords_g3 = {k1, k1, k2, k3, k4, k5, k6, k8}
Step 2: Associate a weight with each keyword in the keyword set. Each keyword's weight is proportional to the number of times it is repeated within the keyword set, and the weights are normalized so that the total weight of the keyword set is 1. For example, k2 appears twice among the eight keywords of g1, so its weight in g1 is 2/8 = 0.25.
Keywords_g1 = {<k1, 0.125>, <k2, 0.25>, <k3, 0.25>, <k4, 0.125>, <k5, 0.125>, <k7, 0.125>}
Keywords_g2 = {<k1, 0.1818>, <k2, 0.1818>, <k3, 0.1818>, <k4, 0.0909>, <k5, 0.0909>, <k6, 0.0909>, <k7, 0.0909>, <k8, 0.0909>}
Keywords_g3 = {<k1, 0.25>, <k2, 0.125>, <k3, 0.125>, <k4, 0.125>, <k5, 0.125>, <k6, 0.125>, <k8, 0.125>}
Step 3: After the movie keywords have been combined for all genres, a matrix is formed whose rows represent keywords and whose columns represent genres. For the above example, the matrix is shown in TABLE 1.

TABLE 1
KEYWORD-GENRE MATRIX FOR THE EXAMPLE
Keyword   g1      g2      g3
k1        0.125   0.1818  0.25
k2        0.25    0.1818  0.125
k3        0.25    0.1818  0.125
k4        0.125   0.0909  0.125
k5        0.125   0.0909  0.125
k6        0.0     0.0909  0.125
k7        0.125   0.0909  0.0
k8        0.0     0.0909  0.125

4 ANALYSIS OF IMDB DATA

We chose movies with English-language titles from the years 2006 to 2010, a five-year period, which gives a total of 48483 titles with 27 genres and 19561 keywords. The distribution of the number of titles per genre is shown in Fig. 1, where the genres are sorted in decreasing order of the number of movies they contain. The first ten genres (Short, Drama, Adult, Comedy, Documentary, Thriller, Horror, Action, Romance and Crime) account for almost 80% of the movies.

[Fig. 1. Distribution of the number of movies per genre (x-axis: genres, sorted by number of movies; y-axis: number of movies, 0 to 18000)]

[Fig. 2. Keyword distribution of genres: one panel each for Short, Drama, Adult, Thriller, Action, Crime, Family, Adventure, Documentary, Romance, Horror, Comedy, Fantasy, Sci-Fi, Mystery, Animation, Biography, History, War and Musical (x-axis: keywords, first 1000)]

For each genre, keywords are sorted in decreasing order and the corresponding values (keyword counts) are plotted for 20 genres; Fig. 2 shows the first 1000 keywords of each distribution. As shown in Section 3, the frequency of a keyword may differ from genre to genre. The distributions are observed to be positively skewed.

4.1 The Distance and the Pearson Correlation Coefficient Values between Genres

After the keyword weight values have been determined for each genre, the distance and the Pearson correlation coefficient between every pair of genres are computed. The result is given in Table 2: the lower-left triangle shows the distances between genres and the upper-right triangle shows the Pearson correlation coefficients. All of the values in Table 2 are significant at least at the 1% level (p < 0.01). As seen from Table 2, the smallest correlation, 0.018, is between News and Adult, and the largest, 0.967, is between Thriller and Mystery. There are 9 pairs whose correlation coefficients are above 0.9: Drama-Comedy, Drama-Romance, Documentary-Biography, Thriller-Horror, Thriller-Action, Thriller-Crime, Thriller-Mystery, Crime-Mystery and Reality-TV – Game-Show.
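The keyword-set construction of Section 3 and the genre-to-genre statistics of this section can be sketched together on the toy example of Table 1. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, Pearson correlation is computed between genre columns over keyword rows, and the distance is assumed to be 1 - r (the figures quoted in the text, such as 0.033 for Thriller-Mystery with r = 0.967, appear consistent with this). The Cronbach's alpha helper treats genres as items and keywords as cases, which is our assumption about how the overall alpha reported later in Section 4.1 was obtained.

```python
from collections import Counter
from math import sqrt

# Toy movies from Section 3: (keyword set, genre set) per movie.
MOVIES = {
    "m1": ({"k1", "k2", "k3", "k4", "k5"}, {"g1", "g2", "g3"}),
    "m2": ({"k2", "k3", "k7"}, {"g1", "g2"}),
    "m3": ({"k1", "k6", "k8"}, {"g2", "g3"}),
}


def genre_keyword_weights(movies):
    """Steps 1-2: pool keywords per genre (keeping repeats), then normalize to sum 1."""
    pooled = {}
    for keywords, genres in movies.values():
        for g in genres:
            pooled.setdefault(g, Counter()).update(keywords)
    weights = {}
    for g, counts in pooled.items():
        total = sum(counts.values())
        weights[g] = {k: c / total for k, c in counts.items()}
    return weights


def keyword_genre_matrix(weights):
    """Step 3: rows are keywords, columns are genres; absent keywords get 0."""
    genres = sorted(weights)
    keywords = sorted({k for w in weights.values() for k in w})
    rows = [[weights[g].get(k, 0.0) for g in genres] for k in keywords]
    return keywords, genres, rows


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(rows):
    """Cronbach's alpha with columns (genres) as items and rows (keywords) as cases."""
    def var(v):
        m = sum(v) / len(v)
        return sum((a - m) ** 2 for a in v) / (len(v) - 1)
    k = len(rows[0])
    item_var = sum(var(col) for col in zip(*rows))
    total_var = var([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var / total_var)


weights = genre_keyword_weights(MOVIES)
keywords, genres, rows = keyword_genre_matrix(weights)
columns = dict(zip(genres, zip(*rows)))
r_12 = pearson(columns["g1"], columns["g2"])  # about 0.730
d_12 = 1 - r_12                               # assumed distance: 1 - r
```

On this toy data, g1 and g2 have r = 4/sqrt(30), about 0.730, while g1 and g3 are uncorrelated (r = 0), so under the 1 - r assumption they sit at distance 1.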
There are also 49 genre pairs whose correlation coefficients are between 0.8 and 0.9. Adult has the weakest correlations with all other genres: its highest correlation is 0.186, with Romance, and its lowest is 0.018, with News. Most of the genres are highly intercorrelated with each other, which creates a methodological problem for the analysis of the data. The correlation coefficients between genre pairs are also shown as a matrix plot in Fig. 3, which visualizes how each genre correlates with every other. As seen from Fig. 3, the variation between genre pairs differs from pair to pair. Smooth linear patterns are easily identified between some genre pairs with high correlation values. It is also interesting that the patterns of Adult with the other genres are quite different from those of the remaining genre pairs.

TABLE 2
DISTANCE MATRIX (LOWER-LEFT TRIANGLE) AND CORRELATION MATRIX (UPPER-RIGHT TRIANGLE) BETWEEN GENRES
[27 × 27 matrix of pairwise genre distances and Pearson correlations]

[Fig. 3. Matrix plot of genre correlations (pairwise panels for Short, Drama, Adult, Comedy, Documentary, Thriller, Horror, Action, Romance, Crime, Family, Adventure, Fantasy, Sci-Fi, Mystery, Animation, Biography, History, War and Musical)]

Cronbach's alpha is generally used as a measure of internal consistency. Values of 0.7 or 0.75 are often used as cutoffs for Cronbach's alpha and thus for the reliability of the test [11]. For the above keyword distributions, the overall Cronbach's alpha is 0.8391, which is good considering the cutoff value of 0.75.

4.2 Hierarchical Clustering of Genres

Since there are only 27 movie genres to be clustered, each represented by a vector of length 19561, the number of objects to cluster is small. In this case, hierarchical clustering is an appropriate method for discovering the genre relationships.
This is because hierarchical clustering organizes data into a hierarchical structure based on the proximity of the data points to each other. Agglomerative clustering, a widely used form of hierarchical clustering, starts with N singleton clusters, each containing a single data point, and performs a series of merge operations, one per step, until only one cluster is left. The result is usually depicted as a dendrogram, which visualizes the potential clustering structures; cutting the dendrogram at different levels yields different clusterings. When combining a pair of clusters at each step, we used complete linkage to measure the distance between two clusters. Complete linkage is considered effective for small clusters. It ensures that all items in a cluster are within a maximum distance of each other; that is, it uses the largest distance between items of the two clusters as the inter-cluster distance.

TABLE 3
AMALGAMATION STEPS FOR HIERARCHICAL CLUSTERING OF GENRES
Step  Number of  Similarity  Distance  Clusters  New      Number of obs.
      clusters   level       level     joined    cluster  in new cluster
1     26         98.3683     0.032633  6, 16     6        2
2     25         97.3355     0.053289  6, 10     6        3
3     24         96.7134     0.065732  2, 9      2        2
4     23         96.1016     0.077968  21, 27    21       2
5     22         95.1015     0.097971  5, 19     5        2
6     21         94.8401     0.103199  8, 15     8        2
7     20         94.5857     0.108287  2, 4      2        3
8     19         93.6382     0.127235  5, 20     5        3
9     18         92.6514     0.146972  11, 14    11       2
10    17         92.3969     0.152063  1, 24     1        2
11    16         92.2667     0.154667  6, 7      6        4
12    15         91.1781     0.176439  1, 11     1        4
13    14         89.5475     0.209051  6, 8      6        6
14    13         86.8286     0.263428  1, 2      1        7
15    12         85.3866     0.292267  5, 22     5        4
16    11         84.3267     0.313467  6, 26     6        7
17    10         83.1609     0.336781  21, 25    21       3
18    9          83.0035     0.339929  12, 17    12       2
19    8          82.1727     0.356546  1, 13     1        8
20    7          79.1008     0.417984  6, 12     6        9
21    6          78.0444     0.439113  5, 23     5        5
22    5          77.6394     0.447211  1, 18     1        9
23    4          74.88       0.5024    1, 6      1        18
24    3          68.069      0.638619  1, 5      1        23
25    2          56.6527     0.866946  1, 21     1        26
26    1          50.9082     0.981835  1, 3      1        27

Performing hierarchical clustering with complete linkage on the IMDB data set produces the amalgamation steps given in Table 3. The cluster IDs used in Table 3 are as follows: Short(1), Drama(2), Adult(3), Comedy(4), Documentary(5), Thriller(6), Horror(7), Action(8), Romance(9), Crime(10), Family(11), Adventure(12), Music(13), Fantasy(14), Sci-Fi(15), Mystery(16), Animation(17), Sport(18), Biography(19), History(20), Reality-TV(21), War(22), News(23), Musical(24), Talk-Show(25), Western(26) and Game-Show(27). At step 1, Thriller(6) and Mystery(16) form a cluster. As Table 2 shows, these two genres are the closest pair among all genre pairs, with a distance of 0.033 (and the largest correlation, 0.967). At step 2, Crime(10) is merged with the Thriller(6)-Mystery(16) pair to form another cluster. As seen from Table 2, Thriller(6) and Crime(10) are the second closest pair among all genre pairs, with a distance of 0.042; the step-2 distance level in Table 3 (0.053) is larger because complete linkage uses the largest distance between Crime and the members of the Thriller-Mystery cluster.
The resulting cluster contains Thriller, Mystery and Crime. Following the remaining steps in Table 3, we observe the following results. Drama(2) and Romance(9) are merged at step 3 and later, at step 7, with Comedy(4), forming a cluster composed of the Drama, Romance and Comedy genres. At step 4, Reality-TV(21) and Game-Show(27) are merged into a cluster, which is then merged with Talk-Show(25) at step 17, forming a cluster composed of the Reality-TV, Game-Show and Talk-Show genres. Documentary(5) and Biography(19) are merged at step 5 and later with History(20) at step 8, forming another cluster containing the Documentary, Biography and History genres. At step 6, Action(8) is merged with Sci-Fi(15); at step 9, Family(11) with Fantasy(14); at step 10, Short(1) with Musical(24); and at step 18, Adventure(12) with Animation(17). These groupings show which genre pairs or triples are most often blended together in a movie. Also notice that Adult(3) is merged only at the final step to form the root cluster, which shows that the Adult genre is only weakly correlated with the other movie genres. The corresponding dendrogram for Table 3 is shown in Fig. 4, where the cluster formations explained above can be followed visually. Applying a cutoff distance between 0.45 and 0.50 to the dendrogram in Fig. 4 (shown as a dashed line), that is, between the distance levels of steps 22 (0.447) and 23 (0.502), results in 5 genre clusters. These clusters are given in Table 4.
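Complete-linkage agglomerative clustering and the dendrogram cut described in this section can be sketched in pure Python as follows. The distances in the usage example are hypothetical and only illustrate the mechanics; they are not values from Table 2. The helper `similarity_level` reflects our observation that the similarity levels in Table 3 appear consistent with 100(1 - d/2) for distance d (e.g., step 1: 100(1 - 0.032633/2), about 98.368); this is an inference, not something the text states.

```python
from itertools import combinations


def complete_linkage(labels, dist):
    """Agglomerative clustering with complete linkage.

    dist maps a pair (a, b) of labels to their distance, in either key order.
    Returns the merge history as (distance level, merged cluster) tuples.
    """
    def d(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    def linkage(c1, c2):
        # Complete linkage: the largest distance between members of the two clusters.
        return max(d(a, b) for a in c1 for b in c2)

    clusters = [frozenset([x]) for x in labels]  # start with N singleton clusters
    history = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        level = linkage(clusters[i], clusters[j])
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append((level, merged))
    return history


def cut(labels, history, threshold):
    """Clusters left after applying only the merges at or below threshold."""
    clusters = [frozenset([x]) for x in labels]
    for level, merged in history:
        if level > threshold:  # complete-linkage merge levels never decrease
            break
        clusters = [c for c in clusters if not c <= merged] + [merged]
    return clusters


def similarity_level(distance):
    """Similarity on a 0-100 scale, assuming distance = 1 - r ranges over [0, 2]."""
    return 100 * (1 - distance / 2)


# Usage with hypothetical distances (not taken from Table 2).
labels = ["A", "B", "C", "D", "E"]
dist = {
    ("A", "B"): 0.05, ("A", "C"): 0.10, ("B", "C"): 0.20,
    ("D", "E"): 0.08, ("A", "D"): 0.70, ("A", "E"): 0.75,
    ("B", "D"): 0.65, ("B", "E"): 0.72, ("C", "D"): 0.60, ("C", "E"): 0.68,
}
history = complete_linkage(labels, dist)
groups = cut(labels, history, 0.5)  # two groups: {A, B, C} and {D, E}
```

Here C joins {A, B} at 0.20, the larger of its two distances to A and B; replacing `max` in `linkage` with a mean would give average linkage, under which C would join at 0.15 instead.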
TABLE 4
GENRE CLUSTERS OBTAINED FROM HIERARCHICAL CLUSTERING
Cluster 1: Short, Drama, Comedy, Romance, Family, Music, Fantasy, Sport, Musical
Cluster 2: Thriller, Horror, Action, Crime, Adventure, Sci-Fi, Mystery, Animation, Western
Cluster 3: Documentary, Biography, History, War, News
Cluster 4: Reality-TV, Talk-Show, Game-Show
Cluster 5: Adult

[Fig. 4. Complete linkage dendrogram for genres (y-axis: distance, 0.00 to 0.98; x-axis: genres); the dashed line marks the cutoff]

4.3 Principal Component Factor Analysis for IMDB Data

As an alternative to the hierarchical clustering method, we also applied principal component factor analysis (PCFA) to the IMDB data. PCFA is a technique for reducing a large number of variables to a smaller number of random quantities called factors. The main purpose of applying PCFA here is to compare its results with those of hierarchical clustering. Factor loadings are computed using the covariance matrix obtained from the IMDB data. The factor loading pattern of five factors is given in Table 5. PCFA identified five factors that together explain 84.6% of the variance among the IMDB genres.

TABLE 5
GENRE FACTOR LOADINGS BY PCFA
(* marks the largest absolute loading in each row)
Variable      Factor 1  Factor 2  Factor 3  Factor 4  Factor 5
Short         -0.075     0.174*    0.074    -0.085     0.017
Drama          0.046     0.079*   -0.048    -0.032    -0.027
Adult         -0.031    -0.052     0.041    -0.007    -0.958*
Comedy        -0.002     0.219*   -0.146     0.004    -0.069
Documentary   -0.107     0.070     0.207*   -0.012    -0.027
Thriller       0.226*   -0.169    -0.041     0.005    -0.003
Horror         0.210*   -0.135    -0.060     0.001     0.000
Action         0.197*   -0.169     0.011     0.011     0.055
Romance        0.009     0.203*   -0.136    -0.034    -0.117
Crime          0.197*   -0.159    -0.005     0.000    -0.008
Family        -0.079     0.233*   -0.016    -0.025     0.080
Adventure      0.113*    0.023    -0.110     0.016     0.000
Music         -0.131     0.293*   -0.067     0.037    -0.033
Fantasy        0.036     0.177*   -0.120    -0.038     0.065
Sci-Fi         0.135*   -0.050    -0.031     0.007     0.064
Mystery        0.184*   -0.085    -0.067    -0.001     0.013
Animation     -0.038     0.328*   -0.215    -0.003     0.133
Sport          0.015    -0.033     0.046     0.126*   -0.038
Biography     -0.060     0.077     0.137*   -0.026    -0.036
History       -0.031    -0.179     0.408*   -0.052    -0.020
Reality-TV     0.010    -0.109    -0.025     0.374*   -0.006
War            0.041    -0.309     0.436*   -0.047    -0.006
News          -0.127    -0.122     0.412*    0.050    -0.024
Musical       -0.086     0.302*   -0.083    -0.048    -0.006
Talk-Show     -0.053     0.064    -0.087     0.294*    0.020
Western        0.131    -0.172*    0.124    -0.016     0.097
Game-Show      0.010    -0.136    -0.011     0.384*    0.011
Variance       9.0822    5.3935    4.1812    3.1283    1.0515
Cumulative variance (%)  33.6  53.6  69.1  80.7  84.6

In Table 5, the maximum absolute value in each row is marked. For each factor (column), we grouped the genres whose largest absolute loading falls on that factor, which yields the clusters in Table 6.

TABLE 6
GENRE GROUPS OBTAINED FROM PRINCIPAL COMPONENT FACTOR ANALYSIS
(* marks genres placed in a different cluster than by hierarchical clustering)
Cluster 1 (Factor 2): Short, Drama, Comedy, Romance, Family, Music, Fantasy, Animation*, Musical, Western*
Cluster 2 (Factor 1): Thriller, Horror, Action, Crime, Adventure, Sci-Fi, Mystery
Cluster 3 (Factor 3): Documentary, Biography, History, War, News
Cluster 4 (Factor 4): Sport*, Reality-TV, Talk-Show, Game-Show
Cluster 5 (Factor 5): Adult

Comparing the clusters obtained by hierarchical clustering with those obtained by principal component factor analysis, only 3 of the 27 genres, marked in Table 6, are placed into different clusters. Using PCFA, Animation and Western are placed into cluster 1 instead of cluster 2 as in hierarchical clustering, and Sport is placed into cluster 4 instead of cluster 1. Hence, the classifications of 24 of the 27 genres (88.9%) agree. It is interesting that factor 5 can be identified as an Adult factor, and this genre was also the cluster most distinct from the others in hierarchical clustering.

5 CONCLUSION

Movie directors combine elements from different genres into a single movie plot; hence, a movie may belong to several genres. In this study, we used movie data from the Internet Movie Database, selecting movies with English-language titles from the years 2006 to 2010, a five-year period comprising 48483 titles, 27 genres and 19561 keywords. We established genre keyword sets from movie keywords and used them to analyze the proximity of genres to each other. We classified the genres into five clusters and discovered the closest genre pairs and triples. We compared the results obtained by the hierarchical clustering method with those of principal component factor analysis. The results of the two analyses are close to each other: the classifications of 24 of the 27 genres (88.9%) agree.

REFERENCES
[1] Z. Rasheed and M. Shah, "Movie genre classification by exploiting audio-visual features of previews," Proc. 16th International Conference on Pattern Recognition, vol. 2, pp. 1086-1089, 2002.
[2] Z. Rasheed, Y. Sheikh, and M. Shah, "On the use of computable features for film classification," IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 1, pp. 52-64, Jan. 2005.
[3] H. Zhou, T. Hermans, A. V. Karandikar, and J. M. Rehg, "Movie genre classification via scene categorization," Proc. 10th International Conference on Multimedia, pp. 747-750, 2010.
[4] A. Austin, E. Moore, U. Gupta, and P. Chordia, "Characterization of movie genre based on music score," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 421-424, 2010.
[5] B. W. Herr, W. Ke, E. Hardy, and K. Börner, "Movies and actors: Mapping the Internet Movie Database," Proc. 11th International Conference on Information Visualization, pp. 465-469, 2007.
[6] A. Ahmed, V. Batagelj, X. Fu, S.-H. Hong, D. Merrick, and A. Mrvar, "Visualisation and analysis of the Internet Movie Database," Proc. 6th International Asia-Pacific Symposium on Visualization, pp. 17-24, 2007.
[7] M. Saraee, S. White, and J. Eccleston, "A data mining approach to analysis and prediction of movie ratings," Proc. 5th International Conference on Data Mining, pp. 343-352, 2004.
[8] W. Zhang and S. Skiena, "Improving movie gross prediction through news analysis," Proc. IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies, pp. 301-304, 2009.
[9] S.-T. Park and D. M. Pennock, "Applying collaborative filtering techniques to movie search for better ranking and browsing," Proc. 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 550-559, 2007.
[10] Y. Liu, X. Huang, A. An, and X. Yu, "Modeling and predicting the helpfulness of online reviews," Proc. 8th IEEE International Conference on Data Mining, pp. 443-452, 2008.
[11] A. Christmann and S. Van Aelst, "Robust estimation of Cronbach's alpha," Journal of Multivariate Analysis, vol. 97, pp. 1660-1674, 2006.

Hasan Bulut is a member of the IEEE and the IEEE Computer Society. He is an Asst. Prof. in the Computer Engineering Dept. at Ege University, Izmir, Turkey. He received his B.S. degree in Electronics and Telecommunications Engineering in 1996 from Istanbul Technical University, Istanbul, Turkey, M.Sc.
in Computer Science in 2000 from Syracuse University, Syracuse, NY, USA, and Ph.D. in Computer Science in 2007 from Indiana University, Bloomington, IN, USA. Serdar Korukoglu is a full-time professor of Computer Engineering Dept. at Ege University, Izmir, Turkey. He received his B.S. degree in Industrial Engineering, M.Sc. in Applied Statistics and Ph.D. in Computer Engineering from Ege University, Izmir, Turkey. He was in Reading University of England as a visiting research fellow in 1985. 23 </div> </div> </div> </div> </div> </div> </div> </div> <div class="modal fade" id="report" tabindex="-1" role="dialog" aria-hidden="true"> <div class="modal-dialog"> <div class="modal-content"> <form role="form" method="post" action="https://idoc.tips/report/analysis-and-clustering-of-movie-genres-pdf-free" style="border: none;"> <div class="modal-header"> <button type="button" class="close" data-dismiss="modal" aria-hidden="true">×</button> <h4 class="modal-title">Report "Analysis and Clustering of Movie Genres"</h4> </div> <div class="modal-body"> <div class="form-group"> <label>Your name</label> <input type="text" name="name" required="required" class="form-control" /> </div> <div class="form-group"> <label>Email</label> <input type="email" name="email" required="required" class="form-control" /> </div> <div class="form-group"> <label>Reason</label> <select name="reason" required="required" class="form-control"> <option value="">-Select Reason-</option> <option value="pornographic" selected="selected">Pornographic</option> <option value="defamatory">Defamatory</option> <option value="illegal">Illegal/Unlawful</option> <option value="spam">Spam</option> <option value="others">Other Terms Of Service Violation</option> <option value="copyright">File a copyright complaint</option> </select> </div> <div class="form-group"> <label>Description</label> <textarea name="description" required="required" rows="3" class="form-control" style="border: 1px solid #cccccc;"></textarea> 
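The grouping rule and the agreement figure reported in Section 4.3 can be re-derived directly from the published tables. The Python sketch below is not the authors' code: it copies the Table 4 clusters and the Table 5 loadings and variances, assigns each genre to the factor holding its largest absolute loading, maps factors to the cluster numbers used in Table 6, and counts the genres on which the two methods agree. All names (LOADINGS, dominant_factor, agreement and so on) are hypothetical. The explained-variance helper assumes a total variance of 27, one unit per genre; this is an assumption, but it is the one that reproduces the cumulative percentages in Table 5.

```python
# Hypothetical sketch: re-derive Table 6 and the 24/27 agreement from Tables 4 and 5.

# Factor loadings copied from Table 5: genre -> (Factor 1, ..., Factor 5).
LOADINGS = {
    "Short": (-0.075, 0.174, 0.074, -0.085, 0.017),
    "Drama": (0.046, 0.079, -0.048, -0.032, -0.027),
    "Adult": (-0.031, -0.052, 0.041, -0.007, -0.958),
    "Comedy": (-0.002, 0.219, -0.146, 0.004, -0.069),
    "Documentary": (-0.107, 0.070, 0.207, -0.012, -0.027),
    "Thriller": (0.226, -0.169, -0.041, 0.005, -0.003),
    "Horror": (0.210, -0.135, -0.060, 0.001, 0.000),
    "Action": (0.197, -0.169, 0.011, 0.011, 0.055),
    "Romance": (0.009, 0.203, -0.136, -0.034, -0.117),
    "Crime": (0.197, -0.159, -0.005, 0.000, -0.008),
    "Family": (-0.079, 0.233, -0.016, -0.025, 0.080),
    "Adventure": (0.113, 0.023, -0.110, 0.016, 0.000),
    "Music": (-0.131, 0.293, -0.067, 0.037, -0.033),
    "Fantasy": (0.036, 0.177, -0.120, -0.038, 0.065),
    "Sci-Fi": (0.135, -0.050, -0.031, 0.007, 0.064),
    "Mystery": (0.184, -0.085, -0.067, -0.001, 0.013),
    "Animation": (-0.038, 0.328, -0.215, -0.003, 0.133),
    "Sport": (0.015, -0.033, 0.046, 0.126, -0.038),
    "Biography": (-0.060, 0.077, 0.137, -0.026, -0.036),
    "History": (-0.031, -0.179, 0.408, -0.052, -0.020),
    "Reality-TV": (0.010, -0.109, -0.025, 0.374, -0.006),
    "War": (0.041, -0.309, 0.436, -0.047, -0.006),
    "News": (-0.127, -0.122, 0.412, 0.050, -0.024),
    "Musical": (-0.086, 0.302, -0.083, -0.048, -0.006),
    "Talk-Show": (-0.053, 0.064, -0.087, 0.294, 0.020),
    "Western": (0.131, -0.172, 0.124, -0.016, 0.097),
    "Game-Show": (0.010, -0.136, -0.011, 0.384, 0.011),
}
VARIANCES = (9.0822, 5.3935, 4.1812, 3.1283, 1.0515)  # "Variance" row of Table 5

# Complete linkage clusters copied from Table 4.
HC_CLUSTERS = {
    1: ["Short", "Drama", "Comedy", "Romance", "Family", "Music", "Fantasy", "Sport", "Musical"],
    2: ["Thriller", "Horror", "Action", "Crime", "Adventure", "Sci-Fi", "Mystery", "Animation", "Western"],
    3: ["Documentary", "Biography", "History", "War", "News"],
    4: ["Reality-TV", "Talk-Show", "Game-Show"],
    5: ["Adult"],
}

# Table 6 numbering: 1-based factor index -> cluster number.
FACTOR_TO_CLUSTER = {2: 1, 1: 2, 3: 3, 4: 4, 5: 5}


def dominant_factor(row):
    """Return the 1-based index of the factor with the largest absolute loading."""
    return max(range(len(row)), key=lambda j: abs(row[j])) + 1


def cumulative_explained(variances, n_vars):
    """Cumulative % of variance explained, assuming a total variance of n_vars."""
    running, out = 0.0, []
    for v in variances:
        running += v
        out.append(round(100.0 * running / n_vars, 1))
    return out


def agreement():
    """Count genres placed in the same cluster by both methods; list the rest."""
    hc = {g: c for c, genres in HC_CLUSTERS.items() for g in genres}
    pcfa = {g: FACTOR_TO_CLUSTER[dominant_factor(r)] for g, r in LOADINGS.items()}
    mismatched = sorted(g for g in LOADINGS if hc[g] != pcfa[g])
    return len(LOADINGS) - len(mismatched), mismatched


if __name__ == "__main__":
    print(cumulative_explained(VARIANCES, len(LOADINGS)))
    matches, mismatched = agreement()
    print(f"{matches}/{len(LOADINGS)} = {100 * matches / len(LOADINGS):.1f}%", mismatched)
```

Run as a script, the sketch reports 24 of 27 matching genres (88.9%), with Animation, Sport and Western as the only mismatches, and reproduces the cumulative variance row of Table 5. Selecting by absolute value matters for Western, whose largest loading (-0.172 on Factor 2) is negative.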