There was a time when popularity was considered an important asset for becoming a legislator in the DPR, and many artistes left their careers for politics. That is no longer the case. Well-known names such as the actor Sandy Nayoan and the former world badminton champion Ricky Subagja failed to become legislators. Worse still, even artistes who had won seats in parliament in the previous period, such as Ingrid Kansil and Nurul Arifin, were eliminated from the competition. Well-known politicians such as Marzuki Alie, Priyo Budi Santoso, Eva Kusuma Sundari, and Sutan Bhatoegana also failed in the elections. This suggests that constituents' preferences are becoming more and more diverse, and that parties should build more scientific and appropriate strategies to gain a competitive advantage in the general election.
One key strategy is to determine which region suits a particular candidate by mining patterns in the personal information of previous legislators. Data mining is defined as a process of discovering patterns in a set of data, and it is known to be able to predict outcomes for a given dataset. In previous studies, data mining methods have been used to determine motorcycle ownership credit risk and credit application approval, to classify job opportunities of fresh graduates, to determine loan approval for cooperatives, to predict university graduates, and to predict loan customer categories.
In this research, we use the K-NN algorithm to match reference data stored in designated clusters against new data in order to predict its outcome. This algorithm has also been widely used in previous studies, including classification of palm oil production data at PT. Minamas, Kecamatan Parindu; hotspot classification on peat lands in Sumatera and Kalimantan; determination of scholarship recipients; identification of batik patterns; news text classification; and even cardiac disease diagnosis. The reference data are the members of the national House of Representatives (DPR) for the 2009-2014 and 2014-2019 periods. The regions or districts are restricted to the 11 constituencies of West Java.
This paper aims to determine whether this algorithm can be used to estimate how likely a particular candidate is to be clustered into a particular district or constituency. In the context of this study, the research questions are:
RQ1: How effective is the K-NN algorithm at clustering and predicting legislative candidates for their respective constituencies, especially in the West Java district?
RQ2: Does the algorithm accurately predict the election of particular legislators when compared with data from the previous general election (2014)?
We expect that the algorithm, and eventually a computer-based information system for candidate registration and selection, can soon be tested and implemented by political parties to improve their selection mechanism for legislative candidates and, ultimately, to increase their chances of entering the House of Representatives.
2. Literature Review
2.1. Data Mining
Data mining is used to uncover key information in a database. It is also defined as a process of discovering trends in a dataset, or as a series of processes that add value in the form of previously unknown knowledge extracted from a dataset. Data mining is the part of the knowledge discovery in databases (KDD) process that is responsible for extracting patterns or models from data using a specific algorithm. The KDD process is as follows:
• Data selection: data are selected from the operational dataset before the information extraction stage of KDD begins.
• Cleansing: remove duplicate data, check the data for inconsistencies, and correct errors in the data such as typographical errors.
• Transformation: transform the data into a single generic format for the data mining process.
• Data mining: search for patterns or interesting information in the selected data using a particular technique or method (in this case, the K-NN algorithm).
• Evaluation: the patterns derived from the data mining process need to be presented in a form that stakeholders can easily understand.
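The first three KDD steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`name`, `birthplace`, `years_in_office`) and all values are hypothetical and are not taken from the dataset used in this study.

```python
# Hypothetical raw records; field names and values are illustrative only.
raw_records = [
    {"name": "A", "birthplace": "Kota Bandung ", "years_in_office": "5", "note": "x"},
    {"name": "A", "birthplace": "Kota Bandung ", "years_in_office": "5", "note": "y"},  # duplicate
    {"name": "B", "birthplace": "jakarta", "years_in_office": None, "note": ""},        # incomplete
    {"name": "C", "birthplace": "jakarta", "years_in_office": "0", "note": ""},
]


def select(records, fields):
    """Data selection: keep only the fields needed for mining."""
    return [{f: r.get(f) for f in fields} for r in records]


def cleanse(records):
    """Cleansing: drop incomplete rows and duplicates, normalise spelling."""
    seen, cleaned = set(), []
    for r in records:
        if any(v is None for v in r.values()):
            continue  # skip records with missing information
        r = {k: v.strip().title() for k, v in r.items()}
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(r)
    return cleaned


def transform(records):
    """Transformation: bring numeric fields into one numeric format."""
    return [{**r, "years_in_office": int(r["years_in_office"])} for r in records]


fields = ["name", "birthplace", "years_in_office"]
prepared = transform(cleanse(select(raw_records, fields)))
print(prepared)
```

The prepared records are what a mining step such as K-NN would then consume.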
2.2. K-Nearest Neighbors
The K-NN algorithm is a supervised algorithm: it learns from reference data that have already been classified, and the learning outcome is used to classify new data whose output is unknown. In the K-NN algorithm, a new data point is classified according to its distance to the reference data, and it is assigned to the data pattern to which it is closest. Many measures can be used in K-NN to determine the proximity between old data and new data, one of which is the Euclidean distance. It is formulated in the following equation:

$$d(x_1, x_2) = \sqrt{\sum_{i=1}^{14} \left(x_{1i} - x_{2i}\right)^2} \qquad (1)$$

where
$x_1$ = reference data
$x_2$ = test data
$i$ = index of the individual attributes, from 1 to 14
$d$ = distance
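As a minimal sketch of the Euclidean distance defined by the symbols above, the following Python function computes d for two attribute vectors of equal length. The function name and the sample vectors are illustrative assumptions, not part of the original system.

```python
import math


def euclidean_distance(x1, x2):
    """Euclidean distance between a reference vector x1 and a test vector x2."""
    if len(x1) != len(x2):
        raise ValueError("both vectors must have the same number of attributes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


# Two made-up vectors with 14 attributes each; they differ in two attributes.
reference = [0] * 14
test = [3, 4] + [0] * 12
d = euclidean_distance(reference, test)
print(d)
```

Because every attribute contributes its squared difference, attributes measured on larger scales dominate the distance unless the data are normalised first.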
3. Method - Data Collection
We gathered data on the legislators who entered parliament in 2009 and 2014. The data were extracted from online sources such as political parties' websites and Wikipedia, as well as from reference books on the DPR of Indonesia. There were 91 legislators in 2009 and 91 in 2014, giving 182 reference records. However, because there were incumbents in 2009 and some records lacked detailed information and therefore could not be treated as references, we decided to use only 142 of them.
4. Requirement Analysis
This stage involved interviews with two political parties (NASDEM and GOLKAR of West Java) to learn how they chose their candidates for the legislative assembly. From this series of interviews, we concluded that there were three main processes: registration, selection, and classification. First, the candidates registered themselves at the party office. Then, a series of events was conducted to select the candidates who suited the party's criteria. Finally, each candidate was classified according to the list of constituencies available in West Java.
4.1. System Design
The suggested system is depicted in Figure 1.
Before applying equation 1, we first clustered the reference data according to the 11 constituencies (electoral regions) of West Java. Once the groups were formed, the next step was to find the center of each cluster; the proximity of a candidate to each center was then calculated using equation 1.
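The grouping and center-finding step can be sketched as follows. This is a simplified illustration, assuming that the center of a cluster is the attribute-wise mean of its reference vectors and that all attributes are already numeric; the function names and the two-attribute sample data are hypothetical. It uses `math.dist`, which is available from Python 3.8.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Attribute-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_centers(reference):
    """Group (constituency, vector) pairs and compute one center per group."""
    groups = defaultdict(list)
    for constituency, vector in reference:
        groups[constituency].append(vector)
    return {c: centroid(vs) for c, vs in groups.items()}


def nearest_center(centers, candidate):
    """Return the closest constituency and the full table of distances."""
    distances = {c: math.dist(center, candidate) for c, center in centers.items()}
    best = min(distances, key=distances.get)
    return best, distances


# Made-up reference data with two attributes per legislator.
reference = [
    ("West Java I", [1.0, 2.0]),
    ("West Java I", [3.0, 4.0]),
    ("West Java II", [10.0, 10.0]),
]
centers = build_centers(reference)
best, distances = nearest_center(centers, [2.0, 4.0])
print(best, distances)
```

Note that comparing a candidate with one center per constituency is closer to a nearest-centroid rule than to a vote among the k nearest individual reference records; the sketch follows the procedure as described in this section.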
4.2. Software Design and Implementation
In this stage, we designed the system and implemented the calculation in software. The software produced an output that predicted a candidate's success and identified the region/constituency in which a particular candidate was most likely to win in the legislative elections of West Java.
4.3. Testing and Evaluation
We tested the accuracy of the algorithm by entering data from the previous election and checking whether the result matched the actual outcome. The findings are presented in the concluding part of this paper.
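The accuracy check described here amounts to counting how often the predicted constituency equals the actual one. The sketch below assumes that both are available as parallel lists of labels; the labels are placeholders, and the counts simply mirror the 10 correct results out of 12 reported later in this paper.

```python
def accuracy(predicted, actual):
    """Fraction of cases in which the predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Placeholder labels: 12 test cases, of which 10 are predicted correctly.
actual = ["I", "I", "II", "II", "III", "III", "IV", "IV", "V", "V", "VI", "VI"]
predicted = ["I", "I", "II", "II", "III", "III", "IV", "IV", "V", "V", "I", "II"]

score = accuracy(predicted, actual)
print(f"{score * 100:.2f}%")
```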
5. Findings and Discussion
We set 14 inputs to the system: (1) place of birth, (2) region of association, (3) place of education, (4) domicile city, (5) career city, (6) incumbency (whether the legislator is still in office and is running again in the general election), (7) total years as a legislator, (8) highest rank in office, (9) highest/latest education, (10) public figure in the arts, (11) total years of organizational experience, (12) total years of career, (13) number of previous organizations, and (14) number of careers.
To illustrate the input process, we tested it with one record under the name Handoko Suriaatmaja, in accordance with the fields above: (1) Kota Bandung, (2) Kota Bandung, (3) Jakarta, (4) Jakarta, (5) Kota Bandung, (6) No, (7) None, (8) Board/Chairman, (9) Undergraduate, (10) No, (11) 4, (12) 5, (13) 7, (14) 2.
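Since the Euclidean distance works on numbers, categorical inputs such as cities or education levels have to be encoded numerically before the calculation. The paper does not state which encoding was used, so the sketch below is purely an assumption: every code table, key name, and integer code is hypothetical and only shows one possible way to turn the sample record into a 14-element vector.

```python
# All code tables are hypothetical; the actual encoding is not given in the paper.
CITY_CODES = {"Kota Bandung": 1, "Jakarta": 2}
RANK_CODES = {"None": 0, "Board/Chairman": 1}
EDUCATION_CODES = {"Undergraduate": 1}
YES_NO = {"No": 0, "Yes": 1}

# The sample record from the text, with "None" years as a legislator stored as 0.
candidate = {
    "place_of_birth": "Kota Bandung",
    "region_of_association": "Kota Bandung",
    "place_of_education": "Jakarta",
    "domicile_city": "Jakarta",
    "career_city": "Kota Bandung",
    "incumbent": "No",
    "years_as_legislator": 0,
    "highest_rank": "Board/Chairman",
    "education": "Undergraduate",
    "public_figure": "No",
    "years_of_organization": 4,
    "total_career": 5,
    "num_organizations": 7,
    "num_careers": 2,
}


def encode(c):
    """Map one candidate record to a 14-element numeric vector."""
    return [
        CITY_CODES[c["place_of_birth"]],
        CITY_CODES[c["region_of_association"]],
        CITY_CODES[c["place_of_education"]],
        CITY_CODES[c["domicile_city"]],
        CITY_CODES[c["career_city"]],
        YES_NO[c["incumbent"]],
        c["years_as_legislator"],
        RANK_CODES[c["highest_rank"]],
        EDUCATION_CODES[c["education"]],
        YES_NO[c["public_figure"]],
        c["years_of_organization"],
        c["total_career"],
        c["num_organizations"],
        c["num_careers"],
    ]


vector = encode(candidate)
print(vector)
```

Plain integer codes impose an arbitrary ordering on categories such as cities, so the distances they produce depend on how the codes are assigned; a one-hot or match/mismatch encoding is a common alternative.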
Next, we compared the test data with the pre-defined clusters, which were set in accordance with the 11 constituencies of West Java. Using the K-Nearest Neighbors algorithm, we calculated the Euclidean distance from our test data to each of the 11 clusters. The output of the system, a recommendation of the electoral district for a potential candidate, was based on the smallest distance to a cluster center. The calculation is shown in Table 1:
The smallest distance was found for Cluster 1, with a value of 10.25, so the recommended electoral district for Handoko Suriaatmaja is the constituency of West Java I. This procedure can be repeated until all slots in each region/constituency are filled.
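One way to read the repeated procedure is as a greedy assignment: each candidate goes to the nearest constituency that still has a free slot. The sketch below rests on that assumption; the function name, candidate names, distances, and slot counts are all made up for illustration and do not come from Table 1.

```python
def assign_candidates(distance_table, slots):
    """Greedily place each candidate in the nearest constituency with a free slot.

    distance_table: {candidate: {constituency: distance}}
    slots: {constituency: number of available slots}
    """
    remaining = dict(slots)
    assignment = {}
    for candidate, distances in distance_table.items():
        # Try constituencies from the smallest distance to the largest.
        for constituency in sorted(distances, key=distances.get):
            if remaining.get(constituency, 0) > 0:
                assignment[candidate] = constituency
                remaining[constituency] -= 1
                break
    return assignment


# Hypothetical distances for two candidates and two constituencies.
distance_table = {
    "Candidate A": {"West Java I": 10.0, "West Java II": 12.0},
    "Candidate B": {"West Java I": 9.0, "West Java II": 11.0},
}
slots = {"West Java I": 1, "West Java II": 1}

assignment = assign_candidates(distance_table, slots)
print(assignment)
```

In this greedy form the result depends on the order in which candidates are processed: Candidate B is closer to West Java I than Candidate A, yet is placed in West Java II because the slot was already taken. A party could instead rank candidates per constituency before filling the slots.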
The implementation of this research resulted in software whose user interfaces are shown in Figure 2.
Our research has been shown to produce outputs and recommendations with reasonable accuracy. However, much work remains to be done. One limitation is the absence of interactive communication between candidates and parties, since all data entry is handled by an administrator.
This research has resulted in a system that can estimate the success of potential legislative candidates in the electoral districts of West Java using the K-NN algorithm. We also performed an accuracy test on 12 previous legislators, which yielded 10 correct and 2 incorrect results; hence, the accuracy of the system is 83.33%.
We believe our analysis can bring much useful information to political parties participating in the general election and to future candidates for legislative office. However, our study is still a work in progress in many areas; hence, further tests and developments need to be carried out. One suggestion is to include not only the successful legislators but also the unsuccessful candidates, which would bring more variance to the data in each constituency.