The comparison between randomForest and ranger

Published 2016-10-28 (last modified 2021-06-10) at https://arikuncoro.xyz/blog/data-science/r-python-sql-linux/the-comparison-between-randomforest-and-ranger/

A couple of days ago I had the chance to speak at an internal data-scientist meeting of the company I work for: Stream Intelligence (http://www.streamintelligence.com). The meeting is held monthly, and the October session was the sixth. We used Skype for Business to connect the data scientists in Jakarta and in London.

I delivered a talk titled "Random forest in R: A case study of a telecommunication company". For those who do not know random forests, Gopal Malakar has uploaded a video to YouTube in which he explains the idea. First of all, check the video out: https://www.youtube.com/embed/LIPtRVDmj1M

One important thing to remember from the video is that a random forest is a collection of trees: it is built from a number of decision trees, each grown from a random subset of the variables and observations of the training data.

Suppose we have trained a random forest made of 100 decision trees and feed it one test observation. If 60 trees output Y and 40 output N, the forest predicts Y with a score, or probability, of 0.6.

OK, let's practice how to train a random forest classifier in R. I learned a couple of weeks ago, from a DataCamp course, that there are two random forest packages: 1) randomForest and 2) ranger. They recommend ranger because it is a lot faster than the original randomForest.

To test this, I wrote a script using the Sonar dataset and the caret machine-learning package, with method = "ranger" or "rf" and tuneLength = 2. The tuneLength argument tells caret how many candidate values to try for mtry, the number of variables sampled as split candidates in each tree; mtry is the hyperparameter we tune in a random forest.

```r
# Load some data
library(caret)
library(mlbench)
data(Sonar)

# Fit and time a ranger model
ptm <- proc.time()
model <- train(Class ~ ., data = Sonar, method = "ranger", tuneLength = 2)
proc.time() - ptm

# Fit and time a randomForest model
ptm2 <- proc.time()
model_rf <- train(Class ~ ., data = Sonar, method = "rf", tuneLength = 2)
proc.time() - ptm2
```

Output of the ranger training:

```r
> proc.time() - ptm
   user  system elapsed
  22.37    0.29   23.66
```

Output of the randomForest training:

```r
> proc.time() - ptm2
   user  system elapsed
  26.75    0.29   27.80
```

So, going by user time, training with ranger is 26.75 - 22.37 = 4.38 seconds, or about 16%, faster than the original randomForest.

However, when I increased tuneLength to 5, the original randomForest function turned out to be faster than ranger. Hmmm... it seems I have to post a question to Stack Overflow or the DataCamp experts.

```r
> library(mlbench)
> data(Sonar)
>
> # Fit a model with a deeper tuning grid
> ptm <- proc.time()
> model <- train(Class ~ ., data = Sonar, method = "ranger", tuneLength = 5)
> proc.time() - ptm
   user  system elapsed
 137.19    0.69  141.67
>
> ptm2 <- proc.time()
> model_rf <- train(Class ~ ., data = Sonar, method = "rf", tuneLength = 5)
> proc.time() - ptm2
   user  system elapsed
  79.30    0.10   81.55
```
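As an aside, the 100-tree voting example above can be sketched in a few lines of base R. This is a toy illustration of majority voting only, not the internals of either package, and the 60 Y / 40 N vote counts are the hypothetical numbers from the example:

```r
# One test observation, 100 hypothetical tree votes: 60 "Y" and 40 "N"
votes <- c(rep("Y", 60), rep("N", 40))

# The forest's class prediction is the majority vote,
# and its score for "Y" is the fraction of trees voting "Y"
pred  <- names(which.max(table(votes)))
score <- mean(votes == "Y")

pred   # "Y"
score  # 0.6
```

This is exactly what a classification forest does at prediction time: each tree votes, and the vote fractions double as class probabilities.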