{"id":342,"date":"2025-10-13T22:46:32","date_gmt":"2025-10-13T20:46:32","guid":{"rendered":"https:\/\/mlatilikzsolt.hu\/?p=342"},"modified":"2025-10-13T22:46:32","modified_gmt":"2025-10-13T20:46:32","slug":"intro-to-neural-networks_part5","status":"publish","type":"post","link":"https:\/\/mlatilikzsolt.hu\/en\/2025\/10\/13\/intro-to-neural-networks_part5\/","title":{"rendered":"Introduction to the World of Neural Networks Part 5"},"content":{"rendered":"<p>In the previous parts, we built a layer that computes the raw output of neurons \u2013 the weighted sum plus bias. However, that output is linear: if you double the input, the output doubles. A linear network can only learn straight-line relationships.<\/p>\n\n\n\n<p>But the real world is nonlinear. Image recognition, speech understanding, and natural language processing all involve complex patterns that linear models cannot capture. This is where activation functions come in. These nonlinear transformations give neural networks the ability to capture such patterns and truly learn.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"0-mi-az-aktiv%C3%A1ci%C3%B3s-f%C3%BCggv%C3%A9ny-szerepe\">The Role of Activation Functions<\/h2>\n\n\n\n<p>Without activation:<\/p>\n\n\n\n<div class=\"wp-block-katex-display-block katex-eq\" data-katex-display=\"true\"><pre>\text{output} = \sum_i (\text{input}_i \cdot \text{weight}_i) + \text{bias}<\/pre><\/div>\n\n\n\n<p>With activation:<\/p>\n\n\n\n<div class=\"wp-block-katex-display-block katex-eq\" data-katex-display=\"true\"><pre>\text{output} = f\left(\sum_i (\text{input}_i \cdot \text{weight}_i) + \text{bias}\right)<\/pre><\/div>\n\n\n\n<p>The function f() is the activation function, and it\u2019s what gives the network its learning power.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Common Activation Functions<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Step Function<\/h3>\n\n\n\n<p>The simplest of all activation functions is the Step. 
It works like a switch: if the neuron\u2019s input reaches a threshold (here zero), the output is 1; otherwise, it\u2019s 0.<\/p>\n\n\n\n<div class=\"wp-block-katex-display-block katex-eq\" data-katex-display=\"true\"><pre>f(x) = \begin{cases} 1 &amp; \text{ } x \geq 0 \\ 0 &amp; \text{ } x &lt; 0 \end{cases}<\/pre><\/div>\n\n\n\n<p>Graphically depicted:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"662\" src=\"https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/step-1-1024x662.png\" alt=\"\" class=\"wp-image-353\" srcset=\"https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/step-1-1024x662.png 1024w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/step-1-300x194.png 300w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/step-1-768x496.png 768w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/step-1-1536x992.png 1536w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/step-1-2048x1323.png 2048w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/step-1-18x12.png 18w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>It\u2019s a good way to illustrate how neurons turn on or off, but it\u2019s not suitable for training: its derivative is zero everywhere away from the jump, so gradient-based learning gets no signal from it. Step is mostly used for demonstration purposes, as we did earlier.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Sigmoid Function<\/h3>\n\n\n\n<p>The Sigmoid is smoother. 
It squashes every input into the range 0\u20131 following a soft S-shaped curve:<\/p>\n\n\n\n<div class=\"wp-block-katex-display-block katex-eq\" data-katex-display=\"true\"><pre>f(x) = \frac{1}{1 + e^{-x}}<\/pre><\/div>\n\n\n\n<p>Graphically depicted:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"662\" src=\"https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/sigmoid-1-1024x662.png\" alt=\"\" class=\"wp-image-354\" srcset=\"https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/sigmoid-1-1024x662.png 1024w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/sigmoid-1-300x194.png 300w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/sigmoid-1-768x496.png 768w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/sigmoid-1-1536x992.png 1536w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/sigmoid-1-2048x1323.png 2048w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/sigmoid-1-18x12.png 18w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>This makes it ideal when we want an output that represents a probability, such as in binary classification tasks. 
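<\/p>\n\n\n\n<p>As a small preview of the implementation (the next part covers this properly), the Sigmoid can be written in one line of NumPy. The function name sigmoid below is our own choice:<\/p>\n\n\n\n

```python
import numpy as np

def sigmoid(x):
    # Squash any real input into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# Large negative inputs approach 0, zero maps to exactly 0.5,
# large positive inputs approach 1
print(sigmoid(np.array([-5.0, 0.0, 5.0])))
```

\n\n\n\n<p>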
However, the Sigmoid\u2019s gradient becomes extremely small for very large or very small input values, slowing down learning \u2014 the so-called vanishing gradient problem.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Tanh Function<\/h3>\n\n\n\n<p>The tanh function, short for hyperbolic tangent, is similar to the Sigmoid but scales outputs between -1 and 1:<\/p>\n\n\n\n<div class=\"wp-block-katex-display-block katex-eq\" data-katex-display=\"true\"><pre>f(x) = \tanh(x)<\/pre><\/div>\n\n\n\n<p>Graphically depicted:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"662\" src=\"https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/tanh-1-1024x662.png\" alt=\"\" class=\"wp-image-355\" srcset=\"https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/tanh-1-1024x662.png 1024w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/tanh-1-300x194.png 300w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/tanh-1-768x496.png 768w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/tanh-1-1536x992.png 1536w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/tanh-1-2048x1323.png 2048w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/tanh-1-18x12.png 18w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Because its output is centered around zero, it often trains faster and more stably. Still, it suffers from the same vanishing gradient issue in extreme regions. Despite that, it remains popular in smaller or older networks for its intuitive and balanced behavior.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">ReLU (Rectified Linear Unit)<\/h3>\n\n\n\n<p>The ReLU is perhaps the most widely used activation function in modern networks. 
It\u2019s defined as:<\/p>\n\n\n\n<div class=\"wp-block-katex-display-block katex-eq\" data-katex-display=\"true\"><pre>f(x) = \max(0, x)<\/pre><\/div>\n\n\n\n<p>Graphically depicted:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"662\" src=\"https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/relu-1024x662.png\" alt=\"\" class=\"wp-image-356\" srcset=\"https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/relu-1024x662.png 1024w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/relu-300x194.png 300w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/relu-768x496.png 768w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/relu-1536x992.png 1536w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/relu-2048x1323.png 2048w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/relu-18x12.png 18w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Negative inputs become 0, while positive inputs pass through unchanged. Its simplicity is its strength \u2014 it\u2019s fast, efficient, and avoids the Sigmoid\u2019s gradient issues. However, some neurons can \u201cdie\u201d if they get stuck with negative inputs forever, never activating again. 
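<\/p>\n\n\n\n<p>A minimal NumPy sketch of ReLU (again just a preview of the next part; the name relu is our own):<\/p>\n\n\n\n

```python
import numpy as np

def relu(x):
    # Element-wise max with zero: negatives clip to 0, positives pass through
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 3.0])))
```

\n\n\n\n<p>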
Even so, ReLU remains the default choice for most deep learning models.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Leaky ReLU<\/h3>\n\n\n\n<p>The Leaky ReLU improves on ReLU by allowing a small, nonzero output for negative inputs:<\/p>\n\n\n\n<div class=\"wp-block-katex-display-block katex-eq\" data-katex-display=\"true\"><pre>f(x) = \begin{cases} x &amp; \text{ } x \geq 0 \\ 0.01x &amp; \text{ } x &lt; 0 \end{cases}<\/pre><\/div>\n\n\n\n<p>Graphically depicted:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"662\" src=\"https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/leaky_relu-1024x662.png\" alt=\"\" class=\"wp-image-357\" srcset=\"https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/leaky_relu-1024x662.png 1024w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/leaky_relu-300x194.png 300w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/leaky_relu-768x496.png 768w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/leaky_relu-1536x992.png 1536w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/leaky_relu-2048x1323.png 2048w, https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/leaky_relu-18x12.png 18w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>This tiny \u201cleak\u201d keeps neurons alive even when their inputs are mostly negative, leading to more stable training. It\u2019s often used when too many neurons become inactive under standard ReLU.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Summary<\/h3>\n\n\n\n<p>Activation functions give neural networks their nonlinear power. Without them, a model could only describe linear relationships \u2014 essentially a flat plane or line. 
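<\/p>\n\n\n\n<p>To make that concrete, here is one possible NumPy sketch of the activations from this article side by side; the function names are our own, and the 0.01 leak factor follows the Leaky ReLU formula above:<\/p>\n\n\n\n

```python
import numpy as np

def step(x):
    # 1 for inputs at or above zero, 0 below
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    # Soft S-curve into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Clip negatives to zero, pass positives through
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but keep a small 0.01 slope for negative inputs
    return np.where(x >= 0, x, alpha * x)

# Apply each function (np.tanh is NumPy's built-in tanh) to the same inputs
x = np.array([-2.0, 0.0, 2.0])
for f in (step, sigmoid, np.tanh, relu, leaky_relu):
    print(f.__name__, f(x))
```

\n\n\n\n<p>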
With them, neural networks can learn complex, nonlinear decision boundaries and exhibit genuinely intelligent behavior.<\/p>\n\n\n\n<p>In the next article, we\u2019ll see how to implement these activation functions in Python and NumPy, and observe how they transform a layer\u2019s output in practice.<\/p>","protected":false},"excerpt":{"rendered":"<p>In the previous parts, we built a layer that computes the raw output of neurons \u2013 the weighted sum plus bias. However, that output is linear: if you double the input, the output doubles. A linear network can only learn straight-line relationships.<\/p>","protected":false},"author":1,"featured_media":360,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":1,"footnotes":""},"categories":[9,8],"tags":[11,10],"class_list":["post-342","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial_intelligence","category-neural-networks","tag-artificial-intelligence","tag-neural-networks"],"featured_image_src":"https:\/\/mlatilikzsolt.hu\/wp-content\/uploads\/2025\/10\/functions.jpg","author_info":{"display_name":"MlatilikZsolt","author_link":"https:\/\/mlatilikzsolt.hu\/en\/author\/mlatilikzsolt\/"},"_links":{"self":[{"href":"https:\/\/mlatilikzsolt.hu\/en\/wp-json\/wp\/v2\/posts\/342","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mlatilikzsolt.hu\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mlatilikzsolt.hu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mlatilikzsolt.hu\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mlatilikzsolt.hu\/en\/wp-json\/wp\/v2\/comments?post=342"}],"version-history":[{"count":7,"href":"https:\/\/mlatilikzsolt.hu\/en\/wp-json\/wp\/v2\/posts\/342\/revisions"}],"predecessor-version":[{"id":359,"href":"https:\/\/mlatilikzsolt.hu\/en\/wp-json\/wp\/v2\/posts\/342\/revisions\/359"}],
"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mlatilikzsolt.hu\/en\/wp-json\/wp\/v2\/media\/360"}],"wp:attachment":[{"href":"https:\/\/mlatilikzsolt.hu\/en\/wp-json\/wp\/v2\/media?parent=342"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mlatilikzsolt.hu\/en\/wp-json\/wp\/v2\/categories?post=342"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mlatilikzsolt.hu\/en\/wp-json\/wp\/v2\/tags?post=342"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}