While a linear function approximates the right-hand dataset quite well, it performs miserably on the one at the left. Since nonlinearly distributed datasets dominate the realm of machine learning, and activation functions are the only suitable spot to inject nonlinearity into the network, there is no scope for the activation function to be linear. Some of the well-known functions that address this problem are:

**Sigmoid function:** This function takes in a real number and outputs a number in the range (0, 1). The smaller the input, the closer the output is to 0; the greater the input, the closer the output approaches 1, without ever touching either extremity.
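As a minimal sketch, the sigmoid can be written in plain Python as 1 / (1 + e^(-x)); the function name and sample inputs below are illustrative:

```python
import math

def sigmoid(x):
    # Squashes any real-valued input into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))    # exactly 0.5 at the midpoint
print(sigmoid(10))   # very close to 1, but never reaches it
print(sigmoid(-10))  # very close to 0, but never reaches it
```

Note that for large-magnitude inputs the output saturates near 0 or 1, which is also why sigmoid layers can suffer from vanishing gradients during training.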