{"id":3418,"date":"2024-09-01T13:51:08","date_gmt":"2024-09-01T13:51:08","guid":{"rendered":"https:\/\/workhouse.sweetdishy.com\/?p=3418"},"modified":"2024-09-01T13:51:09","modified_gmt":"2024-09-01T13:51:09","slug":"applying-algorithms","status":"publish","type":"post","link":"https:\/\/workhouse.sweetdishy.com\/index.php\/2024\/09\/01\/applying-algorithms\/","title":{"rendered":"Applying Algorithms"},"content":{"rendered":"\n<p id=\"Par104\">Some algorithms are quite easy to calculate, while others require complex steps and mathematics. The good news is that you usually do not have to compute an algorithm because there are a variety of languages like Python and R that make the process straightforward.<\/p>\n\n\n\n<p id=\"Par105\">As for&nbsp;machine learning, an algorithm is typically different from a traditional one. The reason is that the first step is to process data\u2014and then, the computer will start to learn.<\/p>\n\n\n\n<p id=\"Par106\">Even though there are hundreds of machine learning algorithms available, they can actually be divided into four major categories:&nbsp;supervised learning,&nbsp;unsupervised learning,&nbsp;reinforcement learning, and semi-supervised learning. We\u2019ll take a look at each.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Supervised Learning<\/h3>\n\n\n\n<p id=\"Par107\">Supervised learning uses labeled data. For example, suppose we have a set of photos of thousands of dogs. The data is considered to be labeled if each photo identifies each for the breed. For the most part, this makes it easier to analyze since we can compare our results with the correct answer.<\/p>\n\n\n\n<p id=\"Par108\">One of the keys with supervised learning is that there should be large amounts of data. This helps to refine the model and produce more accurate results.<\/p>\n\n\n\n<p id=\"Par109\">But there is a big issue: The reality is that much of the data available is not labeled. 
In addition, labeling a massive dataset can be time consuming.<\/p>\n\n\n\n<p id=\"Par110\">Yet there are creative ways to deal with this, such as crowdsourcing. This is how the ImageNet system was built, a breakthrough in AI innovation. But it still took several years to create.<\/p>\n\n\n\n<p id=\"Par111\">Or, in some cases, there can be automated approaches to labeling data. Take the example of Facebook. In 2018, the company announced\u2014at its F8 developers conference\u2014that it had leveraged its enormous database of photos from Instagram, which were labeled with hashtags.<sup>12<\/sup><\/p>\n\n\n\n<p id=\"Par113\">Granted, this approach had its flaws. A hashtag may give a nonvisual description of the photo\u2014say #tbt (which is \u201cthrowback Thursday\u201d)\u2014or could be too vague, like #party. This is why Facebook called its approach \u201cweakly supervised data.\u201d But the company\u2019s engineers found ways to improve the quality, such as by building a sophisticated hashtag prediction model.<\/p>\n\n\n\n<p id=\"Par114\">All in all, things worked out quite well. Facebook\u2019s machine learning model, trained on 3.5 billion photos, had an accuracy rate of 85.4% on the ImageNet recognition benchmark. That was the highest recorded up to that point, beating the previous best by 2%.<\/p>\n\n\n\n<p>This AI project also required innovative approaches to building the infrastructure. According to the Facebook blog post:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p id=\"Par116\">Since a single machine would have taken more than a year to complete the model training, we created a way to distribute the task across up to 336 GPUs, shortening the total training time to just a few weeks. 
With ever-larger model sizes\u2014the biggest in this research is a ResNeXt 101-32x48d with over 861 million parameters\u2014such distributed training is increasingly essential. In addition, we designed a method for removing duplicates to ensure we don\u2019t accidentally train our models on images that we want to evaluate them on, a problem that plagues similar research in this area.<sup><a href=\"https:\/\/learning.oreilly.com\/library\/view\/artificial-intelligence-basics\/9781484250280\/html\/480660_1_En_3_Chapter.xhtml#Fn13\">13<\/a><\/sup><\/p>\n<\/blockquote>\n\n\n\n<p>Going forward, Facebook sees potential in applying its approach in various areas, including the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Improved ranking in the newsfeed<\/li>\n\n\n\n<li>Better detection of objectionable content<\/li>\n\n\n\n<li>Automatic generation of captions for the visually impaired<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Unsupervised Learning<\/h3>\n\n\n\n<p id=\"Par122\">Unsupervised learning is when you are working with unlabeled data, using deep learning algorithms to detect patterns on their own.<\/p>\n\n\n\n<p>By far, the most common approach to unsupervised learning is clustering, which takes unlabeled data and uses algorithms to put similar items into groups. The process usually starts with guesses, and the calculations are then iterated to get better results. At the heart of this is finding data items that are close together, which can be accomplished with a variety of quantitative methods:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>Euclidean Metric<\/em>: This is the straight-line distance between two data points. The Euclidean metric is quite common in machine learning.<\/li>\n\n\n\n<li><em>Cosine Similarity Metric<\/em>: As the name implies, you will use the cosine of the angle between two data points. 
The idea is to measure how similar two data points are in terms of their orientation.<\/li>\n\n\n\n<li><em>Manhattan Metric<\/em>: This involves taking the sum of the absolute differences between the coordinates of two points. It\u2019s called the \u201cManhattan\u201d metric because it mirrors the city\u2019s grid-like street layout, where travel runs along the blocks rather than in a straight line.<\/li>\n<\/ul>\n\n\n\n<p id=\"Par127\">In terms of use cases for clustering, one of the most common is customer segmentation, which helps better target marketing messages. For the most part, a group with similar characteristics is likely to share interests and preferences.<\/p>\n\n\n\n<p id=\"Par128\">Another application is sentiment analysis, where you mine social media data to find trends. For a fashion company, this can be crucial in understanding how to adapt the styles of an upcoming line of clothes.<\/p>\n\n\n\n<p>There are other approaches besides clustering. Here\u2019s a look at three more:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>Association<\/em>: The basic concept is that if X happens, then Y is likely to happen. Thus, if you buy my book on AI, you will probably want to buy other titles in the genre. With association, a deep learning algorithm can decipher these kinds of relationships, which can result in powerful recommendation engines.<\/li>\n\n\n\n<li><em>Anomaly Detection<\/em>: This identifies outliers or anomalous patterns in the dataset, which can be helpful in cybersecurity applications. 
According to Asaf Cidon, the VP of Email Security at Barracuda Networks: \u201cWe\u2019ve found that by combining many different signals\u2014such as the email body, header, the social graph of communications, IP logins, inbox forwarding rules, etc.\u2014we\u2019re able to achieve an extremely high precision in detecting social engineering attacks, even though the attacks are highly personalized and crafted to target a particular person within a particular organization. Machine learning enables us to detect attacks that originate from within the organization, whose source is a legitimate mailbox of an employee, which would be impossible to do with a static one-size-fits-all rule engine.\u201d<sup>14<\/sup><\/li>\n\n\n\n<li><em>Autoencoders<\/em>: With this, the data is put into a compressed form and then reconstructed, which can reveal new patterns. The use of autoencoders is relatively rare, but they can be useful for applications like reducing noise in data.<\/li>\n<\/ul>\n\n\n\n<p id=\"Par134\">Many AI researchers believe that unsupervised learning will be critical for the next level of achievements. According to a paper in <em>Nature<\/em> by Yann LeCun, Geoffrey Hinton, and Yoshua Bengio, \u201cWe expect unsupervised learning to become far more important in the longer term. Human and animal learning is largely unsupervised: we discover the structure of the world by observing it, not by being told the name of every object.\u201d<sup><a href=\"https:\/\/learning.oreilly.com\/library\/view\/artificial-intelligence-basics\/9781484250280\/html\/480660_1_En_3_Chapter.xhtml#Fn15\">15<\/a><\/sup><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Reinforcement Learning<\/h3>\n\n\n\n<p id=\"Par136\">When you were a kid and wanted to play a new sport, chances were you did not read a manual. Instead, you observed what other people were doing and tried to figure things out. 
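This trial-and-error idea can be made concrete with a toy sketch in Python. It is a minimal bandit-style example, not a full reinforcement learning algorithm, and the moves, reward values, and scoring probabilities are all invented for illustration: the agent tries moves, receives positive or negative reinforcement, and gradually favors whatever has worked.

```python
# Toy trial-and-error learner: two possible moves with hidden payoff
# rates; the agent learns which works better purely from rewards.
import random

random.seed(0)

# Hidden probabilities that each move "scores" -- unknown to the agent.
true_score_prob = {"pass": 0.3, "shoot": 0.7}

totals = {"pass": 0.0, "shoot": 0.0}  # cumulative reward per move
counts = {"pass": 0, "shoot": 0}      # times each move was tried

def average(move):
    """Average reward observed for a move so far (0.0 if untried)."""
    return totals[move] / counts[move] if counts[move] else 0.0

def choose_move(explore_prob=0.1):
    """Mostly exploit the best-looking move, but sometimes explore."""
    if random.random() < explore_prob:
        return random.choice(["pass", "shoot"])
    return max(["pass", "shoot"], key=average)

for _ in range(2000):
    move = choose_move()
    # Positive reinforcement for scoring, negative for losing the ball.
    reward = 1.0 if random.random() < true_score_prob[move] else -1.0
    totals[move] += reward
    counts[move] += 1

print(max(["pass", "shoot"], key=average))
```

After a few thousand simulated plays, the running averages steer the agent toward the higher-paying move, mirroring how positive and negative reinforcement shape learned behavior.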
In some situations, you made mistakes and lost the ball, and your teammates showed their displeasure. But in other cases, you made the right moves and scored. Through this trial-and-error process, you learned from positive and negative reinforcement.<\/p>\n\n\n\n<p>At a high level, this is analogous to reinforcement learning. It has been key to some of the most notable achievements in AI, such as the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>Games<\/em>: They are ideal for reinforcement learning since there are clear-cut rules, scores, and various constraints (like a game board). When building a model, you can test it with millions of simulations, which means that the system will quickly get smarter and smarter. This is how a program can learn to beat the world champion of Go or chess.<\/li>\n\n\n\n<li><em>Robotics<\/em>: A key is being able to navigate within a space\u2014and this requires evaluating the environment at many different points. If the robot wants to move to, say, the kitchen, it will need to navigate around furniture and other obstacles. If it runs into things, there will be negative reinforcement.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Semi-supervised Learning<\/h3>\n\n\n\n<p id=\"Par140\">This is a mix of supervised and unsupervised learning. It arises when you have a small amount of labeled data and a large amount of unlabeled data. You can use deep learning systems to turn the unlabeled data into labeled data\u2014a process called pseudo-labeling. After this, you can apply the usual algorithms.<\/p>\n\n\n\n<p id=\"Par141\">An interesting use case of semi-supervised learning is the interpretation of MRIs. A radiologist can first label a small set of scans, and a deep learning system can then find the patterns in the rest.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Some algorithms are quite easy to calculate, while others require complex steps and mathematics. 
The good news is that you usually do not have to compute an algorithm because there are a variety of languages like Python and R that make the process straightforward. As for&nbsp;machine learning, an algorithm is typically different from a traditional [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3326,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[441],"tags":[],"class_list":["post-3418","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-3-machine-learning"],"jetpack_featured_media_url":"https:\/\/workhouse.sweetdishy.com\/wp-content\/uploads\/2024\/08\/images-41-1.jpeg","_links":{"self":[{"href":"https:\/\/workhouse.sweetdishy.com\/index.php\/wp-json\/wp\/v2\/posts\/3418","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/workhouse.sweetdishy.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/workhouse.sweetdishy.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/workhouse.sweetdishy.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/workhouse.sweetdishy.com\/index.php\/wp-json\/wp\/v2\/comments?post=3418"}],"version-history":[{"count":1,"href":"https:\/\/workhouse.sweetdishy.com\/index.php\/wp-json\/wp\/v2\/posts\/3418\/revisions"}],"predecessor-version":[{"id":3419,"href":"https:\/\/workhouse.sweetdishy.com\/index.php\/wp-json\/wp\/v2\/posts\/3418\/revisions\/3419"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/workhouse.sweetdishy.com\/index.php\/wp-json\/wp\/v2\/media\/3326"}],"wp:attachment":[{"href":"https:\/\/workhouse.sweetdishy.com\/index.php\/wp-json\/wp\/v2\/media?parent=3418"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/workhouse.sweetdishy.com\/index.php\/wp-json\/wp\/v2\/categories?post=3418"},{"taxonomy":"post_tag","embeddable":true,"href":"https:
\/\/workhouse.sweetdishy.com\/index.php\/wp-json\/wp\/v2\/tags?post=3418"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}