Find favicons used by any website

Will Lumley | Jun 14, 2022

Some time ago, I was working on an iOS application that required me to fetch and display the favicon of a given URL.

Easy, right? Well, not so much.

About 15 to 20 years ago, you could simply send an HTTP GET request for favicon.ico at the server's root. Getting the favicon for https://example.com was as simple as requesting https://example.com/favicon.ico. You'd take the image data from that response, display it to the user, and call it a day. Dust your hands off and be thankful for the easy retrieval.

Fast forward to the present day: when I used this method, the favicon for many URLs simply wasn't found.

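import UIKit

func fetchRootFavicon(for url: URL) async throws -> UIImage? {
    // The naive approach: assume the favicon always lives at <root>/favicon.ico
    guard let faviconURL = URL(string: "/favicon.ico", relativeTo: url) else { return nil }
    let (imageData, response) = try await URLSession.shared.data(from: faviconURL)
    // A missing file still returns a body (often an HTML error page), so check the status
    guard (response as? HTTPURLResponse)?.statusCode == 200 else { return nil }
    return UIImage(data: imageData)
}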

Strange, I thought. After investigating, I found that some of these URLs were returning HTTP 404s, meaning there was nothing at https://example.com/favicon.ico for me to find. After some searching on Google, a few things became apparent. Firstly, the favicon may not be at the server root. It may be somewhere like https://example.com/images/favicon.ico.

Secondly, it may have a different file extension. Favicons can be ICO, PNG, SVG and more, so it really is a mixed bag of possibilities. Thirdly, the image's filename doesn't have to be favicon; it could be anything the developer sees fit.

So, with all these possibilities, how can we find the favicon? The answer lies in the <head> section of the HTML page at our URL. Below is a basic example of how favicons can be declared in HTML:

<link rel="shortcut icon" href="favicon.ico">
<link rel="icon" type="image/png" href="/images/favicon-108x108.png">

Hold on, there's more than one favicon listed here?

Nowadays, it’s quite common for websites to declare several favicons. Different platforms prefer different image sizes and formats, so the website offers a range of favicons and lets each platform fetch the one it deems most appropriate.
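When a page declares several icons, the client has to pick one. As a minimal sketch, assume we have already read each `<link>`'s `href` and `sizes` attributes into a hypothetical `IconCandidate` struct; we can then prefer the largest declared size. Icons with no parseable `sizes` value (including `sizes="any"`, which is typical for SVGs) are treated as size 0 here, which is a simplifying assumption rather than a rule.

```swift
// Hypothetical model of one <link> declaration; not part of any library.
struct IconCandidate {
    let href: String
    let sizes: String   // raw "sizes" attribute, e.g. "180x180" or "16x16 32x32"
}

// Returns the largest edge length declared in a "sizes" attribute, or 0 if none parse.
func largestEdge(in sizes: String) -> Int {
    sizes
        .split(separator: " ")
        .compactMap { token -> Int? in
            // Each token looks like "WIDTHxHEIGHT"; "any" and malformed tokens are skipped.
            let parts = token.lowercased().split(separator: "x")
            guard parts.count == 2,
                  let width = Int(parts[0]),
                  let height = Int(parts[1]) else { return nil }
            return min(width, height)
        }
        .max() ?? 0
}

// Picks the candidate with the largest declared size; on a tie, the first declared wins.
func bestIcon(from candidates: [IconCandidate]) -> IconCandidate? {
    candidates.max { largestEdge(in: $0.sizes) < largestEdge(in: $1.sizes) }
}

let icons = [
    IconCandidate(href: "/favicon-32.png", sizes: "32x32"),
    IconCandidate(href: "/apple-touch-icon.png", sizes: "180x180"),
    IconCandidate(href: "/favicon.svg", sizes: "any")
]
print(bestIcon(from: icons)?.href ?? "none") // prints "/apple-touch-icon.png"
```

A real client might instead prefer the size closest to what it will actually display, or prefer SVG when it can render vector images.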

Now we know where to find the favicon declarations in HTML. However, implementing that in our iOS application is a little more difficult.

First, we need to fetch the HTML page and then use an HTML parser to iterate through its tags. I found that scinfu’s SwiftSoup library works really well, and it supports CocoaPods, Carthage, and SPM, so you’ll be able to integrate it into your project regardless of which dependency manager you’re using.

https://github.com/scinfu/SwiftSoup
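If you're using SPM, a Package.swift along these lines pulls SwiftSoup in. This is a sketch: the package and target name `FaviconDemo` are hypothetical, and the version requirement is an assumption, so pin it to whichever release you actually test against. The platform minimums reflect that the async `URLSession.data(from:)` used in this article requires iOS 15 / macOS 12.

```swift
// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "FaviconDemo", // hypothetical package name
    platforms: [.iOS(.v15), .macOS(.v12)], // async URLSession APIs need iOS 15 / macOS 12
    dependencies: [
        // Version requirement is an assumption; pin to the release you test against.
        .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.0.0")
    ],
    targets: [
        .target(name: "FaviconDemo", dependencies: ["SwiftSoup"])
    ]
)
```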

The code below demonstrates how to use SwiftSoup to extract the favicon URLs from an HTML string:

import SwiftSoup

/// Returns the href of every <link> in the <head> whose rel marks it as a favicon.
func faviconURLs(in htmlStr: String) -> [String]? {
    let html: Document
    do {
        html = try SwiftSoup.parse(htmlStr)
    } catch {
        print("Could NOT parse HTML due to error: \(error). HTML: \(htmlStr)")
        return nil
    }

    guard let head = html.head() else {
        print("Could NOT parse HTML head from string: \(htmlStr)")
        return nil
    }

    let allLinks: Elements
    do {
        allLinks = try head.select("link")
    } catch {
        print("Could NOT select link tags due to error: \(error). HTML: \(htmlStr)")
        return nil
    }

    // The rel values we treat as favicon declarations
    let faviconTypes = [
        "apple-touch-icon",
        "apple-touch-icon-precomposed",
        "shortcut icon",
        "icon"
    ]
    var favicons = [String]()

    // Iterate over every 'link' tag in the head and collect the favicon ones
    for element in allLinks {
        do {
            // rel values are case-insensitive, so normalise before comparing
            let rel = try element.attr("rel").lowercased()
            let href = try element.attr("href")
            // If this is an icon that we deem might be a favicon, add its URL to our array
            if faviconTypes.contains(rel) && !href.isEmpty {
                favicons.append(href)
            }
        } catch {
            print("Could NOT read link attributes due to error: \(error). HTML: \(htmlStr)")
            continue
        }
    }
    return favicons
}
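The `href` values collected this way are often relative (the first example declaration above is just `favicon.ico`), so they need to be resolved against the page's URL before they can be requested. Foundation can do this with `URL(string:relativeTo:)`. A minimal sketch; the helper name `resolveFaviconURLs` is hypothetical, and it ignores any `<base href>` tag the page might declare.

```swift
import Foundation

// Resolves each (possibly relative) href against the URL of the page it came from.
// Hypothetical helper; hrefs that can't form a URL are dropped.
func resolveFaviconURLs(_ hrefs: [String], pageURL: URL) -> [URL] {
    hrefs.compactMap { href in
        URL(string: href, relativeTo: pageURL)?.absoluteURL
    }
}

let blogPost = URL(string: "https://example.com/blog/post")!
let resolved = resolveFaviconURLs(
    ["favicon.ico", "/images/favicon-108x108.png", "https://cdn.example.net/icon.png"],
    pageURL: blogPost
)
for url in resolved {
    print(url.absoluteString)
}
// https://example.com/blog/favicon.ico
// https://example.com/images/favicon-108x108.png
// https://cdn.example.net/icon.png
```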

Great, now we have some basic logic that extracts the favicon URLs from the website at a given URL. However, this is far from production-ready. Developers can declare their favicons in other ways too, such as in a web application manifest (a JSON file that is itself declared in the HTML <head>) or, in a similar fashion, in Microsoft’s browser configuration XML file.
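A web application manifest is linked from the head with `<link rel="manifest" href="...">`, and its `icons` member is an array of objects with a `src` plus optional `sizes` and `type`. Here is a minimal sketch of decoding that array with `Codable`, assuming the manifest JSON has already been downloaded; the type and function names are hypothetical.

```swift
import Foundation

// Hypothetical Codable model covering only the manifest members used here.
struct WebAppManifest: Decodable {
    struct Icon: Decodable {
        let src: String      // required; may be relative to the manifest's own URL
        let sizes: String?   // e.g. "192x192"
        let type: String?    // e.g. "image/png"
    }
    let icons: [Icon]?       // the member may be absent from a manifest
}

// Decodes the "icons" array from raw manifest JSON, returning [] if it's absent.
func manifestIcons(from data: Data) throws -> [WebAppManifest.Icon] {
    try JSONDecoder().decode(WebAppManifest.self, from: data).icons ?? []
}

let json = """
{
  "name": "Example",
  "icons": [
    { "src": "/icons/icon-192.png", "sizes": "192x192", "type": "image/png" },
    { "src": "/icons/icon-512.png", "sizes": "512x512", "type": "image/png" }
  ]
}
"""

do {
    for icon in try manifestIcons(from: Data(json.utf8)) {
        print(icon.src, icon.sizes ?? "unknown size")
    }
} catch {
    print("Could not decode manifest: \(error)")
}
```

Unknown keys such as `name` are ignored by `JSONDecoder`, so the model only needs the fields we care about. Each `src` should then be resolved against the manifest's URL, not the page's.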

We also need functionality that lets us try one method of favicon searching (e.g., querying the HTML <head> for declarations) and, if that fails, fall back to another method (e.g., requesting a favicon file directly by its filename).
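One way to structure that fallback is an ordered list of strategies, each of which either finds a favicon URL or returns `nil`. The sketch below assumes a hypothetical `FaviconStrategy` closure type and uses stub strategies in place of real HTML parsing and network requests.

```swift
import Foundation

// Hypothetical strategy type: each one tries to find a favicon URL for a page, or returns nil.
typealias FaviconStrategy = (URL) async -> URL?

// Tries each strategy in order and returns the first URL that one of them finds.
func findFavicon(for pageURL: URL, using strategies: [FaviconStrategy]) async -> URL? {
    for strategy in strategies {
        if let found = await strategy(pageURL) {
            return found
        }
    }
    return nil
}

// Stub strategies for illustration; real ones would parse HTML or issue HTTP requests.
let htmlHead: FaviconStrategy = { _ in nil } // pretend no <link rel="icon"> was found
let rootIco: FaviconStrategy = { page in
    URL(string: "/favicon.ico", relativeTo: page)?.absoluteURL
}

let result = await findFavicon(
    for: URL(string: "https://example.com/blog")!,
    using: [htmlHead, rootIco]
)
print(result?.absoluteString ?? "none") // prints "https://example.com/favicon.ico"
```

Keeping each method behind the same closure type means new sources (a manifest, a browserconfig.xml) can be slotted into the list without changing the loop.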

There are other features that would be nice to have, such as checking the root domain of a URL if a subdomain fails (e.g., querying https://example.com if https://subdomain.example.com fails). There is also the meta refresh redirect to handle. This is a client-side redirect (unlike an HTTP 301, which is a server-side redirect), so we have to manually check the HTML <head> for a meta refresh tag and then create a new URLRequest for the URL contained in that tag.
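Both of these can be sketched in a few lines. A meta refresh looks like `<meta http-equiv="refresh" content="0; url=https://example.com/home">`, and its `content` attribute can be read with SwiftSoup just like the `<link>` attributes earlier. The parser below is simplified (it doesn't handle spaces around `=`, for example), and the parent-domain helper is deliberately naive: it drops the left-most host label, which is wrong for suffixes like `example.co.uk`, where a real implementation would consult the Public Suffix List. Both function names are hypothetical.

```swift
import Foundation

// Extracts the target URL from a meta refresh "content" attribute, e.g. "0; url=/home".
// Simplified: handles ";" or "," separators, a case-insensitive "url=" prefix,
// and optional surrounding quotes.
func metaRefreshTarget(from content: String, relativeTo pageURL: URL) -> URL? {
    guard let separator = content.firstIndex(where: { $0 == ";" || $0 == "," }) else {
        return nil // only a delay, which just reloads the current page
    }
    var target = content[content.index(after: separator)...]
        .trimmingCharacters(in: .whitespaces)
    if target.lowercased().hasPrefix("url=") {
        target = String(target.dropFirst(4))
    }
    target = target.trimmingCharacters(in: CharacterSet(charactersIn: "'\" "))
    guard !target.isEmpty else { return nil }
    // The target may be relative, so resolve it against the page that contained the tag.
    return URL(string: target, relativeTo: pageURL)?.absoluteURL
}

// Naive: drops the left-most host label, so "subdomain.example.com" becomes "example.com".
func parentDomainURL(of url: URL) -> URL? {
    guard let host = url.host else { return nil }
    let labels = host.split(separator: ".")
    guard labels.count > 2 else { return nil } // already at (what looks like) the root domain
    var components = URLComponents()
    components.scheme = url.scheme
    components.host = labels.dropFirst().joined(separator: ".")
    return components.url
}

let page = URL(string: "https://subdomain.example.com/post")!
print(metaRefreshTarget(from: "0; URL='/new-home'", relativeTo: page)?.absoluteString ?? "none")
// https://subdomain.example.com/new-home
print(parentDomainURL(of: page)?.absoluteString ?? "none")
// https://example.com
```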

If this all sounds like a giant headache, I have you covered. I have written an open-source, pure-Swift library for iOS and macOS that handles all of this for you. It’s called FaviconFinder, and you can find it here:

https://github.com/will-lumley/FaviconFinder

With FaviconFinder, all you have to call is:

let url = URL(string: "https://example.com")!
let favicon = try await FaviconFinder(url: url).downloadFavicon()

If you want fine-grained control over what type of favicon you're after, you can use FaviconFinder as follows:

let favicon = try await FaviconFinder(
    url: url,
    preferredType: .html,
    preferences: [
        .html: FaviconType.appleTouchIcon.rawValue,
        .ico: "favicon.ico",
        .webApplicationManifestFile: FaviconType.launcherIcon4x.rawValue
    ]
).downloadFavicon()

This lets you control which download type FaviconFinder tries first, and which subtype it looks for within each download type. For the HTML download type, this lets you prioritise different “rel” types. For the .ico type, this lets you choose the filename.

Suppose your preferred download type doesn’t exist for your URL (e.g., you asked for the favicon stored as a file, but there’s no such file). In that case, FaviconFinder will automatically try all the other methods of favicon storage for you.