Parsing XML

How to use XMLParser for parsing XML documents

Before starting

Check out the final solution. Copy and paste it into a Swift playground and run it so you can see it in action.

Introduction

This is a brief overview of XMLParser and XMLParserDelegate, which are part of Apple's Foundation framework. The tutorial walks step by step through parsing XML files in Swift.

XML feed sample

We will parse an XML iTunes feed for upcoming music.

xml feed

The <item/> element is the item that we want to extract.

1. Create a struct template for the <item>
2. Create the parser
   - Have the parser parse the iTunes feed, using XMLParserDelegate and XMLParser
   - Have the parser fill in the details of the struct
   - Add the struct to our array of podcast items
3. Test run the parser

Struct as a template for a podcast item

Create this struct called RSSPodcastItem at the top of the file. This struct has attributes similar to those of the <item/> element found in our iTunes feed.

The strategy will make sense further along the tutorial.


import Foundation
import UIKit


struct RSSPodcastItem {
    var title: String
    var description: String
    var date: String
    var category: [String]

    // 1. Start off with an empty struct
    init() {
        self.title = ""
        self.description = ""
        self.date = ""
        self.category = []
    }

    // 2. Reset all the properties
    mutating func resetAllProperties() {
        self = RSSPodcastItem()
    }
}
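
As a quick check of the template idea, the sketch below fills in the struct, keeps a copy, and then resets the original. It uses a condensed copy of the struct (default property values instead of the explicit init) so that it runs on its own, and the sample values are made up for illustration.

```swift
import Foundation

// Condensed copy of the tutorial's struct: default values stand in for init().
struct RSSPodcastItem {
    var title = ""
    var description = ""
    var date = ""
    var category: [String] = []

    mutating func resetAllProperties() {
        self = RSSPodcastItem()
    }
}

var item = RSSPodcastItem()
item.title = "State I'm In - Aaron Lewis"
item.category.append("Rock") // made-up category

// Structs are value types, so `saved` is an independent copy.
let saved = item
item.resetAllProperties()

print(saved.title)        // the copy keeps its values
print(item.title.isEmpty) // the original is back to empty strings
```

Because the struct is a value type, a copy appended to an array is not affected when the template is reset for the next item.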

XMLParserDelegate

After the struct declaration create the RSSPodcastParser class and conform to the XMLParserDelegate as shown below.


...omitting some code

class RSSPodcastParser: NSObject, XMLParserDelegate {

}

Variables

Now set up some initial variables. This is just housekeeping; the comments below outline what each variable does.


...omitting some code

class RSSPodcastParser: NSObject, XMLParserDelegate {

    // What will be available to the consumer
    var parsedData = [RSSPodcastItem]()
    // ---------------------------------------------
    private var xmlParser: XMLParser!
    // we track currentElement so we know what element the XMLParserDelegate is currently on
    private var currentElement = ""
    // RSS Items that are tracked internally
    private var rssItems = [RSSPodcastItem]()
    // RSS template object that is used to build up our RSSItem
    private var rssPodcastItem = RSSPodcastItem()


  ...

}

currentElement

The private var currentElement is updated as parsing progresses. As the XML gets parsed, we need to track which element we are currently inside. This will make sense further down the tutorial.

rssPodcastItem

The private var rssPodcastItem variable is our RSSPodcastItem we defined earlier and will be used as a placeholder. As we parse each <item>, we will build up the title, category, description, date in this temporary struct. This will make sense further down the tutorial.

The consumer method

The method startParsingWithContentsOfURL takes a URL and a completion handler. It is the method the consumer will call once this tutorial is completed.


class RSSPodcastParser: NSObject, XMLParserDelegate {
  ...omitting some code

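    // 0. This is the function where we pass a URL and a completion handler
    func startParsingWithContentsOfURL(rssURL: URL, with completion: (Bool) -> ()) {
        // XMLParser(contentsOf:) is a failable initializer, so report failure if it returns nil
        guard let parser = XMLParser(contentsOf: rssURL) else {
            completion(false)
            return
        }
        parser.delegate = self
        // parse() returns true on success and false otherwise
        completion(parser.parse())
    }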
}

Apple's documentation states:

An XMLParser notifies its delegate about the items (elements, attributes, CDATA blocks, comments, and so on) that it encounters as it processes an XML document. It does not itself do anything with those parsed items except report them. It also reports parsing errors.

XML Parser delegate

XMLParserDelegate declares many methods through which the parser informs its delegate. Do try the other ones once you get the hang of it.

These are the ones we are going to use:

* didStartElement
* foundCharacters
* didEndElement
* parserDidEndDocument
* parseErrorOccurred

Here is how things will happen:

1. didStartElement will be called on <item>
2. didStartElement will be called on <title>
3. foundCharacters will be called on "State"
4. foundCharacters will be called on "I'm In"
5. foundCharacters will be called on "-"
6. foundCharacters will be called on "Aaron Lewis"
7. didEndElement will be called on </title>
8. didStartElement will be called on <category>
9. and so on…
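
To watch this sequence yourself, here is a minimal sketch of a delegate that only records the events it receives. EventLogger and the inline sample XML are hypothetical and not part of the tutorial's parser; the FoundationXML import is only needed on Linux. The number of foundCharacters calls, and where the text is split, can differ from the list above.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives in this module on Linux
#endif

// Hypothetical helper: records every delegate callback as a line of text.
final class EventLogger: NSObject, XMLParserDelegate {
    var events = [String]()

    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
        events.append("didStartElement <\(elementName)>")
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        events.append("foundCharacters \"\(string)\"")
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        events.append("didEndElement </\(elementName)>")
    }
}

// Made-up sample that mimics one <item> of the feed.
let sampleXML = "<item><title>State I'm In - Aaron Lewis</title></item>"

let logger = EventLogger()
let parser = XMLParser(data: Data(sampleXML.utf8))
parser.delegate = logger
let succeeded = parser.parse()

// Print the callbacks in the order they arrived.
logger.events.forEach { print($0) }
```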

didStartElement

The XMLParser will notify that it has reached the beginning of an element with the method didStartElement.

When this method executes, it provides an elementName. We assign elementName to currentElement. Then we use a conditional check to see what element it is.

Based on our iTunes XML Markup we want to check for an <item>.

If the element is <item>, then we want to reset all the properties on our self.rssPodcastItem struct as that will be our template we use for building up our podcast item. We only reset the properties when we know we are at the beginning of a new item element in our XML document.

The currentElement is a reference variable we created within the file. We are updating this variable and we are using it in other methods for reference.

class RSSPodcastParser: NSObject, XMLParserDelegate {

  ...omitting some code

    // 1. This gets called at each opening tag. We store elementName in currentElement and check whether it is "item"
    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {

        currentElement = elementName
        if currentElement == "item" {
            self.rssPodcastItem.resetAllProperties() // Starting a brand new item with all properties reset to ""
        }

    }

}

Notify didStartElement again

The next element in the XML document after <item> is <title>, so didStartElement fires again and assigns "title" to currentElement.

1. didStartElement will be called on <item>
------> 2. didStartElement will be called on <title>
3. foundCharacters will be called on "State"
4. foundCharacters will be called on "I'm In"
5. foundCharacters will be called on "-"
6. foundCharacters will be called on "Aaron Lewis"
7. didEndElement will be called on </title>
8. and so on…

foundCharacters

After the <title> we have the string "State I'm In - Aaron Lewis".

  <title> State I'm In - Aaron Lewis </title>

The parser may report an element's text in several chunks: foundCharacters will stop on line breaks and "special characters" such as entities. So what plays out is something like the following.

1. didStartElement will be called on <item>
2. didStartElement will be called on <title>
------> 3. foundCharacters will be called on "State"
------> 4. foundCharacters will be called on "I'm In"
------> 5. foundCharacters will be called on "-"
------> 6. foundCharacters will be called on "Aaron Lewis"
7. didEndElement will be called on </title>
8. and so on…

Within our method body we have the following written:

class RSSPodcastParser: NSObject, XMLParserDelegate {
  ...omitting some code

    // 2. This gets called with the text found inside the current element. currentElement got assigned in the method above
    // - this goes through the text chunk by chunk and joins the pieces into one string

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        let string = string.trimmingCharacters(in: .whitespacesAndNewlines)
        switch currentElement {
        case "title":
            self.rssPodcastItem.title += string
        case "description":
            self.rssPodcastItem.description += string
        case "pubDate":
            self.rssPodcastItem.date += string
        case "category":
            if !string.isEmpty {
                self.rssPodcastItem.category.append(string)
            }
        default: break
        }
    }
}

When the foundCharacters method is executed we get access to the string. In this example it will be "State".

Then we have a switch statement. Since currentElement tells us we are inside <title>, we match case "title" and concatenate the string "State" to our struct's title.

This method will do this multiple times as it parses the characters for "State I'm In - Aaron Lewis", roughly like this:

self.rssPodcastItem.title += "State"
self.rssPodcastItem.title += " "
self.rssPodcastItem.title += "I'm In"
self.rssPodcastItem.title += "-"
self.rssPodcastItem.title += " "
self.rssPodcastItem.title += "Aaron Lewis"

We concatenate the string to our struct's title rather than assigning it:

// YES
self.rssPodcastItem.title += string

// NO
self.rssPodcastItem.title = string

As mentioned, the method foundCharacters will stop on line breaks and "special characters", so a plain assignment would keep only the last chunk.
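
One consequence worth keeping in mind: when every chunk is trimmed before it is appended, a space that falls exactly on a chunk boundary is lost. A possible alternative, sketched below for the <title> element only, is to append the raw chunks and trim once when the element closes. TitleCollector, buffer and the sample XML are hypothetical names and data used for illustration, and this is not how the tutorial's parser is written.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives in this module on Linux
#endif

// Hypothetical delegate: collects <title> text without trimming each chunk.
final class TitleCollector: NSObject, XMLParserDelegate {
    var titles = [String]()
    private var currentElement = ""
    private var buffer = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
        currentElement = elementName
        if elementName == "title" {
            buffer = "" // start a fresh buffer for each title
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        if currentElement == "title" {
            buffer += string // keep the raw chunk, including inner spaces
        }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "title" {
            // Trim once, after all chunks have been joined.
            titles.append(buffer.trimmingCharacters(in: .whitespacesAndNewlines))
        }
        currentElement = ""
    }
}

// The entity &apos; is one of the "special characters" that can split the text.
let xml = "<item><title> State I&apos;m In - Aaron Lewis </title></item>"

let collector = TitleCollector()
let parser = XMLParser(data: Data(xml.utf8))
parser.delegate = collector
let succeeded = parser.parse()

print(collector.titles)
```

The trade-off is one extra property to manage, in exchange for titles whose inner spacing does not depend on where the parser happens to split the text.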

didEndElement

1. didStartElement will be called on <item>
2. didStartElement will be called on <title>
3. foundCharacters will be called on "State"
4. foundCharacters will be called on "I'm In"
5. foundCharacters will be called on "-"
6. foundCharacters will be called on "Aaron Lewis"
------> 7. didEndElement will be called on </title>
8. and so on…

Similar to the didStartElement function, this one notifies the delegate when it has arrived at the closing element.

When we are notified that </item> has been reached, we take our temporary struct self.rssPodcastItem that we have been building up in the previous step(s) and append it to our private array self.rssItems. The struct is now complete.

The array private var rssItems = [RSSPodcastItem]() is private; we use it internally to keep track of each completed RSSPodcastItem.

class RSSPodcastParser: NSObject, XMLParserDelegate {

  ...omitting some code

    // 3. Closing Tag. When the element tag ends
    func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "item" {
            let item = self.rssPodcastItem
            self.rssItems.append(item)
        }
    }

}

parserDidEndDocument

Sent by the parser object to the delegate when it has successfully completed parsing.

Do some basic housekeeping. We take our private var rssItems and assign it to our variable self.parsedData. The variable parsedData will be available to consumers of this class.

class RSSPodcastParser: NSObject, XMLParserDelegate {
  ...omitting some code


    // 4. Notifies the delegate when parsing has completed. At this point we just assign the items to the parsedData property
    func parserDidEndDocument(_ parser: XMLParser) {
        self.parsedData = self.rssItems
    }

}
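
To see how a consumer might read parsedData, here is a condensed end-to-end sketch. It is not the final implementation: the struct keeps only title and category, the feed is a made-up two-item sample written to a temporary file (so no network access is needed), and the FoundationXML import is only needed on Linux.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives in this module on Linux
#endif

// Condensed version of the tutorial's struct (title and category only).
struct RSSPodcastItem {
    var title = ""
    var category: [String] = []
}

final class RSSPodcastParser: NSObject, XMLParserDelegate {
    var parsedData = [RSSPodcastItem]()
    private var currentElement = ""
    private var rssItems = [RSSPodcastItem]()
    private var rssPodcastItem = RSSPodcastItem()

    func startParsingWithContentsOfURL(rssURL: URL, with completion: (Bool) -> ()) {
        guard let parser = XMLParser(contentsOf: rssURL) else {
            completion(false)
            return
        }
        parser.delegate = self
        completion(parser.parse())
    }

    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
        currentElement = elementName
        if elementName == "item" { rssPodcastItem = RSSPodcastItem() }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        let string = string.trimmingCharacters(in: .whitespacesAndNewlines)
        switch currentElement {
        case "title": rssPodcastItem.title += string
        case "category": if !string.isEmpty { rssPodcastItem.category.append(string) }
        default: break
        }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "item" { rssItems.append(rssPodcastItem) }
    }

    func parserDidEndDocument(_ parser: XMLParser) {
        parsedData = rssItems
    }
}

// Made-up sample feed, written to a temporary file.
let feed = """
<rss><channel>
<item><title>Intro</title><category>Rock</category><category>Pop</category></item>
<item><title>Outro</title><category>Jazz</category></item>
</channel></rss>
"""
let fileURL = FileManager.default.temporaryDirectory.appendingPathComponent("sample-feed.xml")
try feed.write(to: fileURL, atomically: true, encoding: .utf8)

let podcastParser = RSSPodcastParser()
var didSucceed = false
podcastParser.startParsingWithContentsOfURL(rssURL: fileURL) { success in
    didSucceed = success
}

// parse() is synchronous, so parsedData is already filled in here.
for item in podcastParser.parsedData {
    print(item.title, item.category)
}
```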

parseErrorOccurred

Sent by a parser object to its delegate when it encounters a fatal error.

In this example we are going to print some error information to our console.

class RSSPodcastParser: NSObject, XMLParserDelegate {
  ...omitting some code

    // 5. Errors
    func parser(_ parser: XMLParser, parseErrorOccurred parseError: Error) {
        print(parseError.localizedDescription)
    }

}
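
To see this callback fire, the sketch below feeds the parser a deliberately malformed snippet. ErrorCatcher and the broken XML are hypothetical, made up for illustration; the exact wording of the error message depends on the platform, so none is shown here.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives in this module on Linux
#endif

// Hypothetical delegate that only remembers the last reported error.
final class ErrorCatcher: NSObject, XMLParserDelegate {
    var lastErrorDescription: String?

    func parser(_ parser: XMLParser, parseErrorOccurred parseError: Error) {
        lastErrorDescription = parseError.localizedDescription
    }
}

// Malformed on purpose: <title> is never closed before </item>.
let brokenXML = "<item><title>Unclosed title</item>"

let catcher = ErrorCatcher()
let parser = XMLParser(data: Data(brokenXML.utf8))
parser.delegate = catcher
let succeeded = parser.parse()

print(succeeded) // parse() reports failure through its return value
print(catcher.lastErrorDescription ?? "no error reported")
// lineNumber and columnNumber show where the parser stopped.
print(parser.lineNumber, parser.columnNumber)
```

Checking the Bool returned by parse() alongside the delegate callback lets the consumer's completion handler distinguish a failed parse from an empty feed.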

Final implementation

Go to this blog post to see the final implementation.

In Conclusion

The main thing I want anyone to take away from this tutorial is how simple and flexible XMLParser and XMLParserDelegate are. They really do make it easy to set up and parse XML documents, once you get the hang of it.

I relied heavily on other posts and documentation to get the hang of this. This tutorial has a similar setup and function to Understanding XMLParser in Swift - Lucas Cerro. Great tutorial. The documentation (Swift Documentation - XMLParser, Swift Documentation - XMLParserDelegate) should also be read, because it covers more interesting methods that could be applied in the setup.