Parsing XML
How to use XMLParser for parsing XML documents
Check out the final solution. Copy and paste it into a Swift playground and run it to see it in action.
This is a brief overview of XMLParser and XMLParserDelegate, which come with Apple's Foundation framework. The tutorial goes step by step through parsing XML files in Swift.
We will parse an XML iTunes feed for upcoming music.
The <item> element is what we want to extract.
1. Create a struct template for the <item>
2. Create the parser
3. Have the parser parse the iTunes feed, using XMLParser and XMLParserDelegate
4. Have the parser fill in the details of the struct
5. Add the struct to our array of podcast items
6. Test run the parser
Create this struct called RSSPodcastItem at the top of the file. Its properties mirror the fields of the <item> element in our iTunes feed.
The strategy will make sense further along in the tutorial.
import Foundation
struct RSSPodcastItem {
    var title: String
    var description: String
    var date: String
    var category: [String]
    // 1. Start off with an empty struct
    init() {
        self.title = ""
        self.description = ""
        self.date = ""
        self.category = []
    }
    // 2. Reset all the properties
    mutating func resetAllProperties() {
        self = RSSPodcastItem()
    }
}
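A quick standalone check of the reset behaviour: assigning a fresh RSSPodcastItem() to self inside a mutating method clears every stored property in one step. The title and category values below are placeholders, not data from the feed.

```swift
import Foundation

// Mirrors the tutorial's struct so the snippet runs on its own.
struct RSSPodcastItem {
    var title: String
    var description: String
    var date: String
    var category: [String]

    init() {
        self.title = ""
        self.description = ""
        self.date = ""
        self.category = []
    }

    // Replacing `self` with a new value resets all stored properties at once.
    mutating func resetAllProperties() {
        self = RSSPodcastItem()
    }
}

var item = RSSPodcastItem()
item.title = "Placeholder title"   // hypothetical value
item.category.append("Pop")        // hypothetical value
item.resetAllProperties()
print(item.title.isEmpty, item.category.isEmpty) // true true
```

Adding a new property later only requires updating init(); resetAllProperties() picks it up automatically.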
After the struct declaration, create the RSSPodcastParser class and make it conform to XMLParserDelegate, as shown below.
// ...omitting some code
class RSSPodcastParser: NSObject, XMLParserDelegate {
}
Now set up some initial variables. This is just some housekeeping; the comments below explain what each one does.
// ...omitting some code
class RSSPodcastParser: NSObject, XMLParserDelegate {
    // What will be available to the consumer
    var parsedData = [RSSPodcastItem]()
    // ---------------------------------------------
    private var xmlParser: XMLParser!
    // We track currentElement so we know which element the XMLParserDelegate is currently on
    private var currentElement = ""
    // RSS items that are tracked internally
    private var rssItems = [RSSPodcastItem]()
    // RSS template object that is used to build up our RSSPodcastItem
    private var rssPodcastItem = RSSPodcastItem()
    // ...
}
The private var currentElement is updated as parsing progresses. While the XML is being parsed, we need to know which element we are currently inside. This will make sense further down the tutorial.
The private var rssPodcastItem is the RSSPodcastItem we defined earlier, used as a placeholder. As we parse each <item>, we build up its title, category, description and date in this temporary struct.
The method startParsingWithContentsOfURL takes a URL and a completion handler. It is the method consumers will call once this tutorial is completed.
class RSSPodcastParser: NSObject, XMLParserDelegate {
    // ...omitting some code
    // 0. This is the function where we pass a URL and get a completion handler
    func startParsingWithContentsOfURL(rssURL: URL, with completion: (Bool) -> Void) {
        // XMLParser(contentsOf:) is failable; report failure instead of never calling back
        guard let parser = XMLParser(contentsOf: rssURL) else {
            completion(false)
            return
        }
        parser.delegate = self
        completion(parser.parse())
    }
}
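XMLParser(contentsOf:) returns an optional and needs the feed to be reachable, which is awkward when experimenting offline in a playground. A minimal sketch, assuming you just want to exercise a delegate against a hard-coded string, swaps in XMLParser(data:). The names ItemCounter and startParsing(data:completion:), and the sample XML, are hypothetical; the delegate only counts <item> elements.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives here on Linux
#endif

// Hypothetical delegate: counts <item> elements to show the data-based entry point.
final class ItemCounter: NSObject, XMLParserDelegate {
    private(set) var itemCount = 0

    // Same shape as startParsingWithContentsOfURL, but takes in-memory Data.
    func startParsing(data: Data, completion: (Bool) -> Void) {
        let parser = XMLParser(data: data) // non-failable, unlike init(contentsOf:)
        parser.delegate = self
        completion(parser.parse())         // parse() is synchronous
    }

    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
        if elementName == "item" {
            itemCount += 1
        }
    }
}

// Hypothetical sample feed, not the real iTunes data.
let sampleXML = """
<rss><channel>
  <item><title>First</title></item>
  <item><title>Second</title></item>
</channel></rss>
"""

let counter = ItemCounter()
var succeeded = false
counter.startParsing(data: Data(sampleXML.utf8)) { success in
    succeeded = success
}
print(succeeded, counter.itemCount) // true 2
```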
Apple's documentation for XMLParser states:
An XMLParser notifies its delegate about the items (elements, attributes, CDATA blocks, comments, and so on) that it encounters as it processes an XML document. It does not itself do anything with those parsed items except report them. It also reports parsing errors.
XMLParserDelegate has many methods through which the parser informs its delegate. Once you get the hang of it, do try the other ones.
These are the ones we are going to use:
* didStartElement
* foundCharacters
* didEndElement
* parserDidEndDocument
* parseErrorOccurred
Here is how things will happen:
1. didStartElement will be called on <item>
2. didStartElement will be called on <title>
3. foundCharacters will be called on "State"
4. foundCharacters will be called on "I'm In"
5. foundCharacters will be called on "-"
6. foundCharacters will be called on "Aaron Lewis"
7. didEndElement will be called on </title>
8. didStartElement will be called on <category>
9. and so on…
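To watch this sequence yourself, a small sketch can log each callback in order. EventRecorder and the one-item XML string are hypothetical. Because the parser may split an element's text into several chunks, the recorder collapses consecutive text callbacks into a single "chars" entry, so the log does not depend on where the parser splits the title.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives here on Linux
#endif

// Hypothetical delegate that records each callback in the order it fires.
final class EventRecorder: NSObject, XMLParserDelegate {
    var events: [String] = []

    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
        events.append("start \(elementName)")
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // Log one "chars" entry per run of text, however many chunks it arrives in.
        let text = string.trimmingCharacters(in: .whitespacesAndNewlines)
        if !text.isEmpty && events.last != "chars" {
            events.append("chars")
        }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        events.append("end \(elementName)")
    }

    func parserDidEndDocument(_ parser: XMLParser) {
        events.append("end document")
    }
}

// Hypothetical one-item document shaped like the feed's <item>.
let xml = "<item><title>State I'm In - Aaron Lewis</title><category>Pop</category></item>"
let recorder = EventRecorder()
let parser = XMLParser(data: Data(xml.utf8))
parser.delegate = recorder
let ok = parser.parse()
print(ok)
print(recorder.events)
```

For this input the log should read: start item, start title, chars, end title, start category, chars, end category, end item, end document.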
The XMLParser notifies its delegate that it has reached the beginning of an element by calling didStartElement.
This method provides an elementName, which we assign to currentElement. Then we use a conditional check to see which element it is. Based on our iTunes XML markup, we want to check for an <item>.
If the element is <item>, we reset all the properties on our self.rssPodcastItem struct, since that is the template we use to build up each podcast item. We only reset the properties when we know we are at the beginning of a new item element in the XML document.
currentElement is a property we declared on the class. We update it here and read it in the other delegate methods.
class RSSPodcastParser: NSObject, XMLParserDelegate {
    // ...omitting some code
    // 1. Called for every opening tag; a new podcast item starts when elementName == "item"
    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
        currentElement = elementName
        if currentElement == "item" {
            self.rssPodcastItem.resetAllProperties() // Start a brand new item with all properties reset
        }
    }
}
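Why the reset matters is easiest to see with two items in a row. The sketch below is a hypothetical, cut-down version of the parser that only tracks categories; the Item struct, CategoryParser class and sample XML are illustrative. Without the Item() reset in didStartElement, the second item would also carry the first item's categories.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives here on Linux
#endif

// Cut-down stand-in for RSSPodcastItem: only categories matter here.
struct Item {
    var category: [String] = []
}

// Hypothetical mini-parser using the same reset-on-<item> strategy as the tutorial.
final class CategoryParser: NSObject, XMLParserDelegate {
    var items: [Item] = []
    private var current = Item()
    private var currentElement = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
        currentElement = elementName
        if elementName == "item" {
            current = Item() // fresh template, so the previous item's categories don't leak in
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        let text = string.trimmingCharacters(in: .whitespacesAndNewlines)
        if currentElement == "category" && !text.isEmpty {
            current.category.append(text)
        }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "item" {
            items.append(current)
        }
    }
}

// Hypothetical two-item feed.
let xml = """
<channel>
<item><category>Pop</category><category>Rock</category></item>
<item><category>Jazz</category></item>
</channel>
"""
let categoryParser = CategoryParser()
let xmlParser = XMLParser(data: Data(xml.utf8))
xmlParser.delegate = categoryParser
let ok = xmlParser.parse()
print(categoryParser.items.map { $0.category })
```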
The next element after <item> in the XML document is <title>, so didStartElement fires again and currentElement is set to "title".
1. didStartElement will be called on <item>
------> 2. didStartElement will be called on <title>
3. foundCharacters will be called on "State"
4. foundCharacters will be called on "I'm In"
5. foundCharacters will be called on "-"
6. foundCharacters will be called on "Aaron Lewis"
7. didEndElement will be called on </title>
8. and so on…
Inside <title> is the string "State I'm In - Aaron Lewis":
<title> State I'm In - Aaron Lewis </title>
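The parser does not necessarily hand over that text in one piece. Apple's documentation notes that foundCharacters may be called several times for one element, each time with only part of its text; where the splits fall is up to the parser. Something like the following plays out: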
1. didStartElement will be called on <item>
2. didStartElement will be called on <title>
------> 3. foundCharacters will be called on "State"
------> 4. foundCharacters will be called on "I'm In"
------> 5. foundCharacters will be called on "-"
------> 6. foundCharacters will be called on "Aaron Lewis"
7. didEndElement will be called on </title>
8. and so on…
The foundCharacters implementation looks like this:
class RSSPodcastParser: NSObject, XMLParserDelegate {
    // ...omitting some code
    // 2. Called with the text of the element stored in currentElement by the method above.
    //    The text can arrive in several chunks, so each chunk is appended to the matching field.
    func parser(_ parser: XMLParser, foundCharacters string: String) {
        let string = string.trimmingCharacters(in: .whitespacesAndNewlines)
        switch currentElement {
        case "title":
            self.rssPodcastItem.title += string
        case "description":
            self.rssPodcastItem.description += string
        case "pubDate":
            self.rssPodcastItem.date += string
        case "category":
            if !string.isEmpty {
                self.rssPodcastItem.category.append(string)
            }
        default:
            break
        }
    }
}
When foundCharacters runs, it receives a string; in this example, "State".
The switch statement then checks currentElement, which is still "title", so case "title" matches and "State" is appended to our struct's title.
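This method may run several times while the parser works through "State I'm In - Aaron Lewis". Conceptually, the title is built up piece by piece, along these lines (the exact chunks depend on the parser):
self.rssPodcastItem.title += "State"
self.rssPodcastItem.title += " "
self.rssPodcastItem.title += "I'm In"
self.rssPodcastItem.title += "-"
self.rssPodcastItem.title += " "
self.rssPodcastItem.title += "Aaron Lewis"
Because the code trims each chunk first, a whitespace-only chunk such as " " would actually be appended as an empty string, so spaces that fall on a chunk boundary are lost.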
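The foundCharacters code trims every chunk before appending it. A small worked example shows what that does when a chunk boundary sits next to a space. The chunks are hypothetical (the parser, not our code, decides the split); imagine a title written as Tom &amp; Jerry being reported as "Tom ", "&" and " Jerry".

```swift
import Foundation

// Hypothetical chunks; the real split points are chosen by the parser.
let chunks = ["Tom ", "&", " Jerry"]

// Trim each chunk, then append (what the tutorial's foundCharacters does).
var trimmedEachTime = ""
for chunk in chunks {
    trimmedEachTime += chunk.trimmingCharacters(in: .whitespacesAndNewlines)
}

// Append the raw chunks, then trim once at the end.
let trimmedOnce = chunks.joined().trimmingCharacters(in: .whitespacesAndNewlines)

print(trimmedEachTime) // Tom&Jerry
print(trimmedOnce)     // Tom & Jerry
```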
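Because the text can arrive in several chunks, we concatenate the string onto our struct's title instead of assigning it (assigning would keep only the last chunk):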
// YES
self.rssPodcastItem.title += string
// NO
self.rssPodcastItem.title = string
Once all of the title's text has been reported, the parser reaches the closing </title> tag:
1. didStartElement will be called on <item>
2. didStartElement will be called on <title>
3. foundCharacters will be called on "State"
4. foundCharacters will be called on "I'm In"
5. foundCharacters will be called on "-"
6. foundCharacters will be called on "Aaron Lewis"
------> 7. didEndElement will be called on </title>
8. and so on…
didEndElement is the counterpart of didStartElement: it notifies the delegate when the parser reaches a closing tag.
When the closing tag is </item>, the temporary struct self.rssPodcastItem that we have been building up in the previous steps is complete, so we append it to our private array self.rssItems.
The private var rssItems = [RSSPodcastItem]() is only used internally, to collect the finished RSSPodcastItem values.
class RSSPodcastParser: NSObject, XMLParserDelegate {
    // ...omitting some code
    // 3. Closing tag: called when an element ends
    func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "item" {
            let item = self.rssPodcastItem
            self.rssItems.append(item)
        }
    }
}
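One way to keep spaces that fall on chunk boundaries, sketched here as an alternative rather than the tutorial's own method, is to append raw chunks to a buffer and trim only once, in didEndElement. TitleParser and the sample XML are hypothetical. Resetting the buffer in didStartElement assumes leaf elements like <title> that contain no child elements, which holds for the fields this tutorial reads.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives here on Linux
#endif

// Hypothetical variant: buffer raw text per element and trim once when the element ends.
final class TitleParser: NSObject, XMLParserDelegate {
    var titles: [String] = []
    private var buffer = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
        buffer = "" // start collecting text for this element
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        buffer += string // keep every chunk as-is, whitespace included
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "title" {
            titles.append(buffer.trimmingCharacters(in: .whitespacesAndNewlines))
        }
        buffer = "" // discard whitespace that follows the closing tag
    }
}

// Hypothetical pretty-printed item with whitespace around the title text.
let xml = """
<item>
  <title>
    State I'm In - Aaron Lewis
  </title>
</item>
"""
let titleParser = TitleParser()
let xmlParser = XMLParser(data: Data(xml.utf8))
xmlParser.delegate = titleParser
let ok = xmlParser.parse()
print(titleParser.titles) // ["State I\'m In - Aaron Lewis"]
```

Clearing the buffer after every closing tag also means the indentation between </title> and the next tag is never attributed to the title.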
parserDidEndDocument is sent by the parser object to the delegate when it has successfully completed parsing.
Here we do some basic housekeeping: we assign our private rssItems array to self.parsedData. The parsedData property is what consumers of this class read.
class RSSPodcastParser: NSObject, XMLParserDelegate {
    // ...omitting some code
    // 4. Notifies the delegate when parsing has completed. At this point we assign the collected items to parsedData
    func parserDidEndDocument(_ parser: XMLParser) {
        self.parsedData = self.rssItems
    }
}
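parse() is synchronous, and parserDidEndDocument fires before it returns. That ordering is why parsedData is already filled in by the time startParsingWithContentsOfURL calls its completion handler. A minimal sketch with a hypothetical OrderLogger delegate makes the ordering visible:

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives here on Linux
#endif

// Hypothetical delegate that logs when the document-finished callback runs.
final class OrderLogger: NSObject, XMLParserDelegate {
    var log: [String] = []

    func parserDidEndDocument(_ parser: XMLParser) {
        log.append("parserDidEndDocument")
    }
}

let logger = OrderLogger()
let parser = XMLParser(data: Data("<rss><channel/></rss>".utf8))
parser.delegate = logger
let ok = parser.parse() // returns only after the whole document has been processed
logger.log.append("parse() returned \(ok)")
print(logger.log) // ["parserDidEndDocument", "parse() returned true"]
```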
parser(_:parseErrorOccurred:) is sent by a parser object to its delegate when it encounters a fatal error.
In this example, we simply print the error's description to the console.
class RSSPodcastParser: NSObject, XMLParserDelegate {
    // ...omitting some code
    // 5. Errors
    func parser(_ parser: XMLParser, parseErrorOccurred parseError: Error) {
        print(parseError.localizedDescription)
    }
}
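Printing is fine while experimenting, but a caller usually wants the error itself. The sketch below uses a hypothetical ErrorCatcher delegate and a deliberately malformed document. It stores the error, and it also shows that parse() returns false and that XMLParser exposes the failure through its parserError property.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives here on Linux
#endif

// Hypothetical delegate that keeps the error instead of only printing it.
final class ErrorCatcher: NSObject, XMLParserDelegate {
    var reportedError: Error?

    func parser(_ parser: XMLParser, parseErrorOccurred parseError: Error) {
        reportedError = parseError
        print(parseError.localizedDescription)
    }
}

let catcher = ErrorCatcher()
// Deliberately malformed: <title> is never closed.
let badParser = XMLParser(data: Data("<item><title>Oops</item>".utf8))
badParser.delegate = catcher
let succeeded = badParser.parse()

print(succeeded) // false
print(badParser.parserError != nil)
```

In the tutorial's class, the stored error could be passed along with the false flag in the completion handler.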
Go to this blog post to see the final implementation.
The main thing I want anyone to take away from this tutorial is how simple and flexible XMLParser and XMLParserDelegate are. Once you get the hang of them, they make it easy to set up and parse XML documents.
I relied heavily on other posts and documentation to get the hang of this. This tutorial has a similar setup and similar functions to Understanding XMLParser in Swift - Lucas Cerro, which is a great tutorial. The Swift documentation (Swift Documentation - XMLParser, Swift Documentation - XMLParserDelegate) is also worth reading, because it covers more methods that could be applied in this setup.