Linq to Xml – querying an RSS feed

Linq (and all its flavors) will come out with .Net 3.5 and Visual Studio 2008. The Xml support brings new classes such as XDocument, XElement, and XAttribute. What’s interesting about XElement in particular is that it allows us to load Xml from many sources and query into it. Depending on your experience with XPath, a Linq to Xml query may be easier to write AND read.

Let’s look at a sample Linq to Xml query. I’m going to use my blog’s RSS feed, http://feeds.feedburner.com/jeffreypalermo.

Let’s start simple. We’ll read in the Xml from the RSS feed and enumerate through all the posts (or "items" in the RSS world).

XElement rssFeed = XElement.Load(@"http://feeds.feedburner.com/jeffreypalermo");

var items = from item in rssFeed.Elements("channel").Elements("item")
 select item;

Console.WriteLine(items.Count());
foreach (object o in items)
{
 Console.WriteLine(o);
}

Note that we are merely reading the feed into an XElement and then selecting all the item elements from the document. XElement.Load hands back the root element (rss), so the query starts at its channel child.
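If you want to try the query without hitting the network, a sketch along these lines may help. It parses a small, made-up RSS fragment from a string and writes the same query in method syntax. The sample Xml, the class name RssItemsSketch, and the method names are my assumptions for illustration; they are not part of the feed or of Linq to Xml.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RssItemsSketch
{
    // Made-up RSS fragment standing in for the real feed (an assumption).
    public const string SampleXml =
        "<rss version=\"2.0\"><channel><title>Sample</title>" +
        "<item><title>First post</title></item>" +
        "<item><title>Second post</title></item>" +
        "</channel></rss>";

    // The same selection as the query above, in method syntax:
    // all item elements that sit directly under a channel element.
    public static IEnumerable<XElement> GetItems(XElement rssFeed)
    {
        return rssFeed.Elements("channel").Elements("item");
    }

    // Prints the title of every item in the sample fragment.
    public static void Demo()
    {
        XElement rssFeed = XElement.Parse(SampleXml);
        foreach (XElement item in GetItems(rssFeed))
        {
            Console.WriteLine(item.Element("title").Value);
        }
    }
}
```

Since the select clause in the original query only passes each item through, the method-syntax version needs no Select call at all.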

It’s pretty interesting that we can filter down to only the "item" nodes that quickly, but what if I want to find only the posts whose author is "Jeffrey Palermo"? (Never mind that this is my own feed; imagine it were a composite feed.)

XNamespace dc = "http://purl.org/dc/elements/1.1/";

XElement rssFeed = XElement.Load(@"http://feeds.feedburner.com/jeffreypalermo");

var items = from item in rssFeed.Elements("channel").Elements("item")
 where item.Element(dc + "creator").Value == "Jeffrey Palermo"
 select item;

Console.WriteLine(items.Count());
foreach (object o in items)
{
 Console.WriteLine(o);
}

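I gave this example to illustrate how to use the XNamespace class to filter on elements that carry a namespace prefix, such as "dc:creator" in an RSS feed. The prefix itself doesn’t matter to the query; what matters is the namespace URI, and dc + "creator" builds the fully qualified element name. Note the "where" clause in my Linq query.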
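One caveat with the filter above: item.Element(dc + "creator") returns null for an item that has no dc:creator element, and calling .Value on it then throws a NullReferenceException. Here is a minimal sketch of a more forgiving version that uses XElement’s explicit conversion to string, which yields null for a missing element. The sample Xml and the names CreatorFilterSketch and ItemsBy are hypothetical, made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CreatorFilterSketch
{
    // Dublin Core namespace URI, the same one used in the query above.
    public static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";

    // Made-up composite feed: two authors and one item with no dc:creator.
    public const string SampleXml =
        "<rss version=\"2.0\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\"><channel>" +
        "<item><title>A</title><dc:creator>Jeffrey Palermo</dc:creator></item>" +
        "<item><title>B</title><dc:creator>Someone Else</dc:creator></item>" +
        "<item><title>C</title></item>" +
        "</channel></rss>";

    public static IEnumerable<XElement> ItemsBy(XElement rssFeed, string author)
    {
        return from item in rssFeed.Elements("channel").Elements("item")
               // The cast yields null (instead of throwing) when dc:creator is absent.
               where (string)item.Element(Dc + "creator") == author
               select item;
    }
}
```

Calling CreatorFilterSketch.ItemsBy(XElement.Parse(CreatorFilterSketch.SampleXml), "Jeffrey Palermo") should return only item A; item C is skipped quietly rather than blowing up the query.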

Let’s go further and filter down to just the posts that contain the term "altnetconf" (or choose another term to search for).

XNamespace dc = "http://purl.org/dc/elements/1.1/";

XElement rssFeed = XElement.Load(@"http://feeds.feedburner.com/jeffreypalermo");

var items = from item in rssFeed.Elements("channel").Elements("item")
 where item.Element(dc + "creator").Value == "Jeffrey Palermo"
 && item.Element("description").Value.Contains("altnetconf")
 select item;

Console.WriteLine(items.Count());
foreach (object o in items)
{
 Console.WriteLine(o);
}

Now we’ve seen how the where clause functions. Let’s shape the data now. I’ll drop the where clause for simplicity, and I want to grab my feed and put a title and abstract into a collection for use in my application. I’ll use an anonymous type to help me do that.

XElement rssFeed = XElement.Load(@"http://feeds.feedburner.com/jeffreypalermo");

var items = from item in rssFeed.Elements("channel").Elements("item")
            select new
            {
                Title = item.Element("title").Value,
                Abstract = item.Element("description").Value.Substring(0, 100)
            };

Console.WriteLine(items.Count());
foreach (object o in items)
{
 Console.WriteLine(o);
}

Here, I have created an anonymous type with a Title and an Abstract. Now that I have the information in object form, I can work with it. I don’t want to work with Xml throughout my application, only at the edges. I prefer to pull in the information and get it into my domain model as quickly as possible because objects are easier to work with than raw Xml.
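One thing to watch in the query above: Substring(0, 100) throws an ArgumentOutOfRangeException when a description is shorter than 100 characters. A minimal sketch of a safer projection follows. The Truncate helper, the Summaries method, and the class name AbstractSketch are hypothetical names I made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AbstractSketch
{
    // Hypothetical helper: returns at most maxLength characters,
    // and an empty string when there is no text at all.
    public static string Truncate(string text, int maxLength)
    {
        if (text == null) return string.Empty;
        return text.Length <= maxLength ? text : text.Substring(0, maxLength);
    }

    public static List<string> Summaries(XElement rssFeed)
    {
        var items = from item in rssFeed.Elements("channel").Elements("item")
                    select new
                    {
                        Title = (string)item.Element("title"),
                        Abstract = Truncate((string)item.Element("description"), 100)
                    };

        // The anonymous type cannot be named outside this method,
        // so only formatted strings leave it.
        return items.Select(i => i.Title + " - " + i.Abstract).ToList();
    }
}
```

The string casts also mean a missing title or description produces null or an empty abstract rather than an exception.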

I’m not finished yet because my type doesn’t have a name, so there is no strongly typed way to pass my IEnumerable<whateverthetypeis> to another part of my application. I need to hydrate one of my domain objects so I can work with my abstracts.

I’m going to use this class:

public class PostAbstract
{
    private string _title;
    private string _abstract;

    public string Title
    {
        get { return _title; }
        set { _title = value; }
    }

    public string Abstract
    {
        get { return _abstract; }
        set { _abstract = value; }
    }
}

I’ll have the results of my Xml query create a set of PostAbstracts, and then I can work with them.

XElement rssFeed = XElement.Load(@"http://feeds.feedburner.com/jeffreypalermo");

IEnumerable<PostAbstract> items =
    from item in rssFeed.Elements("channel").Elements("item")
    select new PostAbstract
    {
        Title = item.Element("title").Value,
        Abstract = item.Element("description").Value.Substring(0, 100)
    };

Console.WriteLine(items.Count());
foreach (PostAbstract anAbstract in items)
{
 Console.WriteLine("{0} - {1}", anAbstract.Title, anAbstract.Abstract);
}

Note that I merely gave my type a name, PostAbstract. I used object initializers to set the properties, and now I have a set of objects I can work with to accomplish my purpose.
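Keep in mind that a Linq query is evaluated lazily. In the snippet above, Count() and the foreach loop each run the projection, so the PostAbstract objects get created twice. If I wanted to hand the results to another part of the application, I might wrap the query in a method and materialize it once with ToList(). This is only a sketch: the class name PostAbstractLoader and its Load method are hypothetical, and PostAbstract is repeated here (with auto-properties) only so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Same shape as the PostAbstract class above, repeated for self-containment.
public class PostAbstract
{
    public string Title { get; set; }
    public string Abstract { get; set; }
}

public static class PostAbstractLoader
{
    // Runs the query once and returns a concrete list, so callers can
    // count and enumerate without re-running the projection.
    public static List<PostAbstract> Load(XElement rssFeed)
    {
        return (from item in rssFeed.Elements("channel").Elements("item")
                // A missing description becomes an empty string.
                let description = (string)item.Element("description") ?? string.Empty
                select new PostAbstract
                {
                    Title = (string)item.Element("title"),
                    // Keep at most the first 100 characters.
                    Abstract = description.Length > 100
                        ? description.Substring(0, 100)
                        : description
                }).ToList();
    }
}
```

A caller could then write List<PostAbstract> posts = PostAbstractLoader.Load(XElement.Load(feedUrl)); where feedUrl is whatever feed address you are reading, and never touch Xml again.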
