In my Beyond Frameworks talk, I explained how a component-based architecture can help answer some of the important (i.e. expensive!) questions you might face when creating long-lived apps that rely on a PHP framework. In this series of blog posts, I’m going to look at how to go about creating and working with components.

In the last article, I edited the build.properties file to set the component’s name and version number. I need to finish setting up the component’s metadata by editing package.xml.

Introducing package.xml

Every PEAR-compatible component needs to ship with a manifest file called package.xml. You can see the the unedited version on GitHub that we’re about to change.

package.xml contains the following useful information:

  • The component’s name, channel, version number, and description.
  • A list of the component’s authors.
  • The component’s license.
  • A list of the component’s contents.
  • A list of the component’s dependencies, including PHP version, PHP extensions, and other components.

(package.xml can hold other information too, but I’ve yet to see them being put to practical use, so we’ll leave them for today and come back to them if needed later on).

In our component skeleton, we need to add some of this data to package.xml, and the rest will be added automatically by the tools when we build a PEAR-compatible package, to save us the trouble of doing it by hand.

Setting Up The Component’s Name And Channel

All PEAR-compatible packages have a fully-qualified name of the form <channel>/<name>. In this skeleton, our tools will automatically inject project.name from the build.properties file into package.xml’s <name> tag when we build the PEAR-compatible package. We just need to edit the <channel> tag in package.xml to point to our server.

As I’m going to publish these packages via our public PEAR channel, I’ve set <channel> to be “pear.gradwell.com”. Don’t worry if you don’t have your own PEAR channel yet; they’re very easy to create, and I’ll look at exactly how to set one up and publish it later in this series.

Setting Up The Component’s Version Number, And Description

All PEAR-compatible packages have a version number. We’ve already set this in build.properties; our tools will automatically inject this into package.xml. (We could have put this directly into package.xml, but the version number is duplicated in a couple of places, and it seems much easier to let the tools deal with that so that we don’t forget to update all the version numbers when we make a new release).

For the description, edit the &lt:summary> and <description> tags. The descriptions get used in the human-friendly HTML page that each PEAR channel has, so they’re worth the trouble of filling in.

Setting Up The Component’s Author

There’s a few tags inside package.xml that tell anyone who is interested who the current author of the component is. This information gets used in the human-friendly HTML page that each PEAR channel has. I’ve set these tags as follows:

[code lang=”xml”]
Stuart Herbert
stuartherbert
stuart@stuartherbert.com
yes
[/code]

Setting Up The Component’s License

The <license> tag contains text to say under what license the component has been released. By default, the skeleton sets this to be “All rights reserved.” Whilst I’d love you to opensource all of your components, I wouldn’t like to presume 😉

For our component, I’ve set this tag to say “New BSD license.”.

Setting The List Of The Component’s Contents

The <contents> tag is where the PEAR installer expects to find a file-by-file list of all of the files that need to be installed, along with instructions on where to install them. Each file is flagged as being a specific ‘role’ – script, php code, data file, unit test, website page – and the PEAR installer then decides for you where it will put them. Don’t try fighting it; it has some built-in assumptions that aren’t worth trying to work around.

We don’t need to edit this tag at all. When we build our PEAR-compatible package later on, the tools will manage this section for us. It saves a lot of time and effort, and debugging!

Setting Up The Component’s Dependencies

Finally, we get to the bit that makes it all worthwhile: the list of what this component needs in order to work. By getting these right, we can reliably re-create the entire environment that our app needs with just a single command.

A practical demonstration is needed, I think, because this is probably something you’ve never had the opportunity to do before now.

stuart:~/Devel/sental/repustateApi: phing build-vendor

When I run that command, our tools will download all of the dependencies listed in package.xml, and install them into a folder called ‘vendor’. This gives us a reproducible sandbox, where we can find the correct versions of all the code needed to unit test our component. We don’t have to ensure the right version is installed on the computer; we get our own copy of everything needed. If you develop on shared servers, or even if you end up maintaining a lot of components, this is a real time saver.

If you run that command locally, and look inside the ‘vendor’ folder it creates, you should see that our tools have downloaded and installed the Autoloader package that’s listed in package.xml’s list of dependencies. As we add more PEAR-compatible packages to the package.xml’s dependency list, we can rebuild the vendor folder and install all of the dependencies side by side. I’ve picked the repustateApi component because it will allow me to demonstrate that as we go along.

Please ignore the errors that you will see scroll up the screen when you run ‘phing build-vendor’ at this stage. Unfortunately, the PEAR installer doesn’t seem to have a command to install just the dependencies (i.e. pear install –onlydeps) listed. It’s also trying to install our component’s PEAR-compatible package, which doesn’t exist yet.

The Final package.xml File

You can see the final package.xml file, containing all of the edits listed in this blog post, on Github.

As we work on the component and then release it, we will need to come back and edit package.xml a little more. But we’ve done what we need to, for now, and can get started on creating some code.

My apologies if it seems like a lot of work just to create an empty component and then sort out the component’s metadata. In all honesty, after you’ve done it a few times, you can create a new component in a matter of minutes, and this is work you only have to do the once.

In the next blog post, we’re going to start creating the initial tests for this component.

Be the first to leave a comment »

In my Beyond Frameworks talk, I explained how a component-based architecture can help answer some of the important (i.e. expensive!) questions you might face when creating long-lived apps that rely on a PHP framework. In this series of blog posts, I’m going to look at how to go about creating and working with components.

In the last post, I used phix to create an empty component. I now need to edit the component’s manifest files before I can start cutting code.

What’s Your Component Called?

The first file we need to edit is called build.properties. It’s a standard-format .ini file that contains a few bits of data for us to edit. When I designed the component skeleton, I put these in their own file so that we never have to edit the build.xml file … which means we can upgrade build.xml in future via the phix php-library:upgrade command.

An unedited build.properties file looks like this:

project.name=<your project name>
project.majorVersion=X
project.minorVersion=Y
project.patchLevel=Z

checkstyle.standard=Zend

component.type=php-library
component.version=3

As this is the first version of my component, I’m going to call this version of the component repustateApi-0.1.0. That means that I need to change the top four lines of build.properties to look like this:

project.name=repustateApi
project.majorVersion=0
project.minorVersion=1
project.patchLevel=0

(You can see the fullly-edited build.properties file on GitHub).

Why did I pick 0.1.0 as the version number?

What’s In A Version Number?

When it comes to components, we need to adopt a sane and engineering-focused version numbering scheme, otherwise we run into a lot of trouble when trying to trust our own and other people’s components over time. I went into this in some detail in my Beyond Frameworks talk, and it basically boils down to this:

  • The scheme itself reads major.minor.patchLevel, or, if you prefer, x.y.z.
  • All versions 1.y.z must be 100% backwards-compatible. You can fix bugs and add new features, but if you break backwards-compatibility, you need to create version 2.y.z.

    The people who use your components need to be confident that they can upgrade the component (to get the latest fixes and features) without any nasty surprises that force them to also go to the trouble of editing their own code. Make the major version number your promise to other developers that they can upgrade safely.

    Don’t be afraid to create version 2.y.z, version 3.y.z etc as rapidly as needed. Google has done exactly this with Chrome, and it hasn’t exactly done them any harm.

  • If you’re adding new features, and not breaking backwards compatibility, turn version 1.0 into version 1.1, 1.2, and so on.
  • And if you’re only putting out bug fixes, just increase the last number, and turn version 1.0.0 into version 1.0.1, 1.0.2 and so on.

This numbering scheme is called Semantic Versioning. Please do follow it.

That’s all that we need to edit in build.properties. We still need to setup package.xml, which I’ll cover in the next blog post.

Be the first to leave a comment »

In my Beyond Frameworks talk, I explained how a component-based architecture can help answer some of the important (i.e. expensive!) questions you might face when creating long-lived apps that rely on a PHP framework. In this series of blog posts, I’m going to look at how to go about creating and working with components.

In my last article, I posted a list of questions to consider when decomposing the design of an app, and I put together the first cut of the components that will make up my sentiment analysis app. Now it’s time to get into creating the app’s first component, using phix.

Dev Environment Prep

If you haven’t already, you’ll need to install phix onto your dev computer. You’ll find the latest instructions on how to install phix on the Phix Project’s website. Thankfully, we only have to do that once. Now, with the development environment prepared, we can get our component under way.

Creating The Skeleton For The Component

Creating a skeleton component is as easy as:

stuart:~/Devel/sental$ mkdir repustateApi
stuart:~/Devel/sental$ cd repustateApi
stuart:~/Devel/sental$ git init .
Initialized empty Git repository in /home/stuart/Devel/sental/repustateApi/.git
stuart:~/Devel/sental/repustateApi$ phix php-library:init .
Initialised empty php-library component in .

What have we created?

A Quick Tour Of The Skeleton

Here’s a quick list of what the php-library:init command has created in the repustateApi folder:

stuart:~/Devel/sental/repustateApi$ git status
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add ..." to include in what will be committed)
#
#	.gitignore
#	.hgignore
#	LICENSE.txt
#	README.md
#	build.properties
#	build.xml
#	package.xml
#	src/
nothing added to commit but untracked files present (use "git add" to track)

Each of the files serves a useful purpose:

  • .gitignore and .hgignore

    These are basic filters to tell Git and Mercurial what to ignore in here. As we develop the component, there will be a few temporary folders created which we’ll never want to end up in version control. Filter files for other version control systems are welcome.

  • LICENSE.txt

    Every component needs to come with a clear statement about the rules for re-use, even if it just says All rights reserved.

    If you’re looking to open-source your code, you’ll find many well-tested licenses from the Open Source Initiative’s website.

  • README.md

    Every component needs basic documentation to help a new user get started with it. By putting that into a README.md file, GitHub will automatically pick that up and display it on the component’s website.

  • build.properties

    This is a file we’re going to edit shortly. It contains a little bit of metadata about your component, such as it’s name and version number.

  • build.xml

    This is a file for phing, a sort-of Ant clone written in PHP. We’re going to use the commands defined in this file to run our unit tests, build our PEAR-compatible package, and even install it for local testing.

  • package.xml

    This is the manifest file that the PEAR installer looks for when trying to install your component.

    We’re going to edit this file shortly, adding in some data about our component, but most importantly, listing all of the components that our component depends on. Don’t let the terse PEAR documentation put you off; it’s quite an easy file to work with, and the error-prone bits get auto-generated for you by the build.xml file when the time comes.

  • src/ folder

    Here’s where your source code goes. There’s a subfolder under here for each of the different types of file that your component might contain. Don’t worry about that too much for now; we’ll explain what goes where as we build these components up.

Why A Skeleton Helps

Our skeleton takes care of all of the chores and housekeeping for you, leaving you free to focus on writing your code and your tests. It gives you a standard structure to work within, making it very easy to move from one component to the next, or share development duties with a wider (and possibly remote) team.

This structure has been carefully designed to work with the somewhat fickle PEAR installer’s preconceptions, and also to work well with both PHPUnit and Netbeans. It should work well with Eclipse/PDT too; let me know how you get on. And old skool vim users shouldn’t have any troubles either 🙂

As the skeleton improves (pull requests are very welcome!), you can use the very handy phix php-library:upgrade command to get the improvements dropped right into your components. Another bonus feature of adopting a standardised skeleton instead of rolling everything by hand.

In the next blog post, we’ll get build.properties and package.xml setup, and get the vendor/ folder created. What’s the vendor folder? I’ll tell you next time 🙂

Be the first to leave a comment »

In my Beyond Frameworks talk, I explained how a component-based architecture can help answer some of the important (i.e. expensive!) questions you might face when creating long-lived apps that rely on a PHP framework. In this series of blog posts, I’m going to look at how to go about creating and working with components.

In the last article, I set out some very simple requirements for my example app. Now that I have an idea of what I want to build, I need to break that down into a set of components to create. My apologies if you’re keen to jump straight to the code; it’s coming soon, and I want to make sure it’s the right code when we get there 🙂

Thought Process: Decomposition

Decomposition is all about figuring out how to chop up a large problem into a co-operating sets of smaller problems. If you get the boundaries right across the set, then you can solve each of the smaller problems on their own, glue it all together, and job done.

In a traditional PHP app, decomposition normally means getting into class design. When you’re working with components, you need to decide how you’re going to group those classes together into components.

Figuring out your list of components and their contents very much a creative process. I’m a visual person, so I do it by scribbling on whiteboards or moving things around in OmniGraffle. There are few right or wrong solutions – most ‘wrong’ solutions are really just philosophically differences between people – but there are a few important questions that I ask myself to avoid problems later down the road.

Questions When Decomposing To Components

When I’m deciding what should be grouped into components, I’m always asking myself these questions:

  • Does the component do one job, and do it well?

    This is the key question. Just like classes that have too many responsibilities become too complex to both use and maintain, the exact same trap lies in wait when you start designing components, just on a larger scale.

    Take a look at the components that phix is made up of, by reading its package.xml file. Each of those components does one job, and only one job. phix glues them all together, but you could take any one of those components, and use it to solve the same problem in a different app.

  • Is it clear which components relies on which?

    A good feel-good factor for deciding whether you’ve got the right components or not is how easy it is to compile a list of which components your app relies on, and which components in turn they rely on. If you feel any doubt or uncertainty, then listen to your heart, and carve up the code into a different set of components.

    Ultimately, you need to be able to fill out your component’s package.xml file with a definitive list. Take a look at ComponentManager’s package.xml file as an example. It clearly states every component that ComponentManager glues together.

  • Do two components mutually depend upon each other?

    One of the key benefits of creating components is being able to isolate change. If I change Component A, I might have to change Component B too, because I’ve broken backwards compatibility. That doesn’t cause any architectural problems. Now, if I change Component B, and then have to change Component A too, then that is a very important problem. You have to install components one at a time. Which one do you install first?

    This problem can also appear in more subtle ways, when you have three or more components that rely on each other, for example when Component A relies on Component B, Component B relies on Component C, and then Component C relies on Component A.

    This is an easy one to work out. Draw out the list of components as a graph, connecting up which components rely on which. If you end up with a tree, you’re fine. If your graph has cycles, you’ve got a problem that needs to be fixed. You can’t reliably install components with mutual dependencies.

  • Is there any shared code I can split out into its own component?

    One way to solve the interdependence problem between Components A and B is to move some of the functionality out into a Component C, and then have both Components A and B rely on that, instead of each other. This is also one way of achieving DRY: Don’t Repeat Yourself.

    An example of this can be seen in the ComponentManagerShared component. Both ComponentManagerPhpLibrary and ComponentManagerPhpDocbook rely on it, but they don’t need to rely on each other at all.

  • How is someone going to install and upgrade this component?

    What I’m asking here is: will the component be listed in the <dependencies> section of a component’s package.xml (or an app’s!), or will a user install the component for himself by running the pear command? Or maybe both?

    Sometimes, you’ll want to introduce a meta-component into your design to make it very easy to install and upgrade a larger collection of components.

    ComponentManager is a meta-package; all it does is pull in all the other components that support creating and managing specific types of components. Every time I find and fix a bug, or add a new feature, the bug fix or feature goes into one of the specific components (such as ComponentManagerPhpLibrary) … but you don’t need to keep track of them all. All you have to do is pear upgrade phix/ComponentManager, and it takes care of upgrading all of the other components for you.

  • Where will your component be installed?

    Components can be installed into two places on a computer: system-wide (e.g. /usr/share/php), or inside an app (e.g. vendor/ folder inside the app’s code tree). Does the component work in both places?

    This is a question I personally wish the Doctrine developers had asked themselves. At the time of writing, Doctrine assumes that it will only be installed system-wide, making it a real pain in the backside if I want to run two or more separate apps that use Doctrine on the same server. By forcing me to install Doctrine system-wide, they also force me to test (and potentially fix) all apps that use it in one go whenever I need to upgrade Doctrine.

    The components built using ComponentManager can easily support installing into both system-wide or inside an app to suit your choices, and I’ll show you exactly how that is done when we get there.

  • Can I get this functionality from an existing component from someone else?

    Or, put another way … am I re-inventing the wheel here? Or living my not-invented-here dreams?

    Today, PHP components are at an embryonic state compared to our cousin languages. There aren’t all that many out there, and support for the PSR0 autoloading standard is still to become widespread.

    That said, you should still take a look first before deciding to build your own component. Folks like the Symfony 2 community seriously get components, and are doing a great job in publishing high-quality code that might just suit your needs.

  • Am I ever going to re-use this code?

    I’ve saved the most important question for last. Is it worth the trouble? There isn’t much practically in creating a component for the sake of it. If you can’t see yourself (or anyone else) ever re-using the code, put the code inside another component (or your app) for now. You can always break it out into a component at a later date.

    This is exactly what I did with phix. It started off as a single PEAR-compatible package, and as it evolved, I extracted out more and more code into separate PEAR-compatible packages when I felt it was worth doing so.

These are all questions about how to organise the code, and it’s this organisation – this list of components and their dependency graph – that then affects the design of the code inside each component. It’s good to start out with a plan, but don’t get hung up on it. As you build the code, and learn to feel what it’s like to live with the list of components, you’ll probably want to make changes to make life easier. There’s nothing at all wrong with that.

So what could the component list for my example app look like?

Component Design For The Sentiment App

If I apply these questions to the app I’m building in these blog posts, I end up with:

Components by type …

  • The orange components are going to contain useful code. Either I’m going to have to create them (sental, sentalXmlStore, repustateApi, lymbixApi, rssFeeder) or they already exist (phix, phix’s dependencies).
  • The green components are meta-packages, to make it easy to manage creating and updating the code (sentalApis, sentalFeeders).
  • The blue component contains shared code (sentalApiShared). It’s going to contain the interfaces that all the APIs will have to implement, and will also hold any base classes or utilities that the APIs are likely to find useful.

I’ve decomposed my requirements from this morning into a list of named components to tackle. Whether it’s the right list or not is too early to tell – there’s always more than one design that will work – but this is what I’m going with for now.

For the rest of this week’s blog posts, I’ll focus on creating the repustateApi component. Code at last 🙂 And that should just about (fingers crossed!) give me enough time at the weekend to fill in some of the other components before we pick things up again on Monday 🙂

Be the first to leave a comment »

In my Beyond Frameworks talk, I explained how a component-based architecture can help answer some of the important (i.e. expensive!) questions you might face when creating long-lived apps that rely on a PHP framework. In this series of blog posts, I’m going to look at how to go about creating and working with components.

Components don’t exist entirely in isolation. They’re built to solve a specific problem, and ultimately to get used inside some sort of application (or two). To help explain one way to approach designing and building components, blog post by blog post I’m going to create a real app and its components, showing you the workings involved and highlighting any decisions that were made along the way. All the code will appear on GitHub, so that you can tinker with it yourself too.

If you want to jump right in and figure it all out in your own way by taking existing code apart, phix and the now-separated-out ComponentManager are both built as components in their own right. Clone the git repos, look at the phing targets in the build.xml files, and have a play. Poke me on Twitter if you have questions. If/when you run into issues, please file them on GitHub, and I’ll get them fixed for you.

Hopefully you’re staying with me for the worked example app though 🙂 I’m going to start by defining what the final app needs to do.

App Requirements

I need an app that will consume a bunch of blog posts off the ‘net, and summarise for me how many are positive articles, and how many are negative articles. I’m going to use the data in future strategic planning sessions at work, to add another perspective where it will be useful. The app can be easily changed in future to analyse other data … I’m thinking of feeding it Flickr feeds too at some point, to give my photography some much-needed feedback.

Here’s how I’d summarise the technical requirements for this app:

  • The command-line portion of the app will do the analysis, running off a cron job.
  • The command-line portion will analyse a list of RSS feeds loaded from a .ini file.
  • The command-line portion will use a web-based API to score each blog article. The value of zero will be neutral; positive numbers mean the article is positive in tone, and negative numbers mean that the article is negative in tone.
  • The command-line portion will cache the results as XML files on disk (just to keep the example simple).
  • The website part of the app will present the analysis, consuming the summary data created by the cron job.
  • The app will be built as components, to make it easy to share code between both parts of the app.

That should give us enough variety to be able to get into some non-trivial examples.

With a very basic set of requirements documented, before I go near any code, my next step is to figure out a rough structure. I need to decompose my requirements into an architecture. That will be in the next blog post, which will along this evening.

Be the first to leave a comment »

I post photos to Flickr from time to time, and then write blog articles about the photos. The blog articles get written days, weeks, sometimes months in advance of when they’re scheduled to appear on my blog … which makes it a tad difficult to add a link from a photo to all of the blog articles that mention it.

So a couple of weekends ago I knocked up a very crude script that uses the Flickr API (via phpFlickr) to work through all of the published blog posts and make sure each of my Flickr photos has links back to each blog post that mention it. I’m posting it here in the public domain. Hopefully someone will find it a useful starting point to do something similar for their own blog.

[code lang=”php”]

<?php

require_once('phpflickr-3.0/phpFlickr.php');

$flickrApiKey = '’;
$flickrSecret = ”;
$flickrToken = ”;

$f = new phpFlickr($flickrApiKey, $flickrSecret);
$f->setToken($flickrToken);
$f->enableCache(‘fs’, ‘/tmp’, 3600);

// first step – find the first published blog post
$url = ‘http://blog.stuartherbert.com/photography/’;
$rawHtml = file_get_contents($url);
preg_match(‘/

/’, $rawHtml, $matches);

$blogPosts = array();
$flickrPhotos = array();

$latestPost = $matches[1];
$nextPost = $url . ‘?p=’ . $latestPost;

function updatePhotos($photoIndex, $flickrPhotos, $blogPosts, $f)
{
foreach ($photoIndex as $photoId => $flickrPhoto)
{
// we must rewrite the description
preg_match(‘|(.*)Copyright |s’, $flickrPhoto[‘description’], $matches);
if (isset($matches[1]))
{
$description = $matches[1];
}
else
{
$description = ”;
}
$description .= ‘Copyright (c) Stuart Herbert. Blog | Twitter | Facebook‘ . “n”
. ‘Photography: Merthyr Road | Daily Desktop Wallpaper | 25×9 | Twitter.’ . “nn”;

if (count($flickrPhoto[‘blogPosts’]) == 1)
{
$description .= ‘Want to know more about this photo? See this blog entry:’ . “nn”;
}
else
{
$description .= “Want to know more about this photo? See these blog entries:nn”;
}

foreach ($flickrPhoto[‘blogPosts’] as $postUrl => $blogPost)
{
$description .= ‘* ‘ . $blogPost[‘title’] . “n”;
}

// description is made … now to upload it
echo “Photo: ” . $photoId . ‘ :: ‘ . $flickrPhoto[‘title’] . “n”;
echo “URL : ” . $flickrPhoto[‘url’] . “n”;
echo “Old : ” . $flickrPhoto[‘description’] . “n”;
echo “New : ” . $description . “n”;

echo “nPushing changes to Flickr …”;
$f->photos_setMeta($photoId, $flickrPhoto[‘title’], $description);
echo ” donen”;
}
}

while ($nextPost !== null)
{
$photoIndex = array();

echo “Downloading $nextPost …”;
$rawHtml = file_get_contents($nextPost);
echo ” donen”;
if (!$rawHtml)
{
die(“Unable to download HTML for URL: ” . $nextPost . “n”);
}

preg_match(‘|

.*(.*)|Us’, $rawHtml, $matches);
$postUrl = $matches[2];
$title = $matches[3];
echo “Blog post title is: $titlen”;
echo “Blog post url is: $postUrln”;

preg_match(‘||’, $rawHtml, $matches);
if (isset($matches[1]))
{
$nextPost = $matches[1];
}
else
{
$nextPost = null;
}

preg_match(‘|

(.*)

|Us’, $rawHtml, $matches);
if (!isset($matches[1]))
die(“regex failed againn”);
$entryHtml = $matches[1];

preg_match_all(‘|(http://www.flickr.com/photos/stuartherbert/[0-9]+/)”|’, $entryHtml, $matches);
$blogPosts[$postUrl][‘url’] = $postUrl;
$blogPosts[$postUrl][‘title’] = $title;
$blogPosts[$postUrl][‘matches’] = $matches;

foreach ($matches[1] as $flickrPhoto)
{
$parts = explode(‘/’, $flickrPhoto);
$photoId = $parts[count($parts)-2];
$photoInfo = $f->photos_getInfo($photoId);

$flickrPhotos[$photoId][‘url’] = $flickrPhoto;
$flickrPhotos[$photoId][‘title’] = $photoInfo[‘title’];
$flickrPhotos[$photoId][‘description’] = $photoInfo[‘description’];
$flickrPhotos[$photoId][‘blogPosts’][$postUrl] = $blogPosts[$postUrl];

// note the photos we need to update because we have
// seen this post
$photoIndex[$photoId] = $flickrPhotos[$photoId];

echo “- Photo: ” . $photoInfo[“title”] . “n”;
}

updatePhotos($photoIndex, $flickrPhotos, $blogPosts, $f);
}

echo “nn”;
echo “Photo scraping complete!!nn”;

// when we get to here, we have photos to go and update on flickr
?>
[/code]

1 comment »

On Planet PHP, I see a lot of postings about PHP the language, and applications written in PHP, but not a lot about the runtime environment that PHP needs and can take advantage of. Is anyone interested in a series of posts about server architectures – not just LAMP vs WIMP vs WISP (Linux/Apache/MySQL/PHP, Windows/IIS/MySQL/PHP, and Windows/IIS/SQL Server/PHP) – but also about shared hosting vs dedicated hosting vs server farms vs active-active, active-passive clusters?

If you are interested, leave a comment on my blog to let me know. If there’s enough interest, I’ll write up a few articles about this and publish them on my blog. If there’s anything in particular you’d like to see covered, and explained in more detail, let me know!

(One of the things I do is design hosting solutions for our customers. Not the mega-solutions that folks like Rasmus gets to work on, but the smaller – and way more common – solutions needed for small to medium-size websites here in the UK).

22 comments »

I’ve recently switched my blog from b2evolution back to WordPress. The good news is both “no more spam :)” and “the admin panel works in Safari”, but on the downside I missed the multiblog feature that attracted me to b2evolution in the first place. There is WordPress MU, I suppose, but after coming across a few plugins that warned they didn’t work with WordPress MU, that option didn’t look very appealing.

Ah ha – thinks I – I can fake the multiblog by putting several different blogs on the site, and generating a homepage from the RSS feeds of the individual blogs. Should be simple enough, and it sounds like the perfect nail to hit with the SimpleXML hammer of PHP 5 🙂 Funnily enough, in work last week we were wondering whether you could use SimpleXML with XML namespaces (alas, we still use PHP 4 at work atm), so armed with the perfect excuse, I set to work.

Getting an RSS 2 feed into SimpleXML is trivial:

[code lang=”php”]
$feedUrl = ‘http://blog.stuartherbert.com/php/?feed=rss2’;
$rawFeed = file_get_contents($feedUrl);
$xml = new SimpleXmlElement($rawFeed);
[/code]

Extracting the information from the RSS ‘channel’ is equally trivial:

[code lang=”php”]
$channel[‘title’] = $xml->channel->title;
$channel[‘link’] = $xml->channel->link;
[/code]

… and so on. Getting at the individual articles starts off just as easy:

[code lang=”php”]
foreach ($xml->channel->item as $item)
{
$article = array();
$article[‘title’] = $item->title;
$article[‘link’] = $item->link;
}
[/code]

… but, if you’re relying on the very thin SimpleXML documentation on php.net, like me you’ll soon run into two problems.

Some of the elements in an item sit inside different XML namespaces. The only way to get at them is to use the children() method on a SimpleXMLElement:

[code lang=”php”]
$dc = $item->children(‘http://purl.org/dc/elements/1.1/’);
$article[‘creator’] = $dc->creator;
foreach ($dc->subject as $subject)
$article[‘subject’][] = $dc->subject;
[/code]

That’s a bit of a mouthful. It’s a bit of a shame that I can’t do this:

[code lang=”php”]
// The following does NOT work!
$article[‘creator’] = $article->dc->creator;
[/code]

… or some variation on that, but the design of XML namespaces makes that impractical. (The XML namespace is actually the URI; the ‘dc’ prefix in a tag like <dc:creator> is shorthand defined in the opening tag at the top of the XML document. Although it would look a bit odd, there’s nothing at all to stop someone defining the ‘dc’ component as ‘dublinCore’ instead if they wanted to).

Having to pass the full URI for a namespace into children() is not my idea of fun! It’d be much better if we could pass in a shorter string instead. The only way to safely do this is to define an array of shortcuts yourself:

[code lang=”php”]
// define the namespaces that we are interested in
$ns = array
(
‘content’ => ‘http://purl.org/rss/1.0/modules/content/’,
‘wfw’ => ‘http://wellformedweb.org/CommentAPI/’,
‘dc’ => ‘http://purl.org/dc/elements/1.1/’
);

// now we can get dublin core content with a lot less typing!
// we also only have to update the code in one place if the namespace URI changes
$dc = $item->children($ns[‘dc’]);
$article[‘creator’] = $dc->creator;
[/code]

You can get a list of the namespaces like this:

[code lang=”php”]
$ns = $xml->getNamespaces(true);
$dc = $item->children($ns[‘dc’]);
[/code]

… but that only works if the XML document defines the prefix ‘dc’ for the namespace ‘http://purl.org/dc/elements/1.1/’. You’ll have to decide for yourself whether it’s a risk worth taking or not.

That’s namespaces tamed, but we’re not quite home yet. The actual ‘content’ part of the article sits inside a CDATA section inside a ‘content’ namespace, and how to deal with CDATA is conspicuous by its absence in the SimpleXML docs (probably because older versions of SimpleXML simply threw CDATA sections away without asking you).

If you have a look at the source code for SimpleXML, test 004 shows how basic CDATA access works.

[code lang=”php”]
$content = $item->children($ns[‘content’]);
$article[‘content’] = (string) trim($content->encoded);
[/code]

With that, the final code to read a RSS 2 feed looks like this:

[code lang=”php”]
// define the namespaces that we are interested in
$ns = array
(
‘content’ => ‘http://purl.org/rss/1.0/modules/content/’,
‘wfw’ => ‘http://wellformedweb.org/CommentAPI/’,
‘dc’ => ‘http://purl.org/dc/elements/1.1/’
);

// obtain the articles in the feeds, and construct an array of articles

$articles = array();

// step 1: get the feed
$blog_url = ‘http://blog.stuartherbert.com/php/?feed=rss2’;

$rawFeed = file_get_contents($blog_url);
$xml = new SimpleXmlElement($rawFeed);

// step 2: extract the channel metadata

$channel = array();
$channel[‘title’] = $xml->channel->title;
$channel[‘link’] = $xml->channel->link;
$channel[‘description’] = $xml->channel->description;
$channel[‘pubDate’] = $xml->pubDate;
$channel[‘timestamp’] = strtotime($xml->pubDate);
$channel[‘generator’] = $xml->generator;
$channel[‘language’] = $xml->language;

// step 3: extract the articles

foreach ($xml->channel->item as $item)
{
$article = array();
$article[‘channel’] = $blog;
$article[‘title’] = $item->title;
$article[‘link’] = $item->link;
$article[‘comments’] = $item->comments;
$article[‘pubDate’] = $item->pubDate;
$article[‘timestamp’] = strtotime($item->pubDate);
$article[‘description’] = (string) trim($item->description);
$article[‘isPermaLink’] = $item->guid[‘isPermaLink’];

// get data held in namespaces
$content = $item->children($ns[‘content’]);
$dc = $item->children($ns[‘dc’]);
$wfw = $item->children($ns[‘wfw’]);

$article[‘creator’] = (string) $dc->creator;
foreach ($dc->subject as $subject)
$article[‘subject’][] = (string)$subject;

$article[‘content’] = (string)trim($content->encoded);
$article[‘commentRss’] = $wfw->commentRss;

// add this article to the list
$articles[$article[‘timestamp’]] = $article;
}

// at this point, $channel contains all the metadata about the RSS feed,
// and $articles contains an array of articles for us to repurpose
[/code]

Don’t forget to add error handling 🙂

I hope this example helps anyone else who needs to work with RSS 2 feeds, or who needs to know how to work with namespaces and CDATA with SimpleXML.

71 comments »
Page 3 of 3123