I’ve given several talks to user groups over the last few months, around the topics of testing and performance. I hope that you find them interesting and useful.

When To Mock

Mocking has not only become a mainstream practice in recent years, it has also become the default practice for many developers. In this talk, I present three questions that you should ask yourself first whenever you’re deciding if a mock is the right way to test your code, and some insight into how over-mocking ends up making your tests no better than having no tests at all.

This talk was presented to PHP Dorset in November, 2014.

How Much Does Your ORM Cost?

Whenever you use an ORM, you’re trying to reduce the time it takes to ship your code. But what are you trading in return? This lightning talk introduces the idea that ORMs aren’t entirely “free”, and hopes to start you thinking about how you might go about measuring just how much that ORM is really costing you.

This talk was presented to PHP South West (where I was once the answer to a pub quiz question – my claim to fame!) in April 2015.

The Mad Science of Testing

I’m a firm believer that good testing is also good science, and in this talk, I hope to convince you too. And if we’re going to be scientists, we might as well be mad scientists, because they have all the fun 🙂

This talk was presented to PHP Berkshire in June 2015.

Be the first to leave a comment »

I haven’t really talked much about my upcoming tutorial session at #phpnw12 next month before now, but I hope there’s still time to convince you to come along and learn how to use Git as your team grows in size.

That’s what I’m teaching: a strategy, plus supporting tools for Git called HubFlow, that will help you stay sane – and more importantly help you keep delivering – as your team starts to collaborate on your product.

It isn’t my strategy: the credit must go to Vincent Driessen, who first blogged about GitFlow at the start of 2010. And they aren’t my tools: again, they originally come from Vincent. All I’ve done is adapt them for working against GitHub – hence the name “HubFlow”; but if BitBucket is more to your taste (or wallet) then rest assured that both tools and strategy work can be adapted for there too.

Maybe you don’t need this strategy. If you’re working on one-off consulting gigs for clients, where you can get in quick and get out quick, then HubFlow might not be necessary. But if you’re working on any sort of product or service, either commercially or open-source, then I can strongly recommend HubFlow to you – even if you’re just at the one-man startup stage. And the benefits of adopting HubFlow only increase as your team grows in size.

I’ve already put a lot of effort into documenting HubFlow, and if you’ve previously read the docs, you might be feeling confident enough to adopt it by yourself. If you do, I think that’s awesome, and you should go for it without delay. Please do email me if I can help in any way. I’m passionate about everyone adopting the fundamentals of software engineering, and few things are more fundamental than good source control.

But if you’re still reading at this point, I hope that I can convince you to buy one of the remaining tickets for my tutorial session at #phpnw12. I don’t personally profit by it – all of us teaching at #phpnw12 are volunteers – but maybe you will.

What I’m teaching is the approach that I’ve introduced to DataSift. You might not have heard of us yet; we provide a platform for filtering social data in real time, handling terabytes of data a day at full firehose-scale, and many thousands of incoming data every second. And every piece of that data goes through code written in PHP.

Although we’re a young startup, we’ve already grown beyond 20 developers, and with that many people collaborating to build that platform, with every team working at their own pace, we needed to adopt a common way of working together with Git and GitHub so that the company continued to scale well.

  • It had to be a way that allowed every developer to take full advantage of Git, especially when it comes to committing their work early and often.
  • It had to allow developers to form ad-hoc teams that worked at their own pace.
  • Remote working is a fact of life these days, and it had to work just as well whether everyone is in the office or working from somewhere else.
  • It also had to ensure that only work that had been finished made its way into any of our releases.
  • We wanted to make sure that there was an opportunity to review every change before it went into a release. Code reviews play an important part in delivering high-quality work time after time after time.
  • We didn’t want pending releases to hold up new development, ever.
  • And if something did screw up in production, we needed a way to go back to our last known good version, fix it, and release *that* – all without disrupting any pending releases or existing ad-hoc teams.
  • Finally, it had to be easy to teach to people who are new to Git and GitHub, preferably by wrapping complicated Git operations up inside a single command each time.

Those are the benefits that HubFlow gives us. And at #phpnw12, I’ll be teaching everyone who attends my tutorial session how to get those benefits too.

What makes me qualified to teach this topic? And what makes me qualified to be teaching at all?

I’ve got 18 years of experience setting up and/or running software configuration management, taking in systems as diverse as RCS, CVS, Perforce, Continuus, Clearcase, Subversion, Mercurial, and Git. I’ve done this with, and for, organisations as small as a one-man team all the way through to large international corporates. I’ve even built a version control system for one company in the past. (Git is much better! 🙂 And I was around long enough to see (and learn from) the failures as well as the successes. (I’m a big believer that success teaches you a bit, but failure teaches you more). Plus, I’m the author of the HubFlow strategy, and the maintainer of the HubFlow extension for Git.

It simply isn’t possible for me to distill all of that rich and lengthy experience down into the documentation that I’ve written for HubFlow. I think the documentation is good – I wouldn’t have put my name to it otherwise – but I think you can learn even more from me in person.

I’m a qualified teacher of adults. I’m trained how to teach, and I’ve had a lot of practice doing so. My first PHP conference appearance was back in 2004 on Marco’s php|cruise, and since then I’ve spoken at the PHP NorthWest and PHP UK conferences several times. I co-wrote the Zend Certification Study Guide for PHP4. Once a year, I teach at the University of Aberystwyth, helping their Comp Sci students prepare for applying for jobs for their year in industry. Plus I’ve done a substantial amount of teaching and mentoring as part of my job and open-source work over many years. And away from computers, I’ve been teaching martial arts for over 12 years.

If you need the benefits that HubFlow brings, then I’d love to teach you in person. You can buy a ticket for my tutorial day on the #phpnw12 website. I hope to see you there.

Be the first to leave a comment »

At work, we’ve recently published HubFlow: instructions + tools for using Vincent Dreissen’s GitFlow model with GitHub.

The full article is over on the DataSift dev blog.


So that I don’t forget how to do this next time around. Worked for me, your mileage may vary.

First step is to get a working install of PHP.

  1. Download PHP 5.4.latest ZIP file from the PHP Windows website
  2. Unpack the ZIP file into c:php. You should end up with c:phpphp.exe
  3. Copy c:phpphp.ini-development to be c:phpphp.ini
  4. Edit c:phpphp.ini to suit (e.g. set date.timezone)
  5. Make sure you add c:php to your system PATH (via Computer’s Advanced Properties -> Environment Variables)
  6. Reboot (this is Windows, after all 🙂

At this point, you should be able to open up a Command Prompt, and type ‘php -v’, and see the response ‘PHP v5.4.latest …’ appear as expected.

Now for PEAR itself.

  1. Open http://pear.php.net/go-pear.phar in a browser, save this file into c:php
  2. In a Command Prompt, cd to c:php and then run “php c:phpgo-pear.phar”
  3. At the prompt, select ‘system’. A text menu of paths will appear
  4. Fix the default path for pear.ini (option 11) to be c:phppear.ini
  5. Fix the default folder to look inside for php.exe to be c:php
  6. Make sure the binaries folder (option 4) is c:php
  7. Check all of the other options, make sure they are prefixed with c:php
  8. Press ENTER, and you should see PEAR downloading various PEAR packages onto your system
  9. Double-click the PEAR_ENV.reg file in c:php
  10. Reboot again to make sure PEAR_ENV registry entries have taken effect

At this point, PEAR is installed and should be available to use in your own projects, or with something like Phix.

Personal Notes

Some reminders to myself for the next time I have to do this.

  • Documentation for PHP for Windows and PEAR for Windows both seem to be out of step with current downloads. There’s currently no Windows installer for PHP available, and the PHP .ZIP file doesn’t contain the ‘go-pear.bat’ file.
  • You have to pay close attention to the default folders offered when running ‘go-pear.phar’. They appear to use the current working directory as the prefix even when installing system-wide, except for the location of pear.ini and php.exe – neither of these defaults are sane, and must be manually changed during the install 🙁
  • After install, pear command doesn’t seem to be 100% compatible with its behaviour on Linux and OS X. -D switch didn’t work, there may be other problems too that I haven’t yet found.
  • Both reboots are required – I’m not taking the piss there – for all running Windows apps to pick up the changes.

One of the questions I’ve been asked after yesterday’s blog post about Phix’s ContractLib is why not just use PHP’s built-in assert() function? I think it’s a great question, and the best way to answer it is to take a look at the key differences between two solutions.

Side By Side Comparison

Feature assert() ContractLib
Implementation PHP extension written in C (ships as standard part of PHP) PHP library written in PHP
Enable / disable execution Partial (there is an overhead when disabled, but it’s low) Partial (there is an overhead when disabled, but it’s higher)
Issues PHP4-style warning when tests fail Yes (configurable) No (throws a ContractFailedException instead)
Terminate PHP script when tests fail Yes (configurable) Only if the ContractFailedException is never caught
Quiet eval of test expression Yes (configurable) No (not required; test expressions are pure PHP code, not eval() strings)
Callback on failed test Yes (configurable) No (unwinds the stack instead by throwing ContractFailedException)
Throws Exception when tests fail No (but can emulate if you write your own assert() callback method) Yes (standard behaviour)
Tests are pure PHP code No – recommended way is to pass strings into assert() to be eval()’d Yes
Error report includes original value that failed the test No Yes
Support for per-test custom failure messages No Yes – are required to provide one
Support for Old Value storage and recall No (but can emulate by hand) Yes

The Differences Explained

The key difference is one of philosophy. assert() sits well with the PHP4 style of error reporting and handling, whereas ContractLib is firmly in favour of the OO world’s approach to reporting errors.

It’s a personal preference, but I think that PHP4-style errors have no place in code that has any desire to be robust. Exceptions aren’t perfect, don’t get me wrong, but their core property of unwinding the call stack in an orderly fashion makes writing robust code much easier. And they also carry a payload – information about what went wrong and why – which PHP’s assert() cannot provide to the same extent.

It’s much quicker to debug something when there’s a record of the value that failed the test. For that reason alone, I’d always prefer something like ContractLib over the built-in assert() approach.

But we can’t ignore the fact that these are tests that get shipped to, and executed in, the production environment. Unlike unit tests, adopting programming by contract will slow down your PHP code in production. The question is: by how much?

What About The Performance?

I’ve done some benchmarking between the two, using the five tests listed in the final example in yesterday’s blog post. It’s a real-world example of the kind of tests that I would look to add to code to improve robustness.

Here are the results I gathered, calling the tests 10,000 times in a tight loop. The tests were run from the command line, and the times do include PHP start-up / shutdown time and the time taken to parse each test file. I assumed a best-case scenario, where the tests would always pass.

Test Approach Time w/ Tests Disabled Time w/ Tests Enabled
Tests written using assert() 1.103s (100%) 5.989s (543%)
Tests written using ContractLib 3.055s (277%) 3.096s (281%)

When tests are disabled, using assert() is much cheaper than using ContractLib today. That’s to be expected, as assert() is written in C. I imagine that we could get close to the same performance if ContractLib was rewritten in C as a PHP extension.

But, when tests are enabled, assert() is much slower than ContractLib. Why? Because the recommended way to use assert() is to pass the test in as a string. PHP has to convert that string into bytecode to execute, and that conversion appears to be quite expensive.

Given the choice, I’d rather trade things running a little slower in production for having much faster tests when I’m writing code, and that’s why I created ContractLib. Plus I get much better information to understand why the test failed, and if I wanted to run the tests in production, I can handle their failures in a much saner way.

Final Words

In my experience, the time it takes to develop and ship code is normally more critical than how fast the code runs in production. Developer time has become a scarcer resource than CPU time.

Used intelligently, these kinds of tests in your code can help your team deliver quicker, because the code they are using and reusing is more robust first time around. Programming by contract is different to, and complements, unit testing because contract tests catch errors in using the code.

Whether you use ContractLib, assert(), or you create your own solution, you should really consider how much it is costing you when you don’t use these kinds of tests.

Be the first to leave a comment »

In my last blog post, I introduced ContractLib, a simple programming by contract library that I’ve created for PHP 5.3 onwards. And I promised some examples 🙂

Installing ContractLib

ContractLib is available from the Phix project’s PEAR channel. Installing it is as easy as:

[code lang=”bash”]
$ pear channel-discover pear.phix-project.org
$ pear install -a phix/ContractLib

At the time of writing, this will install ContractLib-2.1.0. We use semantic versioning, so these examples will continue to work with all future releases of ContractLib-2.x.

Adding ContractLib To Your Project

Assuming you’re using a PSR-0 compatible autoloader, just import the Contract class into your PHP file:

[sourcecode language=”php”]
use Phix_ProjectContractLibContract;

Adding A Pre-condition Contract To Your Method Or Function

Take a trivial method like this:

[code lang=”php”]
class ActionToApply
public function appendNow($params)
$params[] = time();

This method works fine … until someone passes a non-array as the parameter. At that point, your code stops working – not because your code is wrong, but because someone used it in the wrong way. This is a classic cause of buggy PHP apps. Thankfully, it’s very easy to address using ContractLib.

If we were certain that the $params parameter was always an array, then we can keep the method itself extremely simple and clean. We can ensure that by adding a pre-condition using ContractLib.

[code lang=”php”]
use Phix_ProjectContractLibContract;

class ActionToApply
public function appendNow($params)
Contract::Preconditions(function() use ($params)
‘$params must be an array’

// original method code continues here
$params[] = time();

Now, if someone passes in a non-array, the caller will automatically get an E5xx_ContractFailedException, which makes it clear that the fault is in the caller’s code … not your’s.

PHP 5.4’s upcoming support for better type-hinting is another way to catch this kind of error, but not only does ContractLib work today with PHP 5.3 (which means you don’t have to wait to migrate to PHP 5.4), but also that you can write tests for anything, not just the checking that’s built into PHP.

This means you can make your code much more robust, by tightening up on the quality of the parameter passed into your code by other programmers. To extend our example, we might decide that an empty array is also unacceptable:

[code lang=”php”]
use Phix_ProjectContractLibContract;

class ActionToApply
public function appendNow($params)
Contract::Preconditions(function() use ($params)
‘$params must be an array’
count($params) > 0,
‘$params cannot be an empty array’

// original method code continues here
$params[] = time();

The point here is that we can go way beyond type-hinting checks (important as they are) and look inside parameters to make sure they are suitable.

Here’s a real example from Phix’s CommandLineLib:

[code lang=”php”]
use Phix_ProjectContractLibContract;

class CommandLineParser
// …

public function parseSwitches($args, $argIndex, DefinedSwitches $expectedOptions)
// catch programming errors
Contract::Preconditions(function() use ($args, $argIndex, $expectedOptions)
‘$args must be array’
count($args) > 0,
‘$args cannot be an empty array’

‘$argIndex must be an integer’
count($args) >= $argIndex,
‘$argIndex cannot be more than +1 beyond the end of $args’

count($expectedOptions->getSwitches()) > 0,
‘$expectedOptions must have some switches defined’

// method’s code follows on here …

In this real-life code, we start off by checking for basic errors first (by making sure we’re looking at the right type for each parameter), and then we follow up with more specific tests, that ensure that we have data that we’re happy to work with. We’ve done these tests at the start of the method, so that it isn’t cluttered with error checking, which makes our code much cleaner that it might otherwise be. And, because all the tests are in one really easy to spot block, anyone reading your code can immediately see what they have to do to meet the contract you’ve created.

Because these tests are just plain-old PHP code, and don’t rely on annotations or any other such nonsense, the contracts you create and enforce are limited only by your choices.

But Aren’t All Those Tests Slow?

They are. PHP’s getting better and better at this, but function/method calls have always been painfully slow in PHP. I’m afraid that if you want robust code, you can’t have it for free. (You can in C, but that’s a topic to discuss over a decent whiskey at a conference).

I’ve done key two things with ContractLib to keep the runtime cost down:

  1. Contract::Preconditions() accepts a lambda function as its parameter. Your contract’s tests go inside this lambda function, and Contract::Preconditions() only calls the lambda function if contracts are enabled.
  2. By default, ContractLib does not enable contracts. You have to choose to do so by calling Contract::EnforceWrappedContracts().

This keeps the overhead down to just one method call (to Contract::Preconditions()) when contracts are not enabled. It isn’t as good as having no overhead, but it’s cheaper than the developer time lost trying to track down bugs in code that always assumes the caller can be trusted to do the right thing every time.

Any Questions?

I hope these examples have given you an idea on how to get started with ContractLib. If you have any questions or suggestions, please let me know, and I’ll do my best to answer them.

Be the first to leave a comment »

Introducing ContractLib

Posted by Stuart Herbert on January 11th, 2012 in 2 - Intermediate.

ContractLib is a simple-to-use PHP component for easily enforcing programming contracts throughout your PHP components. These programming contracts can go a long way to helping you, and the users of your components, develop more robust code.

ContractLib is loosely inspired by Microsoft Research’s work on the Code Contracts Library for .NET.

What Are Programming Contracts?

Programming contracts are tests around functions and methods, and they are normally used:

  1. to catch any ‘bad’ data that has been passed into the function or method from the caller, and
  2. to catch any ‘bad’ data generated by the function or method before it can be returned to the caller

These are pre-condition and post-condition tests, and they are tests that either pass or fail.

Why Have Programming Contracts?

Two reasons: code robustness and time saved.

Programming contracts catch errors early, and (unlike unit tests) they don’t just catch your errors, they catch errors made by programmers who reuse your code.

  • Catching errors early

    There is a class of bugs best described as garbage in, garbage out. The “garbage in” is data that is of the wrong type, or out of range, or missing (think empty arrays, empty strings, nulls). Often, the garbage being fed in is also garbage that has come out of a buggy function or method.

    Simple pre-condition checks at the start of your functions and methods quickly catches garbage data before it can propagate through your code. The more functions and methods contain pre-condition checks, the easier it becomes to catch garbage data closer to where it is being created. This allows you to spend less time tracking down the original source of a bug, and more time writing new code.

    These pre-conditions also greatly increase the chances of bugs in your code being caught in development, especially when combined with a healthy amount of unit testing.

    You can also add post-conditions at the end of your functions and methods, to make sure that you’re never returning any garbage back out of your function or method. There’s a lot of overlap between post-conditions and unit tests; the main difference is that your post-conditions will run 100% of the time, whereas your unit tests will only run when you run them and against the (often extremely limited) data you use in your unit tests.

  • Catching errors when code is reused

    Unit tests are great, and a very important part of creating high-quality code. But they’re your tests. They’re written to prove that your code does what you think it does.

    Unit tests don’t prove that someone else is reusing your code the way you meant them to.

    And neither do integration tests, because if someone is reusing your code, integration tests are their tests. Integration tests are tests to prove that they have glued their code on top of your code in a way that they are happy with.

    Simple pre-condition checks at the start of your functions and methods are your best opportunity to test how someone else is reusing your code, and to tell them if they’re getting it wrong.

Programming contracts are about building trust (just like unit tests). Code that you can trust is normally code that is quicker to work with. They’re really quick to write (normally far quicker than unit tests), and they can make it really quick to track down the origin of bugs in your code.

Don’t Programming Contracts Make Code Stupidly Strict?

An appropriate amount of strictness is a requirement of all high-quality code. The trick is knowing what to be strict about. Not strict enough, and you let in shitty data that causes your code to fail or be insecure. Too strict, and people will think that your code is too much trouble to work with.

As a general rule, pre-conditions should check for:

  • data that’s in an incorrect format
  • data that’s out of range
  • data that’s missing

Post-conditions should check the same things. They can also be used to check for data that should have been changed, but hasn’t been changed.

Aren’t Programming Contracts Too Old-Fashioned For PHP?

The concept has been around for decades. As a C programmer, I first learned about programming contracts in the early 90’s, when I was writing code that had to run for months at a time with zero downtime. We were debugging and improving code dating back from the 1980’s, and introducing programming contracts played an important role in getting to the bottom of many of the bugs that users reported.

PHP code (and other modern languages like Java, Ruby, Scala etc) is fundamentally similar to older languages like C, although you may not realise that this is the case. It’s the same fundamental paradigm – data is passed into blocks of software, and blocks of software may also pass data out too.

The advantage we have with PHP is that our programming contracts don’t have to be as lengthy as they would for a C program, because PHP itself can enforce type checks through type hinting, and we don’t have to worry about low-level details like proper handling of null-terminated strings.


You can take a look at ContractLib’s unit tests on GitHub.

I’ll post some detailed examples in my next blog post.

Be the first to leave a comment »

If you’re building a web-based app, it’s always a good idea to build some instrumentation into your app. That way, you can see how your app is behaving, and how your users are interacting with your app over time.

I’m sure everyone who reads my blog is familiar with Google Analytics for tracking page hits. But what about what’s happening inside your app? Right now? Do you know?

Graphite is one way to graph the stats that you add to your app. Combine it with (say) statsd from Etsy, and adding any stats you want is easy. (Read this blog post from Etsy if you want to learn more about measuring your app, and how to add support for Statsd to your app).

Normally, you’ll probably be interested in looking at graphs that show your stats over a period of hours or days (for trend analysis), and both Graphite and Statsd are sensibly tuned for that. But what if you want to see what’s happening now, in real time? I couldn’t find any clear instructions on how to do that elsewhere, so here’s my take on how to do it. I’m assuming you’re already familiar with installing both Statsd and Graphite, and that you’ve already had both up and running successfully with their default configurations.

Making Statsd Forward Data In Real Time

By default, Etsy’s Statsd collects the data sent from your apps, and forwards it on to Graphite’s data collector (known as Carbon) every 10 seconds. For real time, we need the data forwarded on every second. To do that, edit your Statsd config, adding the ‘flushInterval’ value:

  graphitePort: 2003
, graphiteHost: "localhost"
, port: 8123
, flushInterval: 1000

A value of 1000 tells Statsd to forward data on every second.

Making Graphite Store Data At One Second Resolution

Graphite’s default / sample configuration tells it to store incoming data at 60-second resolution; that allows us to look at the total stats recorded minute by minute, but we can’t drill down to see what happens second by second. To do that, we have to tell Graphite to store the data on a second-by-second basis.

Edit /opt/graphite/conf/storage-schemas.conf, and add the following clause:

priority = 200
pattern = ^stats.*
retentions = 1:34560000

This tells Graphite that we want all the data received from Statsd to be kept on a second-by-second basis for 400 days … plenty long enough for any sort of comparison you might need to do.

I found that, to get Graphite to start using this new storage definition, I had to go and delete the existing data files by doing:

$ rm -rf /opt/graphite/storage/whisper/stats*

Getting Graphite To Draw Real-Time Graphs

Now all we need to do is to get Graphite showing you all the collected data in real-time. By default, Graphite will happily plot the data onto a graph, but will only generate an updated graph every 60 seconds. That’s perfect for an ops team looking for trends over hours, but it isn’t real-time.

If you’re using Memcache with Graphite, you’ll need to add this to your /opt/graphite/webapp/graphite/local_settings.py file, to tell Graphite to only cache data in Memcache for 1 second:


Is it worth caching the data at all at this resolution? Honestly, I don’t know. I guess that depends on how many people need to watch the data in real-time or not. Ideally, it would be better if Graphite dynamically set the Memcache timeout based on the data stored in the particular key, but for now, you need to either stop using Memcache, or set the cache duration to 1 second.

This now gives you graphs with 1-second resolution … now we just need to change the Graphite web-app’s auto-refresh feature to load a new graph every second. By default, it will only generate an updated graph every 60 seconds. To change this, we have to edit some of the app’s Javascript code.

  • Open the file /opt/graphite/webapp/content/js/composer_widgets.js, and locate the function ‘toggleAutoRefresh’.
  • Locate the ‘interval’ variable inside that function. Change its value from ’60’ to ‘1’.
  • Save the file, then refresh your web browser page.

Et voila. If you switch on auto refresh, you should now be able to see your app’s data being plotted second by second, giving you a real-time view of what your app is doing.

Be the first to leave a comment »

At #phpsw this week, former Gradwell-er and all-round good guy Ade Slade delivered a great talk about DbUnit. Testing in general, and doing testing the right way in particular, is one of his great passions as a developer, and he certainly brought a lot of enthusiasm and hands-on experience to an often-neglected part of unit testing one’s code.

The talk and slides are on Ade’s blog.

Be the first to leave a comment »

I post photos to Flickr from time to time, and then write blog articles about the photos. The blog articles get written days, weeks, sometimes months in advance of when they’re scheduled to appear on my blog … which makes it a tad difficult to add a link from a photo to all of the blog articles that mention it.

So a couple of weekends ago I knocked up a very crude script that uses the Flickr API (via phpFlickr) to work through all of the published blog posts and make sure each of my Flickr photos has links back to each blog post that mention it. I’m posting it here in the public domain. Hopefully someone will find it a useful starting point to do something similar for their own blog.

[code lang=”php”]



$flickrApiKey = '’;
$flickrSecret = ”;
$flickrToken = ”;

$f = new phpFlickr($flickrApiKey, $flickrSecret);
$f->enableCache(‘fs’, ‘/tmp’, 3600);

// first step – find the first published blog post
$url = ‘http://blog.stuartherbert.com/photography/’;
$rawHtml = file_get_contents($url);

/’, $rawHtml, $matches);

$blogPosts = array();
$flickrPhotos = array();

$latestPost = $matches[1];
$nextPost = $url . ‘?p=’ . $latestPost;

function updatePhotos($photoIndex, $flickrPhotos, $blogPosts, $f)
foreach ($photoIndex as $photoId => $flickrPhoto)
// we must rewrite the description
preg_match(‘|(.*)Copyright |s’, $flickrPhoto[‘description’], $matches);
if (isset($matches[1]))
$description = $matches[1];
$description = ”;
$description .= ‘Copyright (c) Stuart Herbert. Blog | Twitter | Facebook‘ . “n”
. ‘Photography: Merthyr Road | Daily Desktop Wallpaper | 25×9 | Twitter.’ . “nn”;

if (count($flickrPhoto[‘blogPosts’]) == 1)
$description .= ‘Want to know more about this photo? See this blog entry:’ . “nn”;
$description .= “Want to know more about this photo? See these blog entries:nn”;

foreach ($flickrPhoto[‘blogPosts’] as $postUrl => $blogPost)
$description .= ‘* ‘ . $blogPost[‘title’] . “n”;

// description is made … now to upload it
echo “Photo: ” . $photoId . ‘ :: ‘ . $flickrPhoto[‘title’] . “n”;
echo “URL : ” . $flickrPhoto[‘url’] . “n”;
echo “Old : ” . $flickrPhoto[‘description’] . “n”;
echo “New : ” . $description . “n”;

echo “nPushing changes to Flickr …”;
$f->photos_setMeta($photoId, $flickrPhoto[‘title’], $description);
echo ” donen”;

while ($nextPost !== null)
$photoIndex = array();

echo “Downloading $nextPost …”;
$rawHtml = file_get_contents($nextPost);
echo ” donen”;
if (!$rawHtml)
die(“Unable to download HTML for URL: ” . $nextPost . “n”);


.*(.*)|Us’, $rawHtml, $matches);
$postUrl = $matches[2];
$title = $matches[3];
echo “Blog post title is: $titlen”;
echo “Blog post url is: $postUrln”;

preg_match(‘||’, $rawHtml, $matches);
if (isset($matches[1]))
$nextPost = $matches[1];
$nextPost = null;



|Us’, $rawHtml, $matches);
if (!isset($matches[1]))
die(“regex failed againn”);
$entryHtml = $matches[1];

preg_match_all(‘|(http://www.flickr.com/photos/stuartherbert/[0-9]+/)”|’, $entryHtml, $matches);
$blogPosts[$postUrl][‘url’] = $postUrl;
$blogPosts[$postUrl][‘title’] = $title;
$blogPosts[$postUrl][‘matches’] = $matches;

foreach ($matches[1] as $flickrPhoto)
$parts = explode(‘/’, $flickrPhoto);
$photoId = $parts[count($parts)-2];
$photoInfo = $f->photos_getInfo($photoId);

$flickrPhotos[$photoId][‘url’] = $flickrPhoto;
$flickrPhotos[$photoId][‘title’] = $photoInfo[‘title’];
$flickrPhotos[$photoId][‘description’] = $photoInfo[‘description’];
$flickrPhotos[$photoId][‘blogPosts’][$postUrl] = $blogPosts[$postUrl];

// note the photos we need to update because we have
// seen this post
$photoIndex[$photoId] = $flickrPhotos[$photoId];

echo “- Photo: ” . $photoInfo[“title”] . “n”;

updatePhotos($photoIndex, $flickrPhotos, $blogPosts, $f);

echo “nn”;
echo “Photo scraping complete!!nn”;

// when we get to here, we have photos to go and update on flickr

1 comment »
Page 1 of 212