I post photos to Flickr from time to time, and then write blog articles about the photos. The blog articles get written days, weeks, sometimes months in advance of when they’re scheduled to appear on my blog … which makes it a tad difficult to add a link from a photo to all of the blog articles that mention it.

So a couple of weekends ago I knocked up a very crude script that uses the Flickr API (via phpFlickr) to work through all of the published blog posts and make sure each of my Flickr photos has links back to each blog post that mentions it. I’m posting it here in the public domain. Hopefully someone will find it a useful starting point to do something similar for their own blog.


<?php

require_once('phpflickr-3.0/phpFlickr.php');

$flickrApiKey = '<your Flickr API key>';
$flickrSecret = '<your Flickr API secret>';
$flickrToken  = '<your Flickr auth token>';

$f = new phpFlickr($flickrApiKey, $flickrSecret);
$f->setToken($flickrToken);
$f->enableCache('fs', '/tmp', 3600);

// first step - find the most recently published blog post
$url = 'http://blog.stuartherbert.com/photography/';
$rawHtml = file_get_contents($url);
preg_match('/<h2 id="post-([0-9]+)">/', $rawHtml, $matches);
if (!isset($matches[1]))
{
	die("Unable to find the latest post on: " . $url . "\n");
}

$blogPosts = array();
$flickrPhotos = array();

$latestPost = $matches[1];
$nextPost = $url . '?p=' . $latestPost;

function updatePhotos($photoIndex, $flickrPhotos, $blogPosts, $f)
{
	foreach ($photoIndex as $photoId => $flickrPhoto)
	{
		// we must rewrite the description
		preg_match('|(.*)Copyright |s', $flickrPhoto['description'], $matches);
		if (isset($matches[1]))
		{
			$description = $matches[1];
		}
		else
		{
			$description = '';
		}
		$description .= 'Copyright (c) Stuart Herbert. <a href="http://blog.stuartherbert.com/photography/" rel="nofollow">Blog</a> | <a href="http://twitter.com/stuherbert" rel="nofollow">Twitter</a> | <a href="http://www.facebook.com/stuartherbert" rel="nofollow">Facebook</a>' . "\n"
		     	. 'Photography: <a href="http://blog.stuartherbert.com/photography/merthyr-road" rel="nofollow">Merthyr Road</a> | <a href="http://blog.stuartherbert.com/photography/daily-desktop-wallpaper" rel="nofollow">Daily Desktop Wallpaper</a> | <a href="http://blog.stuartherbert.com/photography/project-25x9" rel="nofollow">25x9</a> | <a href="http://twitter.com/stuphotos" rel="nofollow">Twitter</a>.' . "\n\n";
	
		if (count($flickrPhoto['blogPosts']) == 1)
		{
			$description .= 'Want to know more about this photo? See this blog entry:' . "\n\n";
		}
		else
		{
			$description .= "Want to know more about this photo? See these blog entries:\n\n";
		}
	
		foreach ($flickrPhoto['blogPosts'] as $postUrl => $blogPost)
		{
			$description .= '* <a href="' . $postUrl . '">' . $blogPost['title'] . "</a>\n";
		}
	
		// description is made ... now to upload it
		echo "Photo: " . $photoId . ' :: ' . $flickrPhoto['title'] . "\n";
		echo "URL  : " . $flickrPhoto['url'] . "\n";
		echo "Old  : " . $flickrPhoto['description'] . "\n";
		echo "New  : " . $description . "\n";

		echo "\nPushing changes to Flickr ...";
		$f->photos_setMeta($photoId, $flickrPhoto['title'], $description);
		echo " done\n";
	}
}

while ($nextPost !== null)
{
	$photoIndex = array();

	echo "Downloading $nextPost ...";
	$rawHtml = file_get_contents($nextPost);
	echo " done\n";
	if (!$rawHtml)
	{
		die("Unable to download HTML for URL: " . $nextPost . "\n");
	}

	preg_match('|<h2 id="post-([0-9]+)">.*<a href="(.*)".*>(.*)</a>|Us', $rawHtml, $matches);
	if (!isset($matches[3]))
	{
		die("Unable to find the post title and URL in: " . $nextPost . "\n");
	}
	$postUrl = $matches[2];
	$title = $matches[3];
	echo "Blog post title is: $title\n";
	echo "Blog post url   is: $postUrl\n";

	preg_match('|<a href="(.*)" rel="prev">Previous Post</a>|', $rawHtml, $matches);
	if (isset($matches[1]))
	{
		$nextPost = $matches[1];
	}
	else
	{
		$nextPost = null;
	}

	preg_match('|<div class="entry">(.*)<div style="clear:both;">|Us', $rawHtml, $matches);
	if (!isset($matches[1]))
	{
		die("Unable to find the entry HTML in: " . $postUrl . "\n");
	}
	$entryHtml = $matches[1];

	// the dots are escaped so that they only match literal dots
	preg_match_all('|(http://www\.flickr\.com/photos/stuartherbert/[0-9]+/)"|', $entryHtml, $matches);
	$blogPosts[$postUrl]['url']     = $postUrl;
	$blogPosts[$postUrl]['title']   = $title;
	$blogPosts[$postUrl]['matches'] = $matches;

	foreach ($matches[1] as $flickrPhoto)
	{
		$parts = explode('/', $flickrPhoto);
		$photoId = $parts[count($parts)-2];
		$photoInfo = $f->photos_getInfo($photoId);

		$flickrPhotos[$photoId]['url'] = $flickrPhoto;
		$flickrPhotos[$photoId]['title'] = $photoInfo['title'];
		$flickrPhotos[$photoId]['description'] = $photoInfo['description'];
		$flickrPhotos[$photoId]['blogPosts'][$postUrl] = $blogPosts[$postUrl];

		// note the photos we need to update because we have
		// seen this post
		$photoIndex[$photoId] = $flickrPhotos[$photoId];

		echo "- Photo: " . $photoInfo["title"] . "\n";
	}

	updatePhotos($photoIndex, $flickrPhotos, $blogPosts, $f);
}

echo "\n\n";
echo "Photo scraping complete!!\n\n";

// when we get to here, every photo we found has already been updated on Flickr
?>
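The description rewrite is the part of this script most worth testing without touching Flickr. Here is a rough sketch of how it could be pulled out into its own function. The name buildDescription() and its parameters are my own invention for illustration, and unlike the script above it keeps an existing description that has no footer yet, which is an assumption about what you would want. It also assumes the footer you pass in contains the text 'Copyright ' exactly once.

```php
<?php
// Hypothetical helper, not part of the script above: build the new
// description without calling Flickr, so the logic can be tested offline.
function buildDescription($oldDescription, $footer, array $blogPosts)
{
    // keep the text in front of the last generated footer
    $pos = strrpos($oldDescription, 'Copyright ');
    if ($pos !== false)
    {
        $body = substr($oldDescription, 0, $pos);
    }
    elseif ($oldDescription !== '')
    {
        // assumption: a description with no footer yet is kept, not dropped
        $body = rtrim($oldDescription) . "\n\n";
    }
    else
    {
        $body = '';
    }

    if (count($blogPosts) == 1)
    {
        $intro = 'Want to know more about this photo? See this blog entry:';
    }
    else
    {
        $intro = 'Want to know more about this photo? See these blog entries:';
    }

    $description = $body . $footer . "\n\n" . $intro . "\n\n";
    foreach ($blogPosts as $postUrl => $blogPost)
    {
        $description .= '* <a href="' . $postUrl . '">' . $blogPost['title'] . "</a>\n";
    }

    return $description;
}
```

Feeding the function its own output should give the same description back, which is what makes it safe to re-run the script every time a new blog post is published.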

1 comment »

There’s a programming style I rarely see in the PHP world, but one which I use from my C programming days – programming by contract. It’s a very useful technique for writing code that is demonstrably robust, and a useful complement to unit testing with PHPUnit.

At its most basic, programming by contract can be summed up as:

  • Does my function or method have inputs that are acceptable to me?
  • Has my function or method generated return data that I’m happy to pass back?

PHP has the assert() function to help with this, but it is deficient and best avoided. I’d like to share the approach I’m currently using for this, to get constructive feedback on how to evolve the style further.

At the heart of this approach lies the constraint. It is a test that must be satisfied; a bit like a runtime unit test. We could do this inline:

function breakMe($inputData = "I am bad data")
{
    // enforce our constraint
    if (!is_array($inputData))
    {
        throw new Exception('Bad data $inputData; expected array()');
    }
}

… but the problem with that is that it quickly bulks out your code with a lot of repetitive (and avoidable) content. An ideal candidate to make into a function or a method, which could yield:

function constraint_mustBeArray(&$testData)
{
    if (!is_array($testData))
    {
        throw new Exception("Constraint failed");
    }
}

function breakMe($inputData = "I am bad data")
{
    // our constraint is now in a nice function
    constraint_mustBeArray($inputData);
}

Here, we are trading performance (the cost of a function call) for both developer efficiency and reduced future maintenance costs. In general, the trade-off is worth it; most PHP developers work on small sites where developers are more expensive than runtime costs (within reason). The time saved by proving that code is working (and bailing as soon as we prove otherwise) is worth that cost.

We’re also introducing an important principle: there’s no return value to check. If execution continues on the line below the call to constraint_mustBeArray(), we can assume that the constraint was passed. If the constraint failed, we let whatever exception handlers there are, well, handle it.
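The examples so far only check inputs. As a rough sketch of the other half of the contract, here is the same style applied to a return value as well. Both constraint_mustBeInteger() and countItems() are hypothetical names made up for this example.

```php
<?php
// the constraint from the example above
function constraint_mustBeArray(&$testData)
{
    if (!is_array($testData))
    {
        throw new Exception("Constraint failed");
    }
}

// hypothetical second constraint, added for this example
function constraint_mustBeInteger(&$testData)
{
    if (!is_int($testData))
    {
        throw new Exception("Constraint failed");
    }
}

// hypothetical function showing a check on the way in and on the way out
function countItems($inputData)
{
    // are the inputs acceptable to me?
    constraint_mustBeArray($inputData);

    $total = count($inputData);

    // am I happy to pass this back?
    constraint_mustBeInteger($total);

    return $total;
}
```

As before, there is no return value from the constraints to check: if countItems() returns at all, both constraints passed.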

There’s a couple of problems with this style that have been nagging me. It has a lot of advantages, and is probably good enough, but …

  • There’s no autoload support in PHP for functions, making it a pain to work with a large number of functions in an app or framework. Life is easier if the constraints are defined as object methods instead.
  • There’s limited potential to re-use constraints once they have been defined. They have to be explicitly called. It would be great if they could be passed into (say) a data model layer of some kind as parameters, to assist in data integrity checking. (Note the common thread of making sure that bad data is detected as early as possible, and pro-actively rejected by the app).

This led me to a more OO style:

class MyFramework_Array
{
    static public function mustBeArray(&$testData)
    {
        if (!is_array($testData))
        {
            throw new Exception('Constraint failed');
        }
    }
}

function breakMe($inputData = "I am bad data")
{
    // constraint is now in a reusable object method
    MyFramework_Array::mustBeArray($inputData);
}

Well, it is object-oriented for sure, but it still isn’t reusable … but it could be with lambda functions …

class MyFramework_Array
{
    // the default value lets us call mustBeArray() with no arguments
    static public function mustBeArray(&$testData = null)
    {
        // create the lambda function
        $constraint = function($testData)
        {
            if  (!is_array($testData))
            {
                throw new  Exception('Constraint failed');
            }
        };

        if ($testData !== null)
        {
            $constraint($testData);
        }
        else
        {
            return $constraint;
        }
    }
}

function breakMe($inputData = "I am bad data")
{
    // we can still call the constraint as before ...
    MyFramework_Array::mustBeArray($inputData);

    // ... but we can now also do the following ...
    $constraint = MyFramework_Array::mustBeArray();
    $constraint($inputData);
}

This approach gives us the flexibility of both worlds … a constraint method that we can call directly, and also a constraint function that we can assign to a variable to re-use as appropriate.
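To give an idea of what re-use might look like, here is a minimal sketch of a data model that takes constraints as parameters. MyFramework_Model, addField(), set() and get() are hypothetical names invented for this example, and the block repeats a compact mustBeArray() whose parameter defaults to null so that it can be called without arguments.

```php
<?php
class MyFramework_Array
{
    static public function mustBeArray(&$testData = null)
    {
        $constraint = function($testData)
        {
            if (!is_array($testData))
            {
                throw new Exception('Constraint failed');
            }
        };

        if ($testData !== null)
        {
            $constraint($testData);
        }
        else
        {
            return $constraint;
        }
    }
}

// hypothetical model layer: every field carries the constraint it must satisfy
class MyFramework_Model
{
    protected $constraints = array();
    protected $data = array();

    public function addField($name, $constraint)
    {
        $this->constraints[$name] = $constraint;
    }

    public function set($name, $value)
    {
        if (!isset($this->constraints[$name]))
        {
            throw new Exception('Unknown field: ' . $name);
        }

        // reject bad data before it gets anywhere near storage
        $constraint = $this->constraints[$name];
        $constraint($value);

        $this->data[$name] = $value;
    }

    public function get($name)
    {
        return $this->data[$name];
    }
}

$model = new MyFramework_Model();
$model->addField('tags', MyFramework_Array::mustBeArray());
$model->set('tags', array('php', 'contracts'));
```

The model never needs to know what the constraint checks; it just calls whatever it was given before accepting the value.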

There’s still a couple of weaknesses with this approach, the most obvious one to me being that finding these constraint methods is no longer quite as easy (you need to know which class they are defined on), but the counter-argument is that this is why frameworks have to rely on convention.

We’re quite heavily constrained by PHP’s syntax and parser limitations here, specifically the lack of macros (which could avoid runtime costs in production environments) and that we can’t assign lambda functions to class properties at declaration time.

I’m wondering how this style of declaring constraints could be refined further. You can take it as given that we would normally throw something other than Exception. Comments welcome :)
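For what it’s worth, here is one way that “something other than Exception” might look. This is only a sketch; the class name MyFramework_ConstraintFailedException is made up for this example, and extending InvalidArgumentException is my assumption about a sensible parent.

```php
<?php
// hypothetical exception class, so that callers can catch failed
// constraints separately from every other kind of error
class MyFramework_ConstraintFailedException extends InvalidArgumentException
{
}

function constraint_mustBeArray(&$testData)
{
    if (!is_array($testData))
    {
        throw new MyFramework_ConstraintFailedException(
            'Expected array, got ' . gettype($testData)
        );
    }
}

function breakMe($inputData = "I am bad data")
{
    constraint_mustBeArray($inputData);
}

try
{
    breakMe();
}
catch (MyFramework_ConstraintFailedException $e)
{
    echo 'Constraint failed: ' . $e->getMessage() . "\n";
}
```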

13 comments »

Gearman is a lightweight, high-performance solution for farming out processing work from one machine to another. I’m currently looking at using Gearman as a key part of the architecture of a new web API that I’m doing the R&D for. (I’ll go into why I need something like this, and why I’ve chosen Gearman in particular, another day).

Getting Gearman up and running on Ubuntu 9.10 (Karmic Koala) is very straightforward and only takes a few minutes, but oddly it isn’t clearly documented on Gearman’s own wiki at the time of writing.

  1. Add “ppa:gearman-developers/ppa” as a software source.
  2. sudo apt-get install gearman-job-server
  3. sudo apt-get install libgearman-dev
  4. sudo apt-get install uuid-dev
  5. pecl install "channel://pecl.php.net/gearman-0.6.0"
  6. Add a gearman.ini to /etc/php5/conf.d, with “extension=gearman.so” as the contents
  7. sudo /etc/init.d/apache2 restart
  8. Check phpinfo() to make sure gearman extension loaded

You’ll find that the gearmand process is already up and running, and listening on port 4730 on localhost. All you need to do now is to write some code to take advantage of it :)
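If you would rather check the result from a script than from phpinfo(), something along these lines should do. It is a minimal sketch: checkGearmanSetup() is a hypothetical helper name, and it only confirms that the extension is loaded and that something accepts TCP connections on the port, not that it is actually gearmand answering.

```php
<?php
// hypothetical helper: is the extension loaded, and is anything
// listening on the given host and port?
function checkGearmanSetup($host = '127.0.0.1', $port = 4730)
{
    $errno = 0;
    $errstr = '';

    // wait two seconds at most; @ hides the warning for a closed port
    $socket = @fsockopen($host, $port, $errno, $errstr, 2);
    $listening = ($socket !== false);
    if ($listening)
    {
        fclose($socket);
    }

    return array(
        'extension' => extension_loaded('gearman'),
        'listening' => $listening,
    );
}

$status = checkGearmanSetup();
echo 'gearman extension loaded   : ' . ($status['extension'] ? 'yes' : 'no') . "\n";
echo 'gearmand listening on 4730 : ' . ($status['listening'] ? 'yes' : 'no') . "\n";
```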

2 comments »

Whether you’re looking at your own code before (or after!) you have shipped it, or you’re picking up someone else’s code after they have shipped it, tracking down and fixing bugs is a fundamental part of programming. If you know the code well, perhaps you can make an intuitive leap to immediately jump to where the bug is. But how do you go about tracking down a bug when intuition doesn’t help?

The nature of all code is that larger systems are built from smaller underlying systems and components. They in turn are also constructed from smaller components. The bug you are tracking down will have a cause in one of these systems, and will have symptoms that are visible in other systems. The remaining systems work fine (as far as the bug you’re looking for is concerned), and you can use this to quickly and reliably find where the bug is.

Divide your larger systems down into smaller systems at logical points, such as different server stacks, APIs, major interfaces, classes, methods and if necessary individual lines of code. Test both sides of the divide, with your tests focusing on the data that crosses the divide. If one side works as expected, the bug is not in there, and you can eliminate that side from further testing. Continue testing the remaining systems and components, which you have now isolated, by dividing those up into smaller systems and components. Keep going until you’ve reached the smallest testable system, component, unit, or lines of code that show the fault. Congratulations: you have isolated the fault.
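Here is a minimal sketch of that divide-and-test loop, assuming the system can be modelled as a list of stages run in order, and that you know what good data looks like as it crosses each divide. findFaultyStage() and its parameters are hypothetical names for illustration. It also assumes there is a single fault, and that once the data has gone wrong it stays wrong at every later divide.

```php
<?php
// hypothetical helper: $stages is a list of callables run in order, and
// $expected[$i] is the known-good data after stage $i has run
function findFaultyStage(array $stages, $input, array $expected)
{
    $low = 0;
    $high = count($stages) - 1;
    $faulty = null;

    while ($low <= $high)
    {
        $mid = (int) (($low + $high) / 2);

        // run everything up to and including the divide
        $data = $input;
        for ($i = 0; $i <= $mid; $i++)
        {
            $data = call_user_func($stages[$i], $data);
        }

        if ($data === $expected[$mid])
        {
            // this side works as expected; eliminate it
            $low = $mid + 1;
        }
        else
        {
            // the fault is at or before the divide; keep narrowing
            $faulty = $mid;
            $high = $mid - 1;
        }
    }

    // index of the first stage whose output is wrong, or null if none is
    return $faulty;
}
```

Each pass through the loop throws away half of the remaining stages, which is why this stays quick even when the system is large.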

Apart from being a strategy that allows you to work on code you’ve never seen before, this approach also has the advantage that it is evidence-based. This approach eliminates guesswork, and it forces developers’ assumptions about how their code actually works in practice to be challenged. The data never lies, but be aware that it can be misinterpreted!

The approach is iterative, and you’ll find that you’ll often go back and forth between your code and your tests, making your code easier to test and giving your tests clearer and more targeted test domains and results. Fix the tests that are relevant to the bug you are tracking down, and make a list of any other issues you find along the way for you to come back and address at a later date. Stay on target, and park potential tangents and distractions for another time.

Although this sounds like a slow process when described on paper, with practice it can be executed at high speed during an emergency situation. However, the need to restore service in a timely manner isn’t always compatible with this approach, and you’re normally better off returning to your test environment where you can study the fault without inconveniencing your customers any further.

3 comments »

Stuart is running a course in Manchester in October immediately before the PHPNW09 conference on how to set up and organise your PHP developers to ensure things run smoothly for you and your customers, which will include looking at how to get the most out of Trac. Learn more about the course, or sign up now.

When it’s just you, working on one project at a time, it’s easy enough to keep track of the work you’re doing and the work you still need to do to complete the job. Chances are you can keep it all in your head, or at least keep the discussions with your customer on something like Basecamp in your head. You know that you should be using source control and bug tracking because it is “best practice”, but it just seems like too much of an overhead to bother with when it’s just you. After all, you’re working on the customer’s server, and there’s no-one else editing the code anyway.

Some of the folks reading this blog post might be cringing at that, but I’ve lost count of the number of times I’ve come across professional PHP developers who work in exactly this way. Is it because they don’t know better? Maybe. Is it because it has worked okay for them up to now? For sure.

But eventually, there comes a point where one developer becomes a team of two … or more. Having a team means that you can go after larger projects … but it also means that you have to go after larger projects to pay the team. Larger projects mean more complicated requirements, multiple phased deliveries … and a larger, more demanding (and probably a more complicated) customer holding the pay cheque.

Running a team of PHP developers (like all management activity in all walks of life) comes down to three key things: direction, organisation, and supervision. Only now it isn’t just you and a customer, just a list that you can keep in your head. Now you need to keep track of a larger list, of multiple lists for multiple people to work on that need to be brought together in the end, and if anything slips through the cracks it’s your reputation on the line. Getting the customer to come back for repeat business just got a lot less easy to take for granted.

Trac and Subversion have been part of our community’s toolkit for many years now. Used correctly, you can get yourself and your customers well-organised, and grow your reputation when you grow your team. If you haven’t started using them yet, both are open-source, and well-backed with plenty of information freely available around the blogosphere on how to use them.

Or join me in Manchester in early October, where I’ll show you how they fit into an overall approach to running your team of PHP developers.

2 comments »