Project - Craigslist Project Finder
As I mentioned in another post, I use Craigslist for finding some of my work. This post explains what I’ve done to make my project searches more efficient.
I look at three sections on an individual city's Craigslist page: "software / qa / dba" and "web / info design" under jobs, and "computer" under gigs. I chose these because they are where I have found most of my projects. I used to check "internet engineers," but found that my target market didn't post under that section often, and when applicable jobs did appear there, they were usually cross-posted under one of the other sections as well.
Since I work remotely, I like to check every city's feeds. Unfortunately, when I tried to do this manually, I often forgot which cities I had checked and when I had last checked them. My process was to quickly skim the titles until I found any that were interesting and open a new tab for each such post. Once I felt I had gone far enough back in the feed, I would go through the open tabs, closing any I didn't want to respond to. For any tabs left open, I'd send a response to the ad's poster. As you can imagine, this was a very time-consuming process, and honestly it didn't result in much work.
I made this process more efficient by using an RSS reader, adding some filter keywords, and filtering posts for uniqueness. Currently, I get an email about once every two hours with any new ads from 5,810 cities (17,430 feeds).
The first step in improving this process was to programmatically fetch each feed and find any new ads. The system has a database record for every feed, which stores the last time it was fetched and the next time it should be fetched. When the current time passes a feed's next fetch time, the system requests that feed's RSS. (You can get the RSS version of any Craigslist listing page by appending "/index.rss" to its URL.) The system then parses the response, storing an ad whenever its posted time is after the last fetch time stored for that feed.
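Here's a rough sketch of that fetch loop in Python. The `Feed` record, the two-hour interval, and the use of the feedparser library are my assumptions for illustration; the real system presumably has its own database models and scheduler.

```python
# Sketch only: assumes a Feed record with last_fetched/next_fetch timestamps
# and the feedparser library; not the system's actual models.
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

import feedparser  # pip install feedparser


@dataclass
class Feed:
    url: str                      # e.g. a Craigslist section listing URL
    last_fetched: datetime
    next_fetch: datetime
    ads: list = field(default_factory=list)


def fetch_due_feeds(feeds, interval=timedelta(hours=2)):
    now = datetime.now(timezone.utc)
    for feed in feeds:
        if now < feed.next_fetch:
            continue  # not due for a fetch yet
        # Craigslist listing pages expose RSS at <listing URL>/index.rss
        parsed = feedparser.parse(feed.url.rstrip("/") + "/index.rss")
        for entry in parsed.entries:
            posted_struct = entry.get("published_parsed") or entry.get("updated_parsed")
            if posted_struct is None:
                continue
            posted = datetime(*posted_struct[:6], tzinfo=timezone.utc)
            # Only keep ads posted since the last successful fetch
            if posted > feed.last_fetched:
                feed.ads.append({
                    "title": entry.title,
                    "description": entry.get("summary", ""),
                    "link": entry.link,
                    "posted": posted,
                })
        feed.last_fetched = now
        feed.next_fetch = now + interval
```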
The next step was to identify posts containing keywords that would make me look at them in more detail. For example, if a post has "python," "django," or "website" in its title or short description, it's queued to be sent to the user. The system can also look for keywords the user wants to ignore: for example, if a post contains "php" or "wordpress," the system will not notify the user of that post. This filtering step replaces me scanning through ad titles for anything I might remotely be interested in, except that it's much faster than I am.
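A minimal sketch of that keyword filter, reusing the ad dictionaries from the fetch sketch above; the keyword lists are just the examples from this post, and per-user configuration is left out.

```python
# Example keyword lists from the post; a real setup would be per-user.
WANTED = {"python", "django", "website"}
IGNORED = {"php", "wordpress"}


def matches(ad, wanted=WANTED, ignored=IGNORED):
    """Return True if the ad should be sent to the user."""
    text = f"{ad['title']} {ad['description']}".lower()
    if any(word in text for word in ignored):
        return False  # the user asked to never see these
    return any(word in text for word in wanted)


def filter_ads(ads):
    return [ad for ad in ads if matches(ad)]
```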
The final step was to notify the user of the new posts. The system keeps track of the last time the user was notified; any posts made after that time are batched together and sent in a single email. Unfortunately, this initially produced emails with duplicate ads, because posters would put the same ad in many cities. I solved this by hashing each ad's title and short description and filtering out any duplicate hash values. It's not perfect (some duplicates still slip through), but it handles roughly 95% of them.
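And a sketch of the de-duplication and batching step, again assuming the same ad dictionaries; SHA-1 here stands in for whatever hash the real system uses, and actually sending the email is omitted.

```python
# Sketch only: hash title + short description and drop repeat digests,
# then build one email body from everything posted since the last notice.
import hashlib
from datetime import datetime, timezone


def dedupe(ads):
    seen = set()
    unique = []
    for ad in ads:
        # Identical cross-posts collapse to the same digest and are dropped.
        digest = hashlib.sha1(
            (ad["title"] + ad["description"]).encode("utf-8")
        ).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(ad)
    return unique


def build_digest_email(ads, last_notified):
    """Return (email body, new last_notified timestamp)."""
    new_ads = [ad for ad in ads if ad["posted"] > last_notified]
    lines = [f"{ad['title']}\n{ad['link']}" for ad in dedupe(new_ads)]
    return "\n\n".join(lines), datetime.now(timezone.utc)
```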
This system has worked well for me, and I've considered turning it into a product. It's been immensely useful and has saved me a lot of time. If you think you'd be interested in it, please let me know. It's always easier to stay motivated on a project that has customers, after all!