RWack.com
The mostly useless search engine on a budget

What is RWack?
RWack started out as a simple curiosity project about web crawlers and databases. After a few hours of programming I had written a crawler and a method of building an index for searching. Then it morphed into a study of information retrieval: forward indexing, stemming, and a Soundex implementation.
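
For anyone who hasn't run into Soundex before, here is a minimal sketch of the standard American Soundex rules in Python. It is an illustration only, not RWack's actual code, and it assumes plain ASCII names. The function name `soundex` and the `CODES` table are made up for this example.

```python
# Consonant groups used by American Soundex. Vowels, y, h and w have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code for word, or "" if it has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate duplicate codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
        if len(result) == 4:
            break
    return result.ljust(4, "0")  # pad short codes with zeros


if __name__ == "__main__":
    for name in ("Robert", "Rupert", "Ashcraft", "Tymczak"):
        print(name, soundex(name))
```

Names that sound alike land on the same code, which is what makes it handy for forgiving misspelled search terms.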

Then distributed computing and filesystems caught my interest, and I decided to explore them on low-power hardware. I now have a cluster running on Intel Atom based computers.

Why?
Ever since I started using the Internet I've been fascinated with how the big guys do things, and I've always wondered whether it really takes the kind of resources and money they throw at these projects. So I spent some time hacking together a few tools to decide for myself. I've come to believe the answer is both no and yes.

  • No? Believe it or not, the technical side is very easy. At least for me. Working with large sets of data, moving and organizing the data, and writing the tools to do it come naturally for anyone who can program and is curious. In fact all the information on the basics is just a mouse click away. If you really want to get some hot stuff rolling, find a couple of programmer friends with a similar interest to work on your project in their spare time. That's how Microsoft started. That's how Google started. That's how Apple started. That's how Yahoo started.
  • Yes? Sales, Marketing, Management. If you look at how any of the big guys are organized, you will find that the majority of their employees are not engineers. They are people who specialize in exposing the company to the public, whether it is sales (getting other people to give them money), advertising (giving money to get more exposure), or managing people (somebody has to herd the cats). Then there is dealing with growth. Your systems have to handle the flood of traffic from the outside world. If my poor little server were to get listed on Slashdot or Yahoo's front page, it would buckle. Not good if it were trying to make money.

In my experiments, I've written a map server to emulate terraserver, an on-line auction system to emulate ebay, a classified ad system to emulate craigslist, an invite and calendar system to emulate evite, and a few others here and there. None really went anywhere, though the map server was seeing about a million hits a year. Unfortunately I lost most of this work in a server crash, and since they were all hobbies, I wasn't too concerned.

Why A Search Engine?
I wanted to learn about web crawlers and forward indexing, improve my SQL skills, and, again, see how the big guys do it. Since my skill set is in automation and managing lots of processes, this just came naturally to me.
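
If the term is new to you: a forward index maps each document to the words it contains, and an inverted index flips that around so you can look up documents by word. The following is a rough sketch of the idea in Python, not the code behind RWack. The function names and the simple lowercase tokenizer are assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_forward_index(pages):
    """Forward index: URL -> list of terms found on that page."""
    return {url: tokenize(text) for url, text in pages.items()}


def invert(forward_index):
    """Inverted index: term -> set of URLs containing that term."""
    inverted = defaultdict(set)
    for url, terms in forward_index.items():
        for term in terms:
            inverted[term].add(url)
    return inverted


def search(inverted, query):
    """Return the URLs that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(inverted.get(terms[0], set()))
    for term in terms[1:]:
        results &= inverted.get(term, set())
    return results


if __name__ == "__main__":
    pages = {
        "http://example.test/a": "Cheap search engine on a budget",
        "http://example.test/b": "Budget hardware for a small cluster",
    }
    index = invert(build_forward_index(pages))
    print(search(index, "budget cluster"))
```

A real engine would add stemming, scoring, and on-disk storage on top of this, but the two-step shape (forward first, then invert) is the core of it.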

So what's it made of anyway?
It started out as a typical LAMP setup. Linux, Apache, MySQL, and PHP. While it was a great place to start, I've since moved to Linux, Apache, Tomcat, Nutch, and Hadoop.

The hardware is currently a dual core x86 with 4 gigs of memory and two 1.5T SATA drives acting as the gateway and hosting a few other things for a few friends and myself. Nutch and Hadoop are running on a cluster of three Intel Atom 330 based computers, each with 2 gigs of memory and 340mb SATA drives, on a gig network. One of them is the name node and Tomcat server, while the other two are simply datanodes and search servers. These Atom boxes consume about 40 watts of power each.

I process approximately 30k URLs every 24 hours. I haven't reached the point of recrawling yet, and from time to time I purge low scoring URLs. I also delete everything and start over from time to time. When I wrote this, there were about a million web pages indexed and about 14 million URLs queued to be indexed. I'm currently waiting to see what kind of limit I will run into with this setup.
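
As a back-of-the-envelope check on those numbers, here is a tiny Python calculation. It assumes the crawl rate stays flat and that no new URLs join the queue, neither of which is realistic, so treat the result as a rough lower bound. The names are made up for this example.

```python
def days_to_process(url_count, urls_per_day):
    """Days needed to work through url_count URLs at a constant daily rate."""
    return url_count / urls_per_day


URLS_PER_DAY = 30_000        # roughly 30k URLs every 24 hours
QUEUED_URLS = 14_000_000     # about 14 million URLs waiting
INDEXED_PAGES = 1_000_000    # about a million pages already indexed

if __name__ == "__main__":
    backlog_days = days_to_process(QUEUED_URLS, URLS_PER_DAY)
    indexed_days = days_to_process(INDEXED_PAGES, URLS_PER_DAY)
    print(f"Queue backlog: about {backlog_days:.0f} days ({backlog_days / 365:.1f} years)")
    print(f"Current index: about {indexed_days:.0f} days of crawling")
```

In other words, the queue grows far faster than three Atom boxes can drain it, which is part of why purging low scoring URLs matters.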

What does it do or not do?
It scans web pages, indexes them, and collects the links. Then it starts over again. There really is not a lot of data here, thus, the "Mostly Useless" statement in the page heading.
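
The scan, index, collect links, repeat cycle can be sketched in a few lines of Python. This is a toy illustration, not how Nutch or my original crawler does it. The `crawl` function, the `LinkCollector` class, and the `fetch` callback are all hypothetical names, and the example runs against a fake in-memory web instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; fetch(url) returns HTML or None.

    Returns (index, queue): index maps URL -> list of words on the page,
    queue holds URLs that were discovered but not yet fetched.
    """
    queue = deque([seed])
    seen = {seed}
    index = {}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it and move on
        parser = LinkCollector()
        parser.feed(html)
        index[url] = " ".join(parser.text).split()   # "index" the page
        for href in parser.links:                    # collect the links
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index, queue


if __name__ == "__main__":
    fake_web = {
        "http://example.test/": '<a href="/a">A</a> home page',
        "http://example.test/a": '<a href="/">back</a> page a',
    }
    index, _ = crawl("http://example.test/", fake_web.get)
    print(sorted(index))
```

A real crawler also has to respect robots.txt, rate-limit itself per host, and cope with broken HTML, which is where most of the actual work goes.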

Can I have your secret algorithm?
There is no secret algorithm. It's built on standard algorithms found on the internet, written by some very smart people. Go search them out for yourself (I suggest using Google though). I will point you to a couple of resources that I found very helpful.

I haven't made any improvements, nor do I pretend that I can produce better results. By now, considering everyone else's research budget, you can count on them producing better information. Back to that time and money thing again.

Will you continue to improve it?
I might. Since this is a hobby, I only work on it to satisfy my curiosity and to relax. This is supposed to be fun. If someone wants to give me a few giant bags of cash to make it a worthy competitor then I suppose I would make a professional go of it. Most likely I would just sell it off to someone if they really wanted to make this a real search engine.

What does "RWack" mean?
RWack doesn't really mean anything. At least it doesn't to me. One day I was trying to come up with a short domain name to use for my test site. I was thinking "whacked" but I couldn't find anything close to that, so I just took "wack" and put the letter R in front of it. Simple and short.

After searching around the internet I found that it is also a typo of "rawk", slang as in "You Rawk!" or "You Rock!". It is sometimes used in place of "Rack" as well. I think there might also be a few people with Rwack as a last name.

Pronounce it however you want.

Who is Jesse Hires?
I started in the tech industry in 1992 working for a small ISP while I went to college for Computer Science. I wound up dropping out of school, but kept working for the ISP. With a lot of good luck, I've managed to keep a career that pays me quite well since those small beginnings.

I am now a Build Engineer by trade and have been since starting at Microsoft in 1997. If you don't know what a build engineer is, it would be very difficult to explain. Heck, I didn't even know what it was when I started doing it. Back then it was just a stepping stone type of job. Now it is a full-fledged career path. I also get paid quite a bit of money for it, just like a "real" computer programmer.

Being a Build Engineer has taught me a couple of things: automation is key to a high production environment, and perl is your friend. If you have to manage lots of processes, computers, and large amounts of data, there is no other way to go. I know how to program in quite a few different languages but always manage to fall back on perl and php. I sprinkle in some sql here and there for fun.

So you want to know what it is I've worked on? Well, I suppose. Here is a non-exhaustive list of projects I have worked on. There have been lots of other small projects scattered throughout that I worked on at the same time.

  • Windows CE 4.2
  • Windows CE 5.0
  • Windows CE 5.1
  • Windows CE 6.0
  • Windows NT 4.0 Terminal Server Edition
  • Windows DDK
  • Windows 2000
  • Windows XP
  • Windows Vista
  • Windows 7

As you can see, I've worked on some pretty big projects. I still do. I haven't worked at Microsoft continually though. I've left a couple of times in an attempt to take advantage of some startup fever. The results did have a payoff, but not in terms of money. More in terms of experience and pride. Here are some of the projects I've worked on outside of Microsoft.

  • Detto Intellimover at Detto Technologies
  • Soleus at Intrinsyc International
  • Conversational Search and Speech technology at VoiceBox Technologies
  • The first commercial installation of Internet over Cable in the U.S. while at Internet On-Ramp

If you work at Microsoft, why does RWack run on Linux?
Well, that's because my start in the computer industry revolved around Linux and I have not completely given in to the dark side. Not to mention that this is a low budget operation. Low budget, as in hobby. While I do get software from my employer for a very very low price, it is not free. In fact, even with the discount I get, it would still cost a fairly substantial amount, to me anyway. Since this is a hobby, I can't justify spending more than a few dollars a year on it.

Can we hire you for our company?
Maybe. You would have to provide a pretty compelling pay package to get me to leave where I am now. While I do appreciate a big paycheck, it is more about the people to me. I am working with one of the best teams I have ever had the pleasure to work with. Not only are they co-workers, they are close friends and have been for over a decade. I would consider consulting for you, but it would be part time in my free time, and probably out of your price range.

Emacs or VI?
Emacs

Resume
My mostly up to date resume.
