Foreign code

Sep 22, 2011

In last week's Sea Of Memes, the topics of using libraries, working together with other people, and copyright came up.

Then, a few days later, Shamus Young discussed the article on his blog Twenty Sided.

I disagree with both of them on almost all the issues raised.

The first issue is "working with strangers". Here's the quote:

I'm also trying to learn things about graphics programming and game programming. You don't really learn something unless you try to do it yourself. Using a big multi-platform library or an existing game engine wouldn't teach me much.

Then there's the problem of debugging. Things go wrong and you need to isolate the problem down to a particular piece of code. It helps enormously if you've written it all yourself. If other people are working in the same area, it's practically impossible to know whether the bug was caused by the thing you just changed, or by someone else's change. It also helps that I know my own style and the kinds of mistakes I tend to make.

I commend mgoodfel on his journey to learn an interesting skill and work his way towards a fully self-written game engine. I'd like to do that myself some day, so I understand that he wants to write most of the code himself. That shouldn't stop him from collaborating with other people towards this goal, however. In fact, most of the things I learned didn't come from sitting in the basement doing it alone, but from talking to other learners about it and walking them through my thought process, oftentimes finding crucial flaws in my thinking along the way. The same goes, of course, for listening to someone else's thought process and finding their flaws, all the while seeing a wholly different position. In a way, Sea Of Memes is mgoodfel's way of achieving this: he writes down his thought process, then waits for comments from his readers. He even states this specific goal in Part 29:

The writeups revolve around the code. A part gets done when I get a piece of code running, not on any regular schedule. And then I wait for feedback, either downloads of the demo, or comments or both.

But still, to me, it doesn't feel like collaboration. It's more like an improv performance, where the audience is allowed from time to time to yell a word or two about the next piece mgoodfel should perform and what it should look like.

What's even worse, there is a simple system that would address the issues cited above, as well as mine, and that is GitHub. The workflow for an open-source GitHub project is like this: You create a repository, then push your initial state there. Everyone can see this code and "fork" it, i.e. copy the repository into their own account, then check it out and work on it. Meanwhile, you keep working on it yourself, pushing code to the public front end only at times you choose (kind of like putting it into a Zip file on your blog, only that it's easier to see what changed). If someone else wants to take your code and work in a different direction, that's fine because you won't even notice. If someone wants to contribute to your code, they push a change that implements what they want to show or give you and open a "Pull Request". This means you get a message that someone wants to contribute, a patch in regular diff format and a button that says "Do it!". You'd then read through their patch, thinking about the consequences the integration has for your code, thereby making it your own, and if you're satisfied with the integration, you pull it into your repository, merging the change. If you're not satisfied, you simply ignore the request. Happens all the time.

The point I'm trying to make is this: Pull Requests open your code up for collaboration if you choose to integrate. If not, you still saw someone else's thinking, which is always a good thing.

Oh, and you know what: The internet doesn't care when you're awake. It doesn't matter when you pull, when you push and when you work on your code, because everything is asynchronous. You and all of your potential collaborators work at your own times and paces, each in your own style.

The second issue that comes up is libraries. Here's what Shamus Young writes about using libraries:

Using code from strangers is always a gamble. Sometimes, on rare circumstances (perhaps only a few occasions in your career) the clouds will open and you’ll find yourself in a beam of sunlight, looking at a link to the perfect library while angelic music plays. It does happen. But more often than not, you run into the usual chain of dependencies, missing files, lack of documentation, version incompatibilities, inscrutable bugs, missing features, and poor programming interface.

To me, that sounds as if Mr Young was burned by some libraries or code snippets he tried to use for his own project, Project Frontier. It sounds as if he sees libraries as a time-saving tool. When he needs some functionality in his program, he thinks about the quickest way to achieve that functionality, which might be (a) writing it himself or (b) taking code from someone else that does what he needs. If you see it like that, then yes, every library is a gamble: you wager the time it takes to integrate, bugfix and work with the library, and hope to win it back by not having to write the code yourself. If the library is good, you come out ahead, because you have the functionality sooner with option (b) than with option (a). If you find out that the library isn't for you and "give up", your investment is lost and you'll have to start anew.
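To make that wager concrete, here is a rough sketch of the arithmetic in Python. The function names, the probability parameter and all the numbers are my own invention for illustration; nothing here comes from Mr Young's post, and real projects are far messier than one success probability.

```python
def expected_hours_with_library(integration_hours, write_it_yourself_hours, p_success):
    """Expected total hours if you try the library first (option b).

    Assumption: you always pay the integration time. If the library turns
    out not to fit (probability 1 - p_success), that time is lost and you
    write the code yourself anyway.
    """
    return integration_hours + (1.0 - p_success) * write_it_yourself_hours


def break_even_probability(integration_hours, write_it_yourself_hours):
    """Success probability at which trying the library costs the same
    expected time as writing the code yourself (option a)."""
    return integration_hours / write_it_yourself_hours


# Hypothetical numbers: 2 hours of integration versus 16 hours of own code.
print(expected_hours_with_library(2.0, 16.0, p_success=0.5))  # 10.0
print(break_even_probability(2.0, 16.0))                      # 0.125
```

With these made-up numbers, the wager pays off on average as soon as the library works out more than one time in eight. That is the whole of the time-saving view: a bet with odds.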

That's a bleak way to look at libraries, I think. To me, libraries are about focus, and not losing it. To me, the question isn't "how much time will this cost?" but "how well does this fit into my vision?". So, I ask myself: "Do I really want to write a JPEG loader?" and "Do I really want to mess with certificates myself?" (a topic I will discuss in the post after this one), or "Do I want to write a Minecraft chunk loader?". Sometimes the answer is no, sometimes the answer is yes. When the answer is no, I'd rather spend an hour or two banging some library into shape and massaging my own code than looking at byte descriptions and file signatures. That holds until the question becomes "Do I want to maintain this ****** library or find some other way?", at which point I probably give up.

The third issue mgoodfel brings up is copyright and patents. This is a hairy one, and I can understand that he's afraid of being sued. On the other hand, what could anyone gain from suing him? He doesn't appear to have a lot of money, and if you wanted to stop him from distributing or using some piece of code, it'd probably be more effective to ask him. Most of the disputes in that area are settled out of court anyway, and I don't think one should be scared by the way big companies with huge cash reserves behave towards each other. Also, note that Samsung and Apple are still cooperating in other areas of their businesses.

Perhaps I am too young, or haven't been burned by issues like these, or I am in a jurisdiction where it's easier to handle, but I don't think a programmer should care too much about patents, lawsuits and unintentional breach of copyright. In fact, my feeling is quite eloquently summarized by pud, when he writes "Fucking Sue Me".

So, there you have it. My personal, completely unfounded opinion on why you should do everything differently than you do. Note, again, that these are my opinions, my suggestions for thinking about it, not requests that anyone change anything about the way they do what they want.