2020-03-17

The March of Progress

In ancient times, I was assigned to a legacy project that was nearly 30 years old. Much of the team on that project had been on it for just as long. These are not inherently bad things. There is great value to a team with rich experience and intimate project and domain knowledge. Risk can arise, though, from technology, tooling, and processes that are allowed to stagnate unnecessarily, despite easy access to free education and resources. If you are reading this, chances are that you know what it's like to be in tech; the world is constantly moving, and if the momentum leaves your project behind, it only gets harder to catch back up, no matter how important catching up might be.

When I joined the aforementioned project, the team had not yet (in their twenty-some-odd years) implemented any kind of source control. Keeping a project alive without source control for that long without catastrophe struck me as miraculous, and dangerous. A degree of waste was also in play, considering the process overhead in manually managing a large code base among a growing team of developers. The merge process for the flagship product at the time I left the company was still (I wish I were kidding) to copy the entire code base onto a network drive with my changes in it; the team lead would then pick that up and copy my changes among his own for each release, by hand.

Despite being low on the totem pole at the time, I voiced my concerns early, and gently. Management responded with confidence (and I hazard to say, defensiveness) that their project was not suited to source control. Besides, they said, the transition to any source control system would be too risky, and cost too much time. Determined to demonstrate otherwise, on my own time I carried out a proof of concept to show both that the project was suited, and that the transition itself was feasible. Leadership was not impressed, and declined to move on source control.

Management's hand was finally forced, though, by the continued growth of the project's complexity and team size. With myself added to the team, it had nearly doubled in size, in just half a year. The existing process of copying gigabytes of monolithic project code and assets back and forth for every significant change (let alone the manual process of merging that code) was no longer sustainable. The project lead, however, continued to resist the transition to source control. He raised concerns over poor reliability, and worried that source control would be too slow and restrictive in its capabilities. When pressed, he admitted not researched the topic, including not having read my emails to him on the topic or looking at my proof of concept even in passing. Incidentally, though it is not the topic of this essay, that was the moment I decided I was going to start looking for a new job.

By way of disclaimer, I am not myself a team lead (though I aspire to become one some day). However, having worked under several, I have observed and learned from some of the habits of leaders who I, personally, believe to be successful. Now, it's good to have an understanding of what I mean here by "success." The management team described above considers themselves to be successful, because their company has not failed and there have been no catastrophic failures of the software in the wild - yet. From my perspective, though, I see a great deal of churn on inefficient processes, which wastes time and energy that could be spent on new features or products; I see tremendous risk involved in the event major defects are discovered, rollbacks need to be made, etc; and I see stressful pain points for the developers and even the team lead himself in working within the confines of the existing system.

I would like to give this team the benefit of the doubt and say that it is fear that held them back, rather than laziness (and I do believe this to be the case). They did not understand what source control was, and so they were scared of it. The way they saw it, things just worked, and worked just fine. However, a small amount of research, an openness to change, and a relatively small amount of effort would have made their lives far and away better than under the existing system. I suppose the moral of the story, if there can be said to be one, is that some comfort with learning about the unknown can allow you to convert unknowns into knowns. And when you do that, you broaden your horizons, strengthen your skills as a team and an organization, and give yourself more tools with which to be successful. I, myself, have a natural tendency to hesitate and shy away from the discomfort of the unknown; but only by pressing on into the unknown can I uncover what is there, and build myself up for the next task ahead.

2020-03-10

Inverted Mouse Scrolling in Windows w/o RegEdit

I love inverted mouse scrolling. Meaning, when I spin my scroll wheel away from myself (up), I want the UI to scroll down. When I spin the wheel toward myself (down), I want the UI to scroll up. This feels intuitive to me because it's how touch screens (phones, tablets, etc), and touch pads (Mac and PC laptops, notebooks, etc) all work. When you slide your finger upward along the screen or touch pad, the thing you're touching follows your finger upward. I never understood why scroll wheels behaved in reverse. To me, the default Windows scroll wheel behavior is the "inverted" one.

Unfortunately, Windows provides no native method of flipping this simple input. If you know what you're doing, and you have enough system access, you can modify the Windows system registry using regedit.exe. This strikes me as a pretty invasive way to go about flipping something as simple as mouse wheel scrolling. But for some reason, this is the primary method I keep finding in searches. It turns out there is an easier way that doesn't require the same level of administrator privileges (though it does come with a potential downside).

PS C:\> Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Enum\HID\*\*\Device` Parameters FlipFlopWheel -EA 0 | ForEach-Object { Set-ItemProperty $_.PSPath FlipFlopWheel 1 }

Shamelessly but usefully ripped from this post on StackOverflow.

The downside to this method, as you can see, is that the FlipFlopWheel property is being set to 1 for every single object in our query. This may not be ideal for your situation. For me, I just needed a way to get my mouse wheel flipped, and to have it persist, without needing to go into the registry. I'm surprised that this issue has persisted so long, over so many iterations of Windows itself, that this StackOverflow question from over 8 years ago is still relevant, but here's the solution!

Separation of Concerns

Rules are handed down from authority figures. Best practices, though, are born of community consensus on good ways to get things done. There are usually good reasons for certain practices to surface out of our collective consciousness. One simple, sometimes painful example is white space in code. The example is painful because there are so many ways white space can be used (or abused). The broader community prefers a handful of styles; these make code easier to read and digest for other developers (and even our future selves). Proper use of white space, thus, has become a best practice. Tools now exist to enforce proper spacing; languages like Python even go so far as to define scope using white space.

Okay, let's get into it. I have a piece of gold-stamped parchment declaring me "Master" of computer science. I'm not so sure about that. In any case, there are certain best practices that even I can recognize as sensible, useful, and even important. When these practices are ignored (or willfully disregarded) it gives me the fingernails-on-a-blackboard shivers.

One such best practice is known as separation of concerns. There are many levels, high and low, at which it can (and should) be observed. The basic idea, however, is that you want a given piece of code or architecture to:

  • do its job
  • do its entire job
  • do no one else's job

Once you've called a function, you should be able to expect its job to be completed. Sure, the function might delegate responsibility to smaller helper functions, breaking the task into workable chunks; but when that function eventually returns, the job should be done. Importantly, nothing else should have been done along the way. When extra stuff gets done along the way - either intentionally or by accident - we call that a side effect. According to the tenets of functional programming and general coding best practices, side effects are bad.

My mission today is to use a frustrating, real-life example to illustrate the importance of this concept, leaving as much of my emotional baggage at the door as possible.

A company I used to work for had a design review process in place. All tasks required project lead approval before development could begin on a task. It meant some extra overhead, but I liked the sentiment. I think it's a fine idea to have strategies vetted by leadership and scoured for weaknesses before mistakes are made. Cool.

So my task was to add a reporting feature to our existing product. The existing feature retrieved documents from the back end, looping over them and calling a helper function to print each individual one. There were weaknesses in this design to begin with - you may wonder, as I did, why each individual document had to be retrieved with a separate query - but it wasn't my design and I just had to work with it.

Let's pause in brief review. Someone wants to print a batch of reports. They call the batch printing function, whose one job is to print a batch of things. We expect our function to:

  • do its job (print the batch)
  • do its entire job (read each document file and print it correctly, i.e. perform sub-tasks that the main task depends on)
  • do no one else's job (don't randomly delete user accounts, etc)

Still with me? Okay. Now we get to my task and the problem I encountered.

At the batch printing level, I was to gather data on each individual report in the batch. This was to be aggregated and added to the final batch report. I came up with a few angles of attack. With separation of concerns and best practices in mind, I avoided solutions that leaned on side effects. What do I mean, here? I mean that I wanted to be clear with what each part of the code was doing; and I wanted the code to do nothing more than it was asked.

For example, my new function would return data to be aggregated, rather than modifying aggregation data by reference. This meant I had to add another function to manage the aggregation itself. A little extra work? Maybe. But the result was crystal clear code, functions with crystal clear purpose and methodology:

Function A, aggregator

  • do its job (aggregate data from multiple calls of getter)
  • do its entire job (return aggregated data)
  • do no one else's job (don't email users' credit card numbers to other users)

Function B, getter

  • do its job (gather data about a document)
  • do its entire job (query back end and return the result)
  • do no one else's job (don't drunk dial Function X)

Without getting into details, I can tell you that even this solution felt a little messy to me. Within the project and resource constraints, however, and in light of what followed next, this was a positively glowing example of separation of concerns.

The project lead rejected my design proposal wholesale. I was instructed to write a system of helper functions into the ORM class for the documents.

Confused?

me too

He specified hard-coded mock-ups of data from unrelated classes to be written into the ORM. This would, he argued, allow us to perform testing and comparisons of data and models from within the ORM class. In other words, outside code that called the ORM class constructor would have no idea that a partial testing framework and a whole suite of hard-coded data was being instantiated, parsed, judged, and passed around, complete with database interactions, that had nothing to do with the the ORM itself.

Imagine troubleshooting the reason for data coming in from the wrong table. Imagine database calls being made when you aren't making them. Imagine your report's performance profile degrading, only to find out that someone had hard-coded your document's ORM class with sample data and testing infrastructure ... for some random batch document report printing functionality. Can you imagine that?

I can; I lived it. This was an extreme example, but let the allegory of the hard-coded ORM stand as your reminder to keep best practices and separation of concerns close to your heart.

2016-10-11

Switching to React - First Impressions

In earlier posts I wrote that I planned to use Handlebars as my templating engine for the CYOAG project. Things happened since then that caused me to change directions towards ReactJS. I'll take a moment here to say that I've had no contact with or sponsorship from the Handlebars or React teams. This is all just my opinion, formulated based on personal experience working with these technologies on the side-project level. Incidentally, though I'm enjoying working with React, this is notwithstanding my opinion of Facebook, who we have to thank for React. So there's some food for thought.

So why did I make the switch?

I realized Handlebars was more challenging to learn than a templating system needs to be. I didn't find community support to be as extensive or helpful as I'd like. Handlebars also hasn't enjoyed industry adoption as extensively as React, so from a career growth aspect React seemed a better choice. Ironically, the reason I chose Handlebars originally was because we used it on a project at work. But after talking to the lead dev on that project, I learned it wasn't selected for its quality or utility, but due to circumstance. Finally, having built my first working Handlebars prototype, it just didn't feel right. There's no remedy for that.

So, React. It's already big in the web dev scene, has been for some time, and continues to grow. Whether it'll become a longstanding mainstay in industry remains to be seen, but signs point to yes.

I'm surprised to say it, but React was easier to pick up than Handlebars. It surprises me because React has lots of new ideas and syntax you have to learn, while Handlebars has relatively few and advertises itself as a "no frustration" templating system; which, I can attest, is patently false. React, likewise, advertises itself as "painless," also a lie. I'm being a little harsh here, but point being that I find React less painful than Handlebars, despite the fact that neither is a walk in the park.

To be fair, Handlebars lends itself better to smaller projects. The infrastructure and planning required to use it is lighter. To get Hello World up in React, you have to do more work, download more libraries, and learn more technology: JSX, props, state, rendering, components and their lifecycles, etc. In Handlebars, you slap up a template and you're off to the races (relatively-ish). Build processes are another story. I found precompilation to be roughly equivalent in difficulty from Handlebars to React. The documentation is better for React, though, due to superior community support. More community support means more flavors of help, and a greater likelihood one of them will match your style of learning.

The appeal of React is reusability, structure, and organization. From a mile-high perspective, let's say you're the app lead for Facebook, and you want a UI component to let people "Like" things. All kinds of things; wall posts, images, events, shared links, comments, everything. With React and some proper planning, you can make a single component displaying a "Like" link, that dynamically hooks up to whatever needs to be "Like"d. The component itself doesn't care what it belongs to. You can plug-and-play it wherever. One downside to this, as mentioned above, is that the initial development of this component is relatively heavy. Once you have it built, though, you can reuse it dozens, hundreds, millions of times, at very little dev cost.

Another downside is that you have to be conscious of your component structure when building larger applications. CYOAG is not a large application, but if I didn't think about component structure in advance - and thus plan out how state was to managed and communicated among components - I'd find myself having a real bad time. The complexity of component interactions isn't hard to wrap your head around, but once you've built them, that complexity makes maintenance and modification a chore, and a considerable risk.

Part of this is because state and props "cascade" (my term) from parent components to child components when you build an application The React Way. So if anything changes along the cascade path, you have to make sure you account for it at every step and in every handler. Of course, things change as we go, this is life; but I think we can all agree it's to our advantage to minimize this, and with React, best practices in design amplify this risk/advantage relationship.

In short, React encourages better software engineering principles. Other technologies allow you to start hacking out a solution before thinking it through, which can lead to other kinds of headaches down the road.

Non-technical Example

We can use my (thus far incomplete) React component design doc for CYOAG as an example for this. I won't bother quoting the doc here, you can read it if you like. In short, the application state is best managed at the top level. One easy reason to explain for this is that if you allow children to manage state, you'll quickly find yourself making more calls in integration, and having a hard time communicating among components. So we manage the state at the top level, and (in short) provide getters and setters for particular parts of the state to children through props.

When building your components, it's helpful to know what state you'll need to be passing down to children from the ancestral point of view; and what props to expect to receive from the descendent's point of view; and what functions will be needed in order to manage and pass these around. And later, if changes need to be made, you need to keep this "flow" of state in mind to preserve it.

These are my early days with React and my opinions and understanding will undoubtedly develop as I develop applications using React. Once I start using this code "in production" on CYOAG, I'll have plenty more to say, I'm sure.

Thanks for reading!
- Steven Kitzes

2016-10-03

Circular Dependency in Relational Databases

This post is going to have an element of storytelling to it, because while some elementary understanding of SQL will be required, the Cliff's Notes seem to tell it all:

You probably want to avoid circular dependencies in relational databases.

These Cliff's Notes are incomplete, a little vague, a bit of an oversimplification, and up for debate besides. Some RDBMs (relational database managers) handle circular dependency better than others, and some problem domains necessitate circular dependency. Take, for example, table representations of nodes in a graph that allows cyclical paths.

However, not all problems necessitate circular dependency, and some RDBMs prohibit, or aren't capable of handling, circular dependencies. In these cases, we need to design around the limitations of our RDBM, and in any case, it's good to be aware of circular dependencies, just on principle. The purpose of this post is thus to share my own early explorations of database design, and to describe how I broke out of the circular dependency trap on a small side project.

Some project context will help here. I've been working on a passion project for a little while now. The high level idea is to build a "Create Your Own Adventure" game, where all content is user-generated, and the best story line branches rise to the top through a voting system like that used by Reddit and StackOverflow. An important early part of this project involved designing a decent database schema to build the rest of the project on top of.

You may wonder why I chose an RDB (relational database) for this in the first place, rather than an alternative such as some flavor of NoSQL. I plan to host this on the AWS (Amazon Web Services) free tier, so I can learn more about AWS. Unfortunately, at the time I started this project, the AWS free tier didn't offer any option other than RBDs. Incidentally, an RDB is almost certainly best for this type of application due to the interrelationships of different types of data, but that's a topic for another time.

So, how did a circular dependency surface in my design?

One feature I want this app to have is position persistence. When a visitor is reading the story, I want to remember where they left off, so they can conveniently come back and resume reading the story at a later date. My plan was to give the Users table a column called Current, to track a given user's position in the story. In other words, a story node would appear as a foreign key in the Users table.

The Nodes table was designed with an Author column to record who had authored a given story node. This was to support a variety of features, such as prohibiting authors from voting on their own posts, and allowing an author to edit their contributions, while protecting those posts from being edited by others. Unfortunately, you'll notice this means a user (author) is now a foreign key in the Nodes table.

Uh oh.

Now I have a node ID as a foreign key in the Users table, and a user ID as a foreign key in the Nodes table. This is a circular dependency of sorts, and is a block in SQL. You'll find some argument as to whether circular dependencies are fundamentally bad practice, or whether SQL just suffers some unhappy limitations. After all, what's conceptually wrong with tracking current node in the Users table, and author in the Nodes table? There's no conflict in authority over the data, since a user doesn't own its current node in this read-only context, it's just a position marker. To be fair, SQL doesn't understand our motivations, and to argue that an RDBM's designers could have done more toward enabling us to convey these human intuitions to the machine is too philosophical for this post.

Whatever the reason and our opinion of it, MySQL forbids this kind of interdependency. The solution I came to was to create a simple table called Positions, with a compound primary key consisting of two foreign keys, one being a user ID from the Users table, the other being a node ID from the Nodes table. Despite some apparent duplication of information, this works for the simple reason that the Positions table relies on Users and Nodes, but neither Users nor Nodes relies on Positions. I'll try to illustrate the improvement in design without opening Visio:

An arrow (->) implies dependency

Before:

Users -> Nodes
Nodes -> Users

As you can see, this yields Users -> Nodes -> Users, and so on, ad infinitum. This is bad.

After:

Nodes -> Users
Positions -> Users
Positions -> Nodes

We can combine two of these lines together to yield:

Positions -> Nodes -> Users
Positions -> Users

but you can see that this doesn't introduce any loops, or circular dependencies, as my first design did.

Thanks for reading!
- Steven Kitzes