Thursday, September 19, 2013

Do good communication skills improve your code?

What is one of the primary design flaws we see over and over in software?

Tight coupling.

I started wondering why this might be. I thought about the mental process I go through when attempting to write cohesive, decoupled code. I realized that it was very similar to the mental process I go through when composing paragraphs and sentences.

Could there be a correlation between composition skills in one's native language and their skill at designing and developing loosely coupled software solutions?

Certainly Uncle Bob Martin's book "Clean Code" suggests something of the sort. In it he approaches method writing much like telling a story. If one is better at communicating written stories, perhaps they would be better at this form of design.
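To make the analogy concrete, here is a minimal Python sketch in the spirit of Martin's top-down "stepdown" style. It is a hypothetical invoice example, not code from Clean Code: the top-level method reads like a topic sentence, each helper is one supporting sentence, and the tax policy is handed in rather than hard-wired, which keeps the pieces loosely coupled. `LineItem`, `TaxPolicy`, `FlatTax` and `Invoice` are all illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass
class LineItem:
    price: float
    quantity: int


class TaxPolicy(Protocol):
    """Any object with a rate_for(region) method satisfies this interface."""

    def rate_for(self, region: str) -> float: ...


class FlatTax:
    """Hypothetical policy: the same rate in every region."""

    def __init__(self, rate: float) -> None:
        self.rate = rate

    def rate_for(self, region: str) -> float:
        return self.rate


class Invoice:
    def __init__(self, items: Iterable[LineItem], region: str, tax: TaxPolicy) -> None:
        self.items = list(items)
        self.region = region
        # The tax policy is passed in rather than constructed here, so Invoice
        # is not tied to any one tax implementation (loose coupling).
        self.tax = tax

    # Top-level "sentence": a summary of the steps defined below it.
    def total(self) -> float:
        return self.subtotal() + self.tax_due()

    def subtotal(self) -> float:
        return sum(item.price * item.quantity for item in self.items)

    def tax_due(self) -> float:
        return self.subtotal() * self.tax.rate_for(self.region)


invoice = Invoice([LineItem(10.0, 2), LineItem(5.0, 1)], "TX", FlatTax(0.1))
print(invoice.total())  # 27.5
```

Swapping in a different `TaxPolicy` requires no change to `Invoice`, which is the decoupling the post is asking about.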

UPDATE: I found this link after posting that I thought I would share:

What do you think?

Monday, September 16, 2013

Put your Adwords fears to REST (managing lead consumption)

I just finished up a neat little project where I got to help a client streamline how they monitored lead consumption among their sales team members and made sure they were not overspending in one region to another's detriment.

The Problem: The client is using Adwords to garner leads in various states. Their sales team members sell in multiple states. They have a third-party system for distributing leads to the team members. This system has a toggle the team members are supposed to use to indicate whether they are actively taking leads at a given time. When this flag is on for a team member, leads are sent to them. However, many team members were failing to turn their flag off. These members were then getting leads assigned to them when they might not even be in the office, to the detriment of other members, who would then get none. Because of the agreement to provide members with leads, this meant a lot of work tracking consumption, establishing quotas and trying to modify behavior. It also meant that the Adwords budget for particular campaigns sometimes had to be increased in order to meet the internal agreements with the sales team. Because sales members sold across regions, one member's behavior could also affect campaigns in multiple regions. They were tracking all of this using a spreadsheet and a lot of legwork.

The Solution: Redpoint was asked to build a small tool to automate the spreadsheet work the lead manager was spending too much of his time fretting over. In just 6 weeks from Inception to Production Deploy we came up with a system to do this. The tool ultimately consisted of a web app and a poller service. The components pulled data from various web services, including the distribution tool, internal employee data services, and Adwords. Most were RESTful, hence the title of this post. The web app gave the lead manager a dashboard to monitor each sales member's lead consumption against an assigned limit. These figures fed into the calculation of potentially available leads for a given state. Once all members selling in a state had received all the leads they were supposed to, all campaigns in that state would be paused for the day. If the lead manager wanted to, he could then increase a particularly good salesperson's limit and allow them to sell more, or refocus a salesperson's lead consumption on a particular state or states. The poller service ran as a Windows Service, using CronTriggers to schedule various pollers that maintained the list of sales team members, the states they sold in, their lead consumption and the availability pool, and that ultimately paused or restarted the Adwords campaigns as appropriate.
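The availability rule described above can be sketched roughly as follows. This is a minimal Python illustration, not the project's actual code: `SalesMember`, `available_leads_by_state` and `states_to_pause` are hypothetical names, and the real work of pausing campaigns through Adwords is left out. It assumes each member has a daily limit, a count of leads consumed so far, and a set of states they sell in.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SalesMember:
    name: str
    daily_limit: int       # leads this member may receive today (hypothetical field)
    consumed: int = 0      # leads already assigned today
    states: set[str] = field(default_factory=set)

    @property
    def remaining(self) -> int:
        return max(0, self.daily_limit - self.consumed)


def available_leads_by_state(members: list[SalesMember]) -> dict[str, int]:
    """Potentially available leads per state.

    A member who sells in several states counts toward each of them, since
    any one of those states could still send them a lead.
    """
    pool: dict[str, int] = {}
    for member in members:
        for state in member.states:
            pool[state] = pool.get(state, 0) + member.remaining
    return pool


def states_to_pause(members: list[SalesMember]) -> list[str]:
    """States where every seller has hit their limit: pause campaigns there for the day."""
    pool = available_leads_by_state(members)
    return sorted(state for state, left in pool.items() if left == 0)


team = [
    SalesMember("ann", daily_limit=10, consumed=10, states={"TX", "OK"}),
    SalesMember("bob", daily_limit=5, consumed=2, states={"TX"}),
]
print(states_to_pause(team))  # ['OK']
```

Raising a member's `daily_limit` or changing their `states` immediately changes the pool, which mirrors the levers the lead manager had on the dashboard.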

The Result: The final tool, as it had evolved, quickly won the admiration of the executive team and the project sponsors. Even in its testing state, it had shown that it was more than just an automated valve for Adwords. The dashboards and multiple levels of control built into the solution allowed them to quickly see patterns in sales member activity and behavior. The flexibility of the CronTriggers allowed them to schedule polls and resets, to automatically turn off all campaigns during down times, and to prevent 'greedy' team members from stockpiling all the leads while others were out of the office. This also assuaged executive fears of blowing the Adwords budget, as they soon saw what an excellent emergency shutoff valve it was for controlling their spending.

For phase two they are already talking about the ease of creating trending and tracking data from what is being pulled as well as adding in different lead types and groupings.

As an Agilist, I appreciated this project as it had become an interesting, dynamic and interactive information radiator that would allow them to manage and change how they do business to meet the changing needs of their market and their staff.

Other Applications: Of course, this sort of tool could be very useful to any growing company that is struggling to wrangle its Adwords budgets and its sales team's quotas. A similar tool could help Scrum Masters, PMs or Team Leads monitor and control story assignment within a team, using the information from their story management system and assigning categories or even complexity levels to various members ("you did three '1 point' stories this iteration so far, you have to take a '2' or a '3' next" OR "You have been only taking stories involving that piece you created, for the rest of this iteration you need to take stories from other areas of the application"). While not so helpful with a small team, it could be very handy with a large distributed team, like the sales team in this case.
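The two example rules in the quotes above could be expressed as a small check like the one below. This is a minimal sketch, assuming a story has a point value and an application area; the thresholds (three one-point stories, three stories in the same area) and all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    points: int
    area: str  # hypothetical: the part of the application the story touches


def rule_violations(candidate: Story, done: list[Story],
                    max_one_pointers: int = 3, max_same_area: int = 3) -> list[str]:
    """Reasons a member should not pick `candidate` next; an empty list means it is allowed."""
    reasons = []

    # Rule 1: after enough 1-point stories, the next one must be a 2 or a 3.
    one_pointers = sum(1 for story in done if story.points == 1)
    if candidate.points == 1 and one_pointers >= max_one_pointers:
        reasons.append(f"already took {one_pointers} one-point stories; take a 2 or a 3")

    # Rule 2: someone who has only worked in one area must branch out.
    if len(done) >= max_same_area and all(story.area == candidate.area for story in done):
        reasons.append(f"only '{candidate.area}' stories so far; take one from another area")

    return reasons


so_far = [Story(1, "billing")] * 3
print(rule_violations(Story(2, "search"), so_far))  # []
```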

The language you code in could say a lot about your developer culture

In his essay "Race and Language", Edward Augustus Freeman suggested that language, rather than race (genetic similarities), might be a surer indicator of cultural kinship among peoples. Professor Freeman wrote: "Every word that a man speaks is the result of a real, though doubtless unconscious, act of his free will. ... A man cannot, under any circumstances, choose his own skull; he may, under some circumstances, choose his own language. He must keep the skull which has been given him by his parents; he cannot, by any process of taking thought, determine what kind of skull he will hand on to his own children. But he may give up the use of the language which he has learned from his parents, and he may determine what language he will teach to his children." The choice of a language, made individually or in the aggregate, therefore points to attributes of the language, and of those already speaking it, that the new speaker appreciates.

Perhaps this is why few topics among developers can start a 'religious war' faster than that of programming language.

The use of language is in essence a tool to capture an idea and communicate it to another. In IT we largely look at languages as tools to efficiently solve a problem. By 'solve' we really mean 'communicate an algorithm to the machine'. Just as some spoken languages are better for communicating facts or legal concepts and others are better for poetry or philosophical abstractions, each programming language is more efficient at 'solving' some technical problems than others. One may be excellent for modeling relationships, another for solving concurrency problems.

Looking at language this way is a good pragmatic approach to choosing 'the right language for the job', but it can also subtly mask or neglect other factors in language choice. Namely, the speaker and the prospective audience play a large part in which language a communication is captured in and how that communication is shaped. The view above measures a programming language's efficiency in processing cycles or memory consumed. That is run-time efficiency: how easily the computer understands and can execute the message. However, with high-level programming languages that measure is misleading. It is really more a measure of how good the compiler or interpreter for a given language is at translating the solution into efficient machine code or bytecode. So, compilers and interpreters aside, the real audience by which to gauge a language's efficiency is humans.

Language use seems to lie on a continuum between communicating to a large audience and communicating to a small one. At one end is the language of news reports, which linguists often point to as the epitome of a people's common speech. It is intended to be understood by the largest portion of a population. Prose is also near this end of the spectrum. At the other end you may find cryptography, where the audience is narrowed to perhaps only one other person. Sonnets, slang and technical jargon are closer to this end: they use conventions and specialized vocabulary that only a small subset of the population has been exposed to. A particular communication can, of course, lie closer to one end or the other depending on both the speaker and the audience. For instance, a technical treatise might use highly exclusive vocabulary but be written in the most standard academic grammar and style. This 'regulated language' ensures that the message, while directed to technical specialists, will be understandable to the widest range of that population. It could even be accessed by outsiders armed with a good dictionary. Slang, on the other hand, often develops out of the needs of the speaker to say things more easily and quickly. Complex grammar and punctuation are sometimes boiled down to partial words, and context takes over their work. One must understand this context, often cultural, to unpack the meaning behind such terse communication. It is particularly difficult for outsiders to master, and often intentionally so.

The intended range of audience has certain temporal effects as well. A loose jargon may make it easy for a small, tight-knit group to communicate efficiently. However, it will be much harder to decode its meaning years later, perhaps even for members of that same community. Formal and standardized grammar and punctuation give a language a degree of staying power, allowing it to be understood for generations.

One can see this in development shops where one language is the exclusive language of development. A particular language may have been chosen for its similarity to human language, which may make translating business requirements into code easier. Yet the same language may make it extremely difficult to describe the algorithm for performing a particular calculation. Greater complexity makes it more likely that a mistake will be introduced, and the wordiness of that complexity can also make the mistake harder to find. It also assumes that those writing code write well in their natural language. Run-on sentences can exist in code. A run-on piece of code is recognizable by low cohesion, which is well known to make code more brittle and harder to maintain.

As a developer it is important to know the ways of expressing algorithms that are most natural to the language being developed in. This is called Idiomatic Code. However, a more seasoned Lead or Architect may realize that the eventual human readers of the code may dictate different ways of expressing the same algorithm. This may even call for a completely different language to be used. Perhaps it makes sense to write code in one language, but to write the tests against that code in another. Testing in another language is often a good way to widen the audience of code that would otherwise reach only a small one.
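As a small illustration of what "idiomatic" means, here are two Python functions that implement the same algorithm (square the even numbers in a list). The first is a word-for-word translation from a C-style mindset; the second is how Python speakers usually say it. Both function names are made up for this example.

```python
def squares_of_evens_translated(numbers: list[int]) -> list[int]:
    # Correct, but "spoken with an accent": manual index bookkeeping in a
    # language that can iterate over a list directly.
    result = []
    i = 0
    while i < len(numbers):
        if numbers[i] % 2 == 0:
            result.append(numbers[i] * numbers[i])
        i += 1
    return result


def squares_of_evens(numbers: list[int]) -> list[int]:
    # Idiomatic Python: a list comprehension states the intent in one line.
    return [n * n for n in numbers if n % 2 == 0]


print(squares_of_evens([1, 2, 3, 4]))  # [4, 16]
```

Both return the same result; the difference is entirely in how readily another Python reader recognizes what the code is saying.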

This also brings up the impact of language on developer culture. Code written in a language that only one group within a team understands can lead to a closed culture. Think of those developers who have an almost visceral reaction to coding in anything but their favorite language. This language is usually that of a closed culture. Entire myths can be constructed about the superiority of both the language and those who code in it. Ultimately, this can threaten to alienate the developers from others on their team (testers, BAs, designers) as well as from the development community at large. Uncle Bob Martin touched upon this in his 2009 talk "What killed Smalltalk could kill Ruby too".

Then there is the case where the language is chosen for you. The experience in this scenario can range from mind-expanding to dehumanizing. Surely the Junior Developer can learn much from being made to code in another language, particularly if that language has been chosen by a group of seasoned veterans representing an appropriate cross-section of the team. In this case the developer may be introduced to whole new ways of communicating and problem solving. On the other hand, there is the case where the language is imposed by a corporate decision made years earlier, which may have little bearing on the way your team works today or the problems it is being asked to solve now. This is the fallacy of simply 'buying' a language as one tool among equal tools. When that happens, the choice can become less about the effectiveness of the 'tool' and more about the packaging and marketing (e.g. IDEs, plugins, reporting tools, support plans, name recognition, third-party APIs and frameworks). In these cases, the developers may often feel as if they are being forced to use an inferior tool for what seems like an arbitrary reason.

Finding pragmatic ways to introduce other languages where appropriate can mitigate some of the negative cultural impacts and keep communication flowing. It can also make developers 'smarter' by making them look at and 'speak' about the problem using paradigms different from the ones they are used to. This is much like the experience of people who speak multiple languages. Becoming a polyglot programmer may just make you a better designer or architect, since you will have a larger pool of ways of looking at a problem.

What other ways does the language you code in impact your team or say something about you?

Thursday, September 5, 2013

Clade Taxonomy with Solr 4.4 without getting burnt

I have recently had the opportunity to do some work with the Apache Solr search platform. It is a phenomenal tool and I am still discovering much of its power. I needed to apply a taxonomy to documents and essentially search using that taxonomy. Now, I understand that, via the underlying Lucene search engine, the newest versions of Solr have faceted search and taxonomy support available, but there is also a nice open source tool called Clade which looked like it would fit the bill for working out some proof-of-concept ideas I had. The current version of Clade only supports Solr 3.6 out of the box. For my solution, however, I was interested in using the multicore support of the latest Solr version. Unfortunately, in addition to multicore support, there have been a number of other changes since Solr 3.6. On the plus side, the Solr UI has improved substantially since 3.6, so digging into the created index is now easier. Clade is also written largely in Python, a language I dive into only every other blue moon. This all led to a bit of digging, and as I could not find any documentation on how to get Clade working with the latest Solr version, here are the steps I took:

  1. Updated the schema config so that the field definition for doc_id (the uniqueKey) included the multiValued="false" attribute
  2. Changed the solr_url setting to conform to the multicore standard (e.g. solr_url = 'http://localhost:8983/solr/collection1/' for the default example install of Solr)
  3. Similarly, when copying the conf files over, you have to make sure that you put them into the proper core's folder (i.e. solr/collection1/conf, NOT solr/conf)
  4. Replaced the included sunburnt with the latest sunburnt (I just renamed the included sunburnt folder to sunburntOLD and copied the new sunburnt folder in). I did this just for good measure: so much of Solr had changed that I wanted to make sure the sunburnt I was using to interface with it was as current as possible
  5. In the Clade lib/, the sunburnt query in get_docs_for_category is executed with the fields limited to just score=True, which causes a KeyError because doc_id and title are not returned. There is a note in the code to “FIX” it. This may have worked against the old versions of Solr and sunburnt, perhaps because that limiting did not work properly in those versions; now, however, you need to change the line to: results = query.field_limit(["doc_id", "title"], score=True).paginate(rows=10).execute() (see the Sunburnt docs)
  6. That brings back all three expected fields needed to create the desired tuple for return.
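Steps 1, 2, 5 and 6 can be sanity-checked with a few lines of standard-library Python. This is a hypothetical helper, not part of Clade or sunburnt: it assumes `<uniqueKey>` is a direct child of `<schema>` in schema.xml, and that each sunburnt result behaves like a dict with `doc_id`, `title` and `score` keys (the tuple order is also an assumption). Like step 1, it only looks at the `multiValued` attribute on the field itself and ignores any default inherited from the field type.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def core_url(base: str, core: str) -> str:
    """Step 2: build a multicore-style URL such as http://localhost:8983/solr/collection1/."""
    return urljoin(base.rstrip("/") + "/", core.strip("/") + "/")


def unique_key_is_single_valued(schema_xml: str) -> bool:
    """Step 1: True if the uniqueKey field explicitly declares multiValued="false"."""
    root = ET.fromstring(schema_xml)
    key_name = (root.findtext("uniqueKey") or "").strip()
    if not key_name:
        raise ValueError("schema has no <uniqueKey> element")
    for field in root.iter("field"):
        if field.get("name") == key_name:
            return field.get("multiValued") == "false"
    raise ValueError(f"uniqueKey field {key_name!r} is not defined")


def to_doc_tuples(results) -> list[tuple]:
    """Steps 5 and 6: raises KeyError if field_limit dropped doc_id or title."""
    return [(doc["doc_id"], doc["title"], doc["score"]) for doc in results]


SCHEMA = """<schema name="example" version="1.5">
  <fields>
    <field name="doc_id" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>doc_id</uniqueKey>
</schema>"""

print(core_url("http://localhost:8983/solr", "collection1"))  # http://localhost:8983/solr/collection1/
print(unique_key_is_single_valued(SCHEMA))                    # True
```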

Doing the above got Clade working with Solr 4.4 on my Mac just as I had it working with Solr 3.6 on my Linux machine.

One other thing I have noticed with Clade: re-running the classify script on the same data appears to create duplicates in the index. This is not the expected indexing behavior, so I will have to take a closer look at what Clade is doing in that script. Fortunately, just blowing away the Solr data directory clears it out, but that feels a bit heavy-handed.
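A less drastic option than deleting the data directory, assuming the core uses Solr's standard /update handler, is to post an XML delete-by-query that matches every document and then commit. The sketch below only builds that request with the standard library (the function name is hypothetical); sending it with `urllib.request.urlopen` against a running core would clear the index. It does not address why the duplicates appear in the first place.

```python
from urllib.request import Request


def delete_all_request(core_url: str) -> Request:
    """Build (but do not send) a POST that deletes every document in a Solr core and commits."""
    body = b"<delete><query>*:*</query></delete>"
    return Request(
        core_url.rstrip("/") + "/update?commit=true",
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


req = delete_all_request("http://localhost:8983/solr/collection1/")
print(req.get_method(), req.full_url)  # POST http://localhost:8983/solr/collection1/update?commit=true
# To actually send it against a running core: urllib.request.urlopen(req)
```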

I hope this helps someone. Now to start wiring up my Solr instance via Spring Data...