sanename sensible naming conventions for projects & packages


sanename is a set of rules for what to name packages when writing and delivering library code and APIs for others to use.

First the format for package naming is defined, and then translation rules are defined for representing the name in 4 other common formatting conventions.
The package name is the name by which package managers refer to the library, i.e. the name that apt-get install or npm install will use.


Examples

This is probably all you need to know.

Good

sanename
nodejs
myproj-util

Bad

my_1st_project
ab
1-coolproj_4

Ugly

YesMate
No_sir

Rules

  • sanenames MUST be composed of one or more words.
  • Multiple English words MAY be concatenated to create a proper noun, hence sanename.
  • Multiple words SHALL be delimited by -
  • Acronyms SHALL be treated as words.
  • Only lower case characters between a-z and numbers 0-9 SHALL be used.
  • Words MUST start with a letter, not a number, number suffixes MUST form part of a word.
  • sanenames MUST be between 3 and 32 characters in length.
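
The rules above are strict enough to check mechanically. What follows is a minimal sketch in Python, not part of the sanename text itself: the helper name is_sanename and the regular expression are assumptions made for illustration, and the length limits are taken to be inclusive.

```python
import re

# One or more words separated by single hyphens. Each word starts with a
# lower case letter and continues with lower case letters or digits.
# (Illustrative pattern; the sanename text does not define a regexp.)
_SANENAME = re.compile(r"[a-z][a-z0-9]*(?:-[a-z][a-z0-9]*)*")


def is_sanename(name: str) -> bool:
    """Return True if name satisfies the MUST/SHALL rules listed above."""
    # Assumption: "between 3 and 32 characters" is inclusive at both ends.
    if not 3 <= len(name) <= 32:
        return False
    return _SANENAME.fullmatch(name) is not None


if __name__ == "__main__":
    examples = ["sanename", "nodejs", "myproj-util",
                "my_1st_project", "ab", "1-coolproj_4",
                "YesMate", "No_sir"]
    for candidate in examples:
        print(candidate, is_sanename(candidate))
```

The RECOMMENDED items (single composite word, US spelling) are judgment calls and are deliberately not checked here.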

Recommendations

  • A single composite word is RECOMMENDED.
  • English with US speling is the RECOMMENDED language.
  • When there is a lack of an existing formatting convention the package name format is RECOMMENDED.


Rules & Recommendations detail

Most package names are already sanenames; the rules are common sense. However, project names often carry artifacts from the language they were originally written in. When the project's usage expands and it needs to be included in other contexts, this can cause issues such as needing to modify the name. The sanename rules attempt to avoid such issues.

sanenames MUST be composed of one or more words.

The aim of a sanename is to produce a proper noun that uniquely defines the package but is also concisely descriptive of what the package does. This is not an easy goal. Make some attempt to ensure your package name is unique in the realm it lives in: if it's a Node.js package check npm, and whatever it is, google it first and try to avoid overloaded names.

A single composite word is RECOMMENDED.

Simple names are good, particularly if they are proper nouns. Avoid single words that are commonly used English words, since this leads to confusion. There is a project on sourceforge called this, which is not an inspired name in this regard.

Multiple English words MAY be concatenated to create a proper noun, hence sanename.

Concatenating names to make a proper noun is the simplest way of creating project names that are descriptive: jsonbinder, logingreeter, logencrypter. If the project is assigned a composite name like this, always treat it as a proper noun, e.g. sanename and sane-name are different concepts: sane-name does not exist, sanename does. This is an important part of the naming guide: remain consistent. nodejs insist on calling their project Node.js, which results in a full stop in the wrong place in every other sentence on their website.

Multiple words SHALL be delimited by -

Concatenating more than two words starts to get difficult to read. Longer names should be divided into words: node-jsonbinder-utils. If a package is a sub project, respect the original sanename, i.e. do not create a package node-json-binder-utils if it is based on or part of jsonbinder. Concatenating whole phrases should be avoided; only concatenate words to create proper nouns, e.g. prefer antinstaller-build-tool over antinstallerbuildtool.

Acronyms SHALL be treated as words.

This part of the rules is to resolve issues with bean naming conventions in Java. It is potentially contentious. In the sanename spec mp3 is a word, as are nsa, gchq and ibm. This makes little difference in lower case package names such as ibm-nsa-gchq-tools; however, when transforming to CamelCase this name becomes IbmNsaGchqTools. This looks a bit odd at first sight because you know that IBM are initials. It's better than IBMNSAGCHQTools. This rule is important since it enables code to determine the correct package sanename from the class name and vice versa.
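
To see why treating acronyms as words matters, here is a small sketch in Python of the class name to sanename direction. The function names are hypothetical and the splitting rule (a new word starts at every upper case letter) is an assumption consistent with the CamelCase description in this document, not a reference implementation.

```python
import re


def sanename_to_class(name: str) -> str:
    """Capitalise the first letter of every hyphen-delimited word."""
    return "".join(word.capitalize() for word in name.split("-"))


def class_to_sanename(class_name: str) -> str:
    """Start a new word at every upper case letter except the first one."""
    return re.sub(r"(?<!^)(?=[A-Z])", "-", class_name).lower()


if __name__ == "__main__":
    # Acronyms treated as words: the name survives the round trip.
    print(class_to_sanename(sanename_to_class("ibm-nsa-gchq-tools")))
    # Acronyms kept in upper case: the word boundaries are lost.
    print(class_to_sanename("IBMNSAGCHQTools"))
```

With all-caps acronyms no simple rule can tell where one acronym ends and the next begins, which is the ambiguity the rule removes.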

English with US speling is the RECOMMENDED language.

Most code is written in US English. Deal with it. I'm British, it's hard, but it's the de facto standard. Check your spelling in an online US dictionary if there is any doubt. Blatantly incorrect speling is encouraged since it helps generate uniqueness: tumblr, speling (if you did not recognise it, it is from the apache module of that name). referer is the famous spelling mistake in HTTP; avoid that. Single nouns from any other language make good sanenames: ubuntu, paris, simba, equisemel. Avoid non-English in the descriptive words of a name, e.g. xml-parseador.

Only lower case characters between a-z and numbers 0-9 SHALL be used.

No upper case letters are to be used: mixed case makes translation to and from constants and camel case complicated. Lower case is faster to type. Debian do it. Marketing men may not like this rule since case is often part of a trade mark; notwithstanding, the rule applies. A package name with an upper case letter is not a sanename. Whitespace is excluded because almost all languages use it to delimit tokens; that should be obvious. While underscore is permitted in almost all languages, it is excluded from sanename. N.B. unlike Debian packages, + is not permitted. Limiting sanenames to simple Latin characters simplifies validation and removes the need for any escaping in almost all string representations. + would create issues in URIs.

Words MUST start with a letter, not a number, number suffixes MUST form part of a word.

This rule is borrowed from C code requirements that have found their way into most other languages. It's not strictly necessary but, by enforcing it in sanename, there are a whole lot more use cases where no translation or munging of the project name is required, most importantly variable names in code, where leading digits denote numbers. Complete rewrites of a code base are often given a new package name; this can be done by attaching a digit as a suffix, e.g. apache2, junit4. I think this is compatible with semver. There are technical reasons projects are repackaged like this, primarily so both versions can coexist at runtime. sanename specifies that such suffixes should be inside the word boundary, i.e. apache-2 is invalid. Conceptually apache2 is a separate proper noun and a separate entity from apache. Since dots are forbidden, the temptation to name a project with a semver number is averted: web2.0 is not a valid sanename, use web2.
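
The variable name use case can be illustrated with Python, whose identifier rules follow the same C-style convention. This is only a sketch: as_identifier is a hypothetical helper, and str.isidentifier checks Python's rules, which stand in here for those of other languages.

```python
def as_identifier(name: str) -> str:
    """Kernel-style form of a package name: hyphens become underscores."""
    return name.replace("-", "_")


if __name__ == "__main__":
    for name in ["apache2", "junit4", "myproj-util", "1-coolproj", "web2.0"]:
        ident = as_identifier(name)
        # isidentifier() is False for a leading digit or a dot.
        print(f"{name} -> {ident}: usable as a variable name = {ident.isidentifier()}")
```

Names that obey the rule need nothing more than the hyphen swap; names with a leading digit or a dot would need further munging before they could appear in code.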

sanenames MUST be between 3 and 32 characters in length.

Avoid 3 letter names; they are more likely to clash. Ultimately pathnames will hit OS limits: the limit is 255 in DNS names and Windows, and it can be hit quite easily. Package names are often used as part of paths, e.g. repository URLs, so it's important to keep the name short. Package names should be over 4 characters to help ensure they are unique. This does not mean binary file names should be in this range. It's acceptable for a project apache-bench to have the command ab, while ab is an invalid sanename.

When there is a lack of an existing formatting convention the package name format is RECOMMENDED.

Try to avoid formatting conventions such as MY_PROJECT and MyProject; these have explicit meanings in code, typically constant and class respectively.

Benefits

The four formatting conventions considered are as follows.

  1. Class names, aka CamelCase convention - This is where multiple words are concatenated without spaces and the first letter of each word is capitalised, typically used for class names in object oriented programming languages.
  2. CamelCase instance names - This is camelCase with a lower case first letter, typically used for instances of classes.
  3. Constant names, env variable names - ALL_UPPER_CASE_WORDS separated by underscore, typically used for constants.
  4. Kernel variable names - all_lower_case_words separated by underscore, as used by C code that follows Linux kernel code conventions.

sanename guidelines make it possible to transform across all of the above use cases deterministically. That is to say, given a sanename in any one of these formats it's possible to determine exactly the name that will be used in the other formats. There are many other naming conventions which sanename does not cater for, for example the convention for Linux libraries: "lib" plus the library name in all lowercase without any word boundaries. Tant pis.
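
The deterministic transformations can be sketched in a few lines of Python. All function names below are assumptions chosen for illustration; only the four target formats come from the list above.

```python
import re


def to_class_name(name: str) -> str:
    """1. Class name: ibm-nsa-gchq-tools -> IbmNsaGchqTools."""
    return "".join(word.capitalize() for word in name.split("-"))


def to_instance_name(name: str) -> str:
    """2. Instance name: class name with a lower case first letter."""
    cls = to_class_name(name)
    return cls[0].lower() + cls[1:]


def to_constant(name: str) -> str:
    """3. Constant or env variable: upper case, underscore-delimited."""
    return name.replace("-", "_").upper()


def to_kernel(name: str) -> str:
    """4. Kernel variable: lower case, underscore-delimited."""
    return name.replace("-", "_")


def from_camel(identifier: str) -> str:
    """Back from formats 1 and 2: a new word starts at each capital."""
    return re.sub(r"(?<!^)(?=[A-Z])", "-", identifier).lower()


def from_underscored(identifier: str) -> str:
    """Back from formats 3 and 4."""
    return identifier.replace("_", "-").lower()


if __name__ == "__main__":
    name = "ibm-nsa-gchq-tools"
    for convert, invert in [(to_class_name, from_camel),
                            (to_instance_name, from_camel),
                            (to_constant, from_underscored),
                            (to_kernel, from_underscored)]:
        converted = convert(name)
        print(converted, "->", invert(converted))
```

Each inverse is a plain string operation with no dictionary lookup, which works only because the rules forbid upper case, underscores, and leading digits in the package name itself.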

Other benefits

  • By not using non-ASCII characters, sort order is the same in all cases.
  • Validation and regexps are simple.
  • XSS risks or code injection don't arise due to strong validation.
  • C string byte length is the same no matter the encoding.
  • It works in URLs and email addresses.
  • It's faster to type than mixed case.
  • It needs no escaping for file names or directory names.
  • It needs no escaping in shell scripts.
  • By being very strict it's compatible with most important naming conventions.
  • You can still find all the keys if the OS swaps your Turkish keyboard to US layout.
  • If someone comes up with the cool name web3.0 for the next project in your company and is defensive about the name you can point them to this page and save yourself a bunch of cosmetic Jira issues later down the line.
  • Ad nauseam...
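
The "no escaping" claims in the list above are easy to spot check. The sketch below uses Python's standard urllib.parse.quote and shlex.quote; the helper name needs_escaping is an assumption made for this example.

```python
import shlex
from urllib.parse import quote


def needs_escaping(name: str) -> bool:
    """True if URL percent-encoding or POSIX shell quoting would alter name."""
    changed_in_url = quote(name, safe="") != name
    changed_in_shell = shlex.quote(name) != name
    return changed_in_url or changed_in_shell


if __name__ == "__main__":
    for name in ["sanename", "myproj-util", "apache2", "my project", "c++"]:
        print(name, needs_escaping(name))
```

Lower case letters, digits and the hyphen are left untouched by both functions, whereas a space or a + is rewritten by at least one of them.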

Negatives

What's the downside to this?
  • A few cool names like PuTTY become boring.
  • Some acronyms look a bit weird in Java classes.
Feel free to flame me if you can think of anything else.

Existing Projects

A great many software projects already use sanenames because they are, well, sane.

It's common to concatenate 2 English words to create a proper noun that names the project and explains a bit about it.
github
sourceforge
nginx
apache
semver

Acronyms, backronyms, nacronyms and random collections of letters can be used as a word to great effect, e.g.
irc - internet relay chat.
twain - technology without an interesting name.
npm - which does not stand for node package manager.
java - just another vague acronym.*


Format Usage Recommendations

Use the package name format if in any doubt, and avoid capitalizing the first letter. In software, case matters.

  • File names - Use the package name.
  • Directory names - Use the package name, avoid using kernel names for directories, even if it's tempting.
  • Technical documentation - Use the package name, treat the package name as a proper noun.
  • Non-Technical documentation - Use the class name, this is more familiar to non-techies. Don't fiddle with the case rules or add whitespace since this will confuse the techies.
  • DNS names - You don't always get to choose the DNS name you really want, but notice that package name formatting is supported by DNS.
  • Maven artifact ids & project name - Use the package name, maven accepts other characters but these are annoying.


License

This document is copyright 2105 Paul Hinds. Permission is granted to copy this text freely provided none of the content is changed. Translations of the document must be authorised. Quoting a big part of the doc is OK provided you link back. Obviously you can call your project what you like, but please don't call it a sanename if it's not. The idea of this license is to prevent adding rules and causing confusion.


* OK, I made that up.