sanename sensible naming conventions for projects & packages


sanename is a set of rules for naming software.

The format for package names is defined first, followed by translation rules for representing a name in four other common formatting conventions.
The package name is the name by which package managers refer to a library, i.e. the name that apt-get install or npm install uses.
The sanename restrictions can also be applied to other software elements such as table names, file names and hash keys, allowing early, strict validation and easy communication about the naming rules.


Examples

Good

sanename
nodejs
myproj-util

Bad

my_1st_project
ab
1-coolproj_4

Ugly

YesMate
No_sir

Rules

  • sanenames MUST be composed of one or more words.
  • Multiple English words MAY be concatenated to create a proper noun, hence sanename.
  • Multiple words SHALL be delimited by -
  • Acronyms SHALL be treated as words.
  • Only lower case characters between a-z and numbers 0-9 SHALL be used.
  • Words MUST start with a letter, not a number; number suffixes MUST form part of a word.
  • sanenames MUST be between 3 and 32 characters in length.
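
The MUST and SHALL rules above are strict enough to check with one regular expression plus a length test. The following is a minimal sketch rather than a reference implementation; the function name is_sanename is hypothetical, and it assumes digits may appear anywhere after the first letter of a word, which the rules do not spell out.

```python
import re

# One or more words joined by "-". Each word starts with a-z and continues
# with a-z or 0-9. Assumption: digits are allowed anywhere after the first
# letter of a word, not only at the very end.
_SANENAME = re.compile(r"[a-z][a-z0-9]*(?:-[a-z][a-z0-9]*)*")


def is_sanename(name: str) -> bool:
    """Return True if name satisfies the MUST/SHALL rules (not the recommendations)."""
    return 3 <= len(name) <= 32 and _SANENAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["sanename", "myproj-util", "my_1st_project", "ab", "YesMate"]:
        print(candidate, is_sanename(candidate))
```

The recommendations (single composite word, US English) are matters of judgment and are deliberately not checked here.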

Recommendations

  • A single composite word is RECOMMENDED.
  • English with US spelling is the RECOMMENDED language.
  • When there is a lack of an existing convention the package name format is RECOMMENDED.


Rules & Recommendations detail

Most package names are already sanenames; the rules are common sense. However, project names often carry artifacts from the languages they were originally written in. When usage of the project expands and the name needs to appear in other contexts, this can cause issues, such as needing to modify the name or escape characters. The sanename rules attempt to avoid such issues.

sanenames MUST be composed of one or more words.

The aim of a sanename is to produce a proper noun that uniquely identifies the package and is descriptive and concise. Package names should be unique in the realm in which they live, e.g. Node.js packages should be unique within npm. Avoiding names that are already overloaded in other areas is also recommended.

A single composite word is RECOMMENDED.

Simple names are good. Proper nouns fit naturally into natural language conventions, for example fluent APIs. Avoid single English words, since they lead to confusion when a convention prevents capitalization from marking the proper noun.

Multiple English words MAY be concatenated to create a proper noun, hence sanename.

Concatenating names to make a proper noun is a simple way of creating descriptive, unique names that are easy to search for: jsonbinder, logingreeter, logencrypter. If a project is assigned a composite name like this, it should be treated as a proper noun, e.g. sanename and sane-name may not be grouped together in text searches. Remaining consistent is an important part of this naming guide. The nodejs project styles its name Node.js, which puts a full stop in the wrong place in sentences; this complicates code that delimits sentences containing the project name, e.g. when automatically generating summaries from a larger body of text.

Multiple words SHALL be delimited by -

Concatenating more than two words starts to get difficult to read. Longer names should be divided into words: node-jsonbinder-utils. If a package is a sub-project, respect the original sanename, i.e. do not create a package node-json-binder-utils if it is based on or part of jsonbinder. Concatenating whole phrases should be avoided; only concatenate words to create proper nouns, e.g. prefer antinstaller-build-tool over antinstaller-buildtool.

Acronyms SHALL be treated as words.

This rule resolves issues when mapping to CamelCase. In the sanename spec mp3 is a word, as are nsa, gchq and ibm. This makes little difference in lower case package names such as ibm-nsa-gchq-tools; however, when transforming to CamelCase this name becomes IbmNsaGchqTools. This looks a bit odd at first sight because you know that IBM is a set of initials. It is better than IBMNSAGCHQTools because the transformation can be applied in both directions: it enables code to determine a consistent package sanename from the class name and vice versa.
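
The round trip described above can be sketched in a few lines. This is an illustration under stated assumptions, not part of the spec: the function names to_class_name and from_class_name are hypothetical, and the reverse direction simply assumes that every upper case letter starts a new word.

```python
import re


def to_class_name(sanename: str) -> str:
    """ibm-nsa-gchq-tools -> IbmNsaGchqTools (acronyms are treated as words)."""
    return "".join(word.capitalize() for word in sanename.split("-"))


def from_class_name(class_name: str) -> str:
    """Assumes each upper case letter starts a new word."""
    words = re.findall(r"[A-Z][a-z0-9]*", class_name)
    return "-".join(word.lower() for word in words)


if __name__ == "__main__":
    name = "ibm-nsa-gchq-tools"
    print(to_class_name(name))
    print(from_class_name(to_class_name(name)))
    # With fully capitalised acronyms the word boundaries are lost:
    print(from_class_name("IBMNSAGCHQTools"))
```

Feeding IBMNSAGCHQTools through the same reverse function splits every capital into its own word, which is why the all-caps styling cannot be mapped back to the original package name.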

English with US spelling is the RECOMMENDED language.

Most code is written in US English, and it is the de facto standard. Check your spelling in an online US dictionary if there is any doubt. Blatantly incorrect spelling is encouraged since it helps generate uniqueness: tumblr, speling. Single nouns from any other language make good sanenames: ubuntu, paris, simba, equisemel. Avoid non-English words in the descriptive parts of a name, e.g. xml-parseador.

Only lower case characters between a-z and numbers 0-9 SHALL be used.

No upper case letters are to be used: mixed case makes translation to and from constants and camel case complicated. Lower case is faster to type. Debian does it. Marketing departments may not like this rule, since case style can be part of a trade mark; notwithstanding, the rule applies. A package name with an upper case letter is not a sanename. Whitespace is used by almost all languages to delimit tokens, so it should be obvious that it is excluded. While underscore is permitted in almost all languages, it is excluded from sanename. N.B. unlike in Debian packages, + is not permitted. Limiting sanenames to simple Latin characters simplifies validation and removes the need for any escaping in almost all string representations; + would create issues in URIs.

Words MUST start with a letter, not a number; number suffixes MUST form part of a word.

This rule is borrowed from C rules that have found their way into most other languages. It is not strictly necessary but, by enforcing it in sanename, there are more use cases where no translation of the project name is required, most importantly variable names in code, where a leading digit denotes a number. Complete rewrites of a code base are often given a new package name; this can be represented by attaching a digit as a suffix, e.g. apache2, junit4. sanename numeric suffixes form part of the word, so this is compatible with semver. There are technical reasons why projects are repackaged like this, primarily so that both versions can co-exist at runtime. sanename specifies that such suffixes should be part of the word, i.e. apache-2 is invalid. Conceptually apache2 is a separate proper noun and a separate entity from apache. Since dots are forbidden, the temptation to name a project with a semver number is averted: web2.0 is not a valid sanename, use web2.

sanenames MUST be between 3 and 32 characters in length.

Avoid 3 letter names: they are more likely to clash and to produce false positives in searches. Ultimately path names will hit OS limits; the limit is around 255 characters for DNS names and Windows paths, and it can be reached quite easily. Package names are often used as part of paths, e.g. repository URLs, so it is important to keep the name short. While very short aliases can be convenient to type, e.g. apache-bench has the command ab, ab is an invalid sanename.

When there is a lack of an existing formatting convention the package name format is RECOMMENDED.

Try to avoid formatting conventions such as MY_PROJECT and MyProject; these have explicit meanings in some programming languages, typically constant and class respectively.

Benefits

The four formatting conventions considered are as follows.

  1. Class names, aka CamelCase convention - Multiple words are concatenated without spaces and the first letter of each word is capitalized, typically used for class names in object oriented programming languages.
  2. Instance names, aka camelCase convention - CamelCase with a lower case first letter, typically used for instances of classes.
  3. Constant names, env variable names - ALL_UPPER_CASE_WORDS separated by underscore, typically used for constants.
  4. Kernel variable names - all_lower_case_words separated by underscore, as used by C code that follows Linux kernel code conventions.

sanename guidelines make it possible to transform across all of the above use cases deterministically. That is to say, given a sanename in any one of these formats, it is possible to determine exactly the name that will be used in the other formats. One of these forms is generally compatible with existing naming conventions; however, there are some which sanename does not cater for, for example the convention for Linux libraries of "lib" plus the library name in all lower case without any word boundaries.
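
One way to picture the deterministic mapping is to reduce a name in any format to its list of lower case words and then render that list in every format. The sketch below is illustrative only: the function names parse and render and the format labels ("package", "class", "instance", "constant", "kernel") are hypothetical, and the caller is assumed to know which format the input is in.

```python
import re


def parse(name: str, fmt: str) -> list:
    """Reduce a name in the given format to its lower case words."""
    if fmt == "package":
        return name.split("-")
    if fmt in ("constant", "kernel"):
        return name.lower().split("_")
    # "class" or "instance": a new word starts at each upper case letter
    return [w.lower() for w in re.findall(r"[A-Za-z][a-z0-9]*", name)]


def render(words: list) -> dict:
    """Render the words in the package format and the four other formats."""
    caps = [w.capitalize() for w in words]
    return {
        "package": "-".join(words),
        "class": "".join(caps),
        "instance": words[0] + "".join(caps[1:]),
        "constant": "_".join(w.upper() for w in words),
        "kernel": "_".join(words),
    }


if __name__ == "__main__":
    forms = render(parse("ibm-nsa-gchq-tools", "package"))
    for fmt, text in forms.items():
        # Every rendering parses back to the same package name.
        print(fmt, text, render(parse(text, fmt))["package"])
```

The format is passed explicitly because a bare string does not always reveal it: a single word in upper case could be a constant, or a class name built from one-letter words.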

Other benefits

  • By not using non-ASCII characters, the sort order in most locales is the same as the ASCII sort order.
  • Validation and regular expressions are simple.
  • XSS and code injection risks are easier to avoid due to strong validation.
  • C string byte length is the same in UTF-8 and ASCII.
  • It works in URLs and email addresses.
  • By being very strict, it is compatible with most important naming conventions.
  • It can be translated back and forth safely, without escaping, between package names, class names, structs, variables, table names, field names, environment variables and file system paths in almost all conceivable computer systems.
  • It is 7-bit safe.
  • It is faster to type than mixed case.
  • Path traversal bugs are easily avoided.
  • You can still find all the keys if the OS swaps your Turkish keyboard to US layout.
  • If someone comes up with the cool name web3.0 for the next project in your company and is defensive about the name, you can point them to this page and save yourself a bunch of cosmetic Jira issues later down the line.
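
Several of the claims in the list above (no escaping in URLs, identical byte length in UTF-8 and ASCII, safe in a shell) can be spot-checked with the Python standard library. This is a rough sketch and the function name survives_unescaped is hypothetical. The checks are necessary but not sufficient: a name such as web2.0 also passes them while still breaking the sanename rules.

```python
import shlex
from urllib.parse import quote


def survives_unescaped(name: str) -> bool:
    """True if the name is unchanged by URL and POSIX shell quoting and is pure ASCII."""
    return (
        name.isascii()
        and len(name.encode("utf-8")) == len(name)  # one byte per character
        and quote(name, safe="") == name            # no percent-encoding needed
        and shlex.quote(name) == name               # no shell quoting needed
    )


if __name__ == "__main__":
    for candidate in ["ibm-nsa-gchq-tools", "c++", "web 2.0", "café"]:
        print(candidate, survives_unescaped(candidate))
```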

Negatives

  • A few cool names like PuTTY are less expressive.
  • Some acronyms look a bit weird in Java classes.
Feel free to flame me if you can think of anything else.

Existing Projects

A great many software projects already use sanenames.

It is common to concatenate 2 words to create a proper noun in English and other languages; the noun can therefore be descriptive.
github
sourceforge
nginx
apache
semver

Acronyms, backronyms, nacronyms and random collections of letters can be used as a word to great effect, e.g.
irc - internet relay chat.
twain - technology without an interesting name.
npm - which does not stand for node package manager.
java - just another vague acronym.


Format Usage Recommendations

If in any doubt, use the package name format and avoid capitalizing the first letter. In software, case matters.

  • File names - Use the package name.
  • Directory names - Use the package name, avoid using kernel names for directories.
  • Technical documentation - Use the package name, treat the package name as a proper noun.
  • Non-technical documentation - Use the class name, this is more familiar to non-techies. Don't fiddle with the case rules or add whitespace, since this will complicate the technical names.
  • DNS names - You don't always get to choose the DNS name you want, but notice that package name formatting is supported by DNS.
  • Maven artifact ids & project name - Use the package name, even though Maven accepts other characters.


License

Copyright 2105 Paul Hinds. Permission is granted to copy this text freely.

