Case sensitivity

From TheAlmightyGuru
Comparing text with and without case sensitivity.

Case sensitivity describes whether a computer will treat upper and lower case letters as equal. Those systems which view them as different are "case-sensitive" while those that view them the same are "case-insensitive." A case-sensitive system would treat the words "she," "She," and "SHE" differently, while a case-insensitive system would treat them as equivalent.

Case-insensitive systems are further divided by case preservation: they are either "case-preserving" or "non-case-preserving." Non-case-preserving means that all entered text is converted to a specific case used by the system. For example, the FAT16 file system is non-case-preserving, so if a user names a file "She.txt," the file system converts it to "SHE.TXT," losing the provided case. The NTFS file system, however, is case-insensitive but case-preserving, so a file named "She.txt" is stored as "She.txt" and is still found by a search for "SHE.TXT" or "she.txt."
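
The difference between the two behaviors can be sketched in a few lines of Python. This is only a toy model: the class name NameStore and its methods are made up for illustration, and real file systems use their own case tables rather than Python's casefold().

```python
class NameStore:
    """Toy model of a case-insensitive name table (hypothetical, for illustration)."""

    def __init__(self, preserve_case):
        self.preserve_case = preserve_case
        self._entries = {}  # folded name -> (name as stored, content)

    def write(self, name, content):
        # A non-case-preserving store converts the name to its own case.
        stored = name if self.preserve_case else name.upper()
        # Lookups ignore case, so the key is the case-folded name.
        self._entries[name.casefold()] = (stored, content)

    def stored_name(self, name):
        return self._entries[name.casefold()][0]

    def read(self, name):
        return self._entries[name.casefold()][1]


preserving = NameStore(preserve_case=True)       # behaves like the NTFS example
non_preserving = NameStore(preserve_case=False)  # behaves like the FAT16 example
preserving.write("She.txt", "hello")
non_preserving.write("She.txt", "hello")

print(preserving.stored_name("SHE.TXT"))      # She.txt
print(non_preserving.stored_name("she.txt"))  # SHE.TXT
```

Both stores find the file under any spelling of its name; only the case-preserving one remembers how the user typed it.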

In computers, case sensitivity applies primarily to text comparison, file systems, and programming languages.

Personal

I grew up with MS-DOS using the FAT16 file system and programming in various forms of the BASIC programming language, both of which are case-insensitive. I didn't encounter case sensitivity until I first tried programming in C in my teens, and found it quite off-putting. In my 20s, when I switched my web server from a Windows host to a Linux host, I discovered my site had scores of broken links because the Linux server used a case-sensitive file system, which further frustrated me. As I write more web-based code in my work, I use case-sensitive languages and operating systems more frequently now, but I find them to be unnecessary relics that only serve to make computing harder than it needs to be. The only time I can see case sensitivity being a benefit is during a few forms of text comparison. Otherwise, I prefer case-insensitive, case-preserving systems for everything.

Case-Sensitivity in Text Comparison

Text comparison can see some benefit from case sensitivity for things like password checks, but the disadvantages typically outweigh the advantages. Because of this, most text comparisons are case-insensitive by default, and, when they are case-sensitive, this is usually made clear to the user.

Advantages

  • In languages which use capitalization to distinguish words (like English capitalizing proper nouns but not common nouns), it adds to the precision of a comparison or search, which may give more accurate results. For example, a person searching for the novel "Dune" is probably not interested in a sand "dune."
  • It helps exclude search results with poorly formatted text. If "DUne" occurs in some text, it's a good indication the document isn't very scholarly, so people probably won't want to see it.
  • There are some edge cases where the case can't be converted automatically. For example, the lowercase German letter "ß" traditionally has no uppercase form of its own and is written as "SS" in uppercase, so the German word for street, "Straße," becomes "STRASSE." The conversion can't be reliably reversed, because "SS" may stand for either "ß" or "ss": "MASSE" could be "Masse" (mass) or "Maße" (measurements). To help alleviate this problem, an uppercase "ẞ" was created (which creates a new problem since the two are so visually similar), but official German orthography only accepted it in 2017, and it's still not fully adopted, so case sensitivity still helps in these instances.
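
The "ß" edge case can be tried directly in Python, whose built-in string methods follow the Unicode case mappings. The snippet below is a small sketch; the helper name same_ignoring_case is made up for illustration.

```python
street = "Straße"

upper = street.upper()      # "ß" has no single-letter uppercase mapping here
round_trip = upper.lower()  # lowercasing does not bring the "ß" back

print(upper)       # STRASSE
print(round_trip)  # strasse


def same_ignoring_case(a, b):
    # casefold() is the string method intended for caseless comparison;
    # it folds "ß" to "ss", so both spellings compare equal.
    return a.casefold() == b.casefold()


print(same_ignoring_case("Straße", "STRASSE"))  # True
print("ẞ".lower())                              # ß
```

Note that the round trip through uppercase loses information: "strasse" is not the same string as the original "straße," which is exactly why automatic conversion is unreliable here.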

Disadvantages

  • The vast majority of humans interpret differing case as equal by default. For example, if a human searched a database of cars looking for an interior color of "Dune brown," they would expect cars to be included in the results even if they were labeled as "dune brown" or "DUNE BROWN." It's ridiculous to expect a user to correctly guess the proper case of the color, so most search engines ignore case entirely. Some allow case-sensitive searching, but only if the user enters uppercase; that is, if the user searches for "dune" it will match "dune," "Dune," and "DUNE," but, if the user enters "Dune," it will only match "Dune." Since this behavior differs across implementations, there will always be confusion as to how it works.
  • Western languages capitalize the first word in a sentence, which creates false positives in case-sensitive searches. For example, a document which includes the sentence, "Dune sand is coarse," will be included in the results for a search of the novel "Dune." If the search engine tries to correct for this by ignoring the capitalization of the first word in a sentence, it will create a false negative with the sentence, "Dune is my favorite novel."
  • It's common to accidentally hold down the shift key too long and type "DUne." If this occurs, the only way to find this version of the word in a case-sensitive search is to explicitly search for the typo.
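
The "case-sensitive only if the user enters uppercase" behavior described above can be sketched as follows. This is a minimal illustration using simple substring matching, not the implementation of any particular search engine, and the function name smart_case_match is made up.

```python
def smart_case_match(query, text):
    """Match case-sensitively only when the query contains an uppercase letter."""
    if any(ch.isupper() for ch in query):
        # The user typed uppercase, so treat the case as intentional.
        return query in text
    # An all-lowercase query ignores case entirely.
    return query.casefold() in text.casefold()


print(smart_case_match("dune", "DUNE BROWN"))  # True
print(smart_case_match("dune", "Dune brown"))  # True
print(smart_case_match("Dune", "dune brown"))  # False
print(smart_case_match("Dune", "Dune brown"))  # True
```

A user who doesn't know this rule has no way to tell from the results alone why "Dune" finds fewer matches than "dune."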

Case-Sensitivity in File Systems

Typically, file systems designed for the home user are case-insensitive (Windows, Macintosh), while those for enterprise use are case-sensitive (Unix, Linux).

Advantages

  • File and directory names have more characters to choose from. Many old file systems only supported short file names (a length of 9 to 11 characters was common) with only a few character variations (often only letters, numbers, and some punctuation). However, modern file systems typically support long file names that can be made from more than a hundred thousand Unicode characters, so having 26 extra uppercase letters to choose from is trivial.
  • Your file names can more accurately match spelling. If you have a file about sand dunes and one about the novel, you can have "Dune" and "dune" in the same area without having to add a qualifier like "Dune - novel."
  • On extremely old platforms, case-sensitive file systems are slightly faster. On modern systems, the speed difference is negligible.

Disadvantages

  • As with text comparison, most humans view "Dune" and "dune" as matching. By default, they will also assume files with similar names are the same. Trying to train this out of someone requires a lot of repetition and lots of accidents before it becomes second nature.
  • Describing a case-sensitive file path is confusing. If a co-worker tells you verbally to open the "log" directory and open the "clock" file, and you type "log\clock," you'd be annoyed when it doesn't work because the actual values are spelled "LOG" and "Clock." This leads to wasted time, since the case always has to be clarified. Even users who use case-sensitive file systems often forget to mention case when describing files or directories to other people, and nobody does this consistently all the time.
  • File maintenance is confusing. If you're told to open the "read me" file and the directory contains "ReadMe," "README," "readme," "Readme," and "readMe," you have to ask for clarification which can't easily be conveyed. Since modern file systems allow for long file names, it's much less confusing to just use longer file names.
  • Searching for files is more complicated. All of the disadvantages for text comparison apply when trying to find files or directories. Consider looking for everything related to the program FileZilla. Did the developers name their files "FileZilla" to match the title, or did they use the more common Linux naming convention of all-lowercase "filezilla"? Perhaps, as programmers, they used the lowerCamelCase "fileZilla"? Or maybe one annoying person used 1337 and wrote "fIlEzILlA"? There is no way to be sure without searching for all possible case variations, of which there are 512 (two choices for each of the nine letters)!
  • Alleviating the disadvantages defeats all the advantages. In order to alleviate all the problems that occur with case-sensitive file systems, many users just always name every file completely in lowercase. But, in doing so, all of the advantages of having a case-sensitive file system are lost. It even becomes worse than a case-preserving case-insensitive file system because you don't even get the benefit of using proper case names.
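
The number of case variations of a name can be checked by enumerating them. The sketch below assumes plain ASCII letters, where every letter has exactly one lowercase and one uppercase form; the function name case_variations is made up for illustration.

```python
from itertools import product


def case_variations(name):
    """Return every spelling of name that differs only in letter case."""
    # Each letter contributes two options; anything else contributes one.
    options = [
        (ch.lower(), ch.upper()) if ch.isalpha() else (ch,)
        for ch in name
    ]
    return {"".join(combo) for combo in product(*options)}


variations = case_variations("filezilla")
print(len(variations))            # 512, which is 2 ** 9
print("FileZilla" in variations)  # True
print("fIlEzILlA" in variations)  # True
```

The count doubles with every added letter, so a brute-force search over spellings quickly becomes impractical for longer names.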

Case-Sensitivity in Programming Languages

Programming languages are highly varied, but technical languages like those based on C (C++, C#, Java, JavaScript) are case-sensitive, while learning languages like BASIC and Pascal are not. Some are a mix, like PHP, where variables are case-sensitive but functions are case-insensitive.

Online searches for the pros and cons of case sensitivity in programming are mostly opinions on coding preferences and not actually about case sensitivity. For example, it's common to see people argue that it's easier to read code when classes start with an uppercase letter, but variables start with a lowercase letter. However, this is just as possible in a case-insensitive programming language.

Advantages

  • Forcing all programmers to use the same case promotes naming consistency.
  • It promotes additional consistency when a case convention is set by the language itself. For example, in the Java standard library, classes use UpperCamelCase, variables use lowerCamelCase, and constants use all uppercase with underscores, like "MAX_VALUE."
  • You have more character options to use for a name. This isn't very compelling since short names are typically viewed as bad coding practice now that storage is much larger.
  • The compiler and IDE will run faster since they won't have to perform case conversions in memory. Again, this isn't very compelling since the speed difference on modern computers is negligible; you would only notice a minor difference on older platforms.

Disadvantages

  • There are many conflicting standards for how to apply case, which creates confusion. For example, some case standards say abbreviations should be displayed as all uppercase, like "XMLHTTPRequest," others say only the first letter in the abbreviation should be uppercase, like "XmlHttpRequest," while others say the first letter should always be lower, even if it's an abbreviation, like "xmlHttpRequest." This differs across languages, but there is often disagreement within a language, sometimes even in a single name, like in JavaScript where a built-in object is named "XMLHttpRequest." "XML" is all uppercase, but "Http" is not! Compound words also cause confusion. Should you use "FileName" or "Filename"? A stickler for the English language would use "FileName" since most dictionaries do not recognize "filename" as a word, but a techie would probably use "Filename" since it's an accepted compound word in computer parlance. Every time a programmer learns a new language or works on a different program, they have to learn a new set of contradictory, and often internally inconsistent, rules. A case-insensitive language alleviates all these problems.
  • Using case alone to distinguish between classes, functions, and variables sometimes creates unexpected problems. In a case-sensitive language, if you have a class named "Object," and accidentally write it out as "object," the compiler will probably warn you right away that "object" doesn't exist. However, if you have a class named "Object," but also a variable named "object," and you mistype one for the other in a command that works either way, you will probably get very unexpected results which could take a long time to debug. In a strongly typed case-insensitive language, the compiler will warn you ahead of time that the name is already in use.
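
The "name is already in use" check that a case-insensitive language performs can be approximated for a case-sensitive one. The sketch below is a hypothetical helper, not part of any real compiler or linter; it simply groups a list of identifiers by their case-folded form and reports the groups that collide.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of identifiers that differ only by letter case."""
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    # Only groups with more than one distinct spelling are a problem.
    return [sorted(group) for group in groups.values() if len(group) > 1]


identifiers = ["Object", "object", "count", "fileName"]
print(case_collisions(identifiers))  # [['Object', 'object']]
```

A check like this flags the "Object" versus "object" pair before a mistyped name can silently refer to the wrong thing.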
