Wednesday, May 2, 2012

Different ways to print the first few characters of a string in Linux



  In this article, we will see the different ways in which we can extract and print the first 3 characters of every line in a file. Assume a file with sample contents as below:
$ cat file
Linux
Unix
Solaris 
1. This is the best of all since it is purely internal: no external command is run. The file is read in a while loop, and the first 3 characters of each line are retrieved using the shell itself.
$ while IFS= read -r line
> do
>  echo "${line:0:3}"
> done < file
Lin
Uni
Sol
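   ${x:0:3} extracts 3 characters from the variable x, starting at position 0 (the first character). In this way, the shell can also be used to extract a sub-string from a variable. This is one of the most useful features of the shell, but note that it is provided by bash, ksh and zsh and is not part of plain POSIX sh.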
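
   The same expansion works with any offset and length, not only the first 3 characters. The following is a minimal sketch, assuming bash; first_n is a made-up helper name used only for illustration.

```shell
#!/usr/bin/env bash
# first_n N: print the first N characters of every line read from stdin.
# (first_n is a made-up helper for illustration, not a standard command.)
first_n() {
  local n=$1 line
  # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
  # The "|| [ -n ... ]" part also handles a last line with no newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line:0:$n}"
  done
}

word=Solaris
echo "${word:0:3}"    # first 3 characters
echo "${word:3}"      # everything from position 3 onwards
echo "${word: -3}"    # last 3 characters (note the space before -3)

printf 'Linux\nUnix\nSolaris\n' | first_n 3
```

   A line shorter than the requested length is simply printed in full.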

2. Using cut, we can cut the first 3 characters.
$ cut -c -3 file
Lin
Uni
Sol
   The -c option selects characters by position, and -3 means everything up to the 3rd character. The same can also be achieved by using 1-3 in place of -3.
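
   The character list given to -c can take other forms too. A small sketch of a few of them follows; sample is a made-up helper that just prints the contents of the sample file.

```shell
# sample: print the article's sample data (made-up helper for illustration).
sample() { printf 'Linux\nUnix\nSolaris\n'; }

sample | cut -c 1-3    # characters 1 through 3, same as -3
sample | cut -c 3-     # from character 3 to the end of the line
sample | cut -c 1,3    # only characters 1 and 3
```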

3. grep prints the lines of a file that match a pattern. By default it prints the entire line; the -o option makes it print only the part that matched. The dot (.) matches any single character, so 3 dots match 3 characters, and the caret (^) anchors the match to the beginning of the line.
$ grep -o '^...' file
Lin
Uni
Sol
4. For a small number of characters, we can provide that many dots. What if it is some 20 characters? Giving that many dots will look clumsy. Regular expressions provide the {n} quantifier, which means the preceding item should match exactly n times. Here, a single character is matched using the dot (.), and the number 3 tells it to match 3 times. The backslash (\) is needed because, in grep's default basic regular expressions, an unescaped '{' is treated as a literal character.
$ grep -o '^.\{3\}' file
Lin
Uni
Sol
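   With extended regular expressions (the -E option), the braces can be written without backslashes. The sketch below also shows what happens to a line shorter than 3 characters. short_sample is a made-up helper for illustration, and -o is assumed to be available (it is in GNU and BSD grep, but it is not required by POSIX).

```shell
# short_sample: sample data with one line ("ab") shorter than 3 characters.
# (made-up helper for illustration)
short_sample() { printf 'Linux\nab\nUnix\n'; }

# Basic regular expression: the braces need backslashes.
short_sample | grep -o '^.\{3\}'

# Extended regular expression (-E): the braces are written as-is.
short_sample | grep -oE '^.{3}'
```

   Both commands print Lin and Uni only. The line ab produces no output because it cannot match 3 characters.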
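5. The substr function of awk does it easily. The substr syntax is: substr(string, starting position, length). The string is the entire line, which is $0. Positions in awk start at 1, so 1 is the starting position, and 3 is the number of characters to extract. With a starting position of 0, some implementations such as gawk return only the first 2 characters.
$ awk '{print substr($0,1,3);}' file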
Lin
Uni
Sol
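   The length need not be hard-coded. A minimal sketch that passes it in with awk's -v option follows; awk_first_n is a made-up helper name used only for illustration.

```shell
# awk_first_n N: print the first N characters of every line on stdin.
# (awk_first_n is a made-up helper name for illustration.)
awk_first_n() {
  # -v passes the shell value into awk as the variable n.
  # awk positions start at 1, so the substring begins at position 1.
  awk -v n="$1" '{ print substr($0, 1, n) }'
}

printf 'Linux\nUnix\nSolaris\n' | awk_first_n 3
printf 'Linux\nUnix\nSolaris\n' | awk_first_n 5
```

   The second call asks for 5 characters, so Unix, which has only 4, is printed in full.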
6. The sed solution is almost the same as the earlier grep one. The pattern breaks the line into two parts: the first 3 characters, captured as a group with \( and \), and the rest. The whole line is then replaced with \1, the captured group, so only the first 3 characters are printed.
$ sed 's/\(.\{3\}\).*/\1/' file
Lin
Uni
Sol
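   As with grep, the backslashes can be dropped by switching to extended regular expressions. The sketch below assumes a sed that supports -E (GNU and BSD sed both do); the function names are made up for illustration.

```shell
# sed_first3_bre: basic regular expression, anchored with ^ for clarity.
sed_first3_bre() { sed 's/^\(.\{3\}\).*/\1/'; }

# sed_first3_ere: extended regular expression (-E), so ( ) and { }
# need no backslashes.
sed_first3_ere() { sed -E 's/^(.{3}).*/\1/'; }

printf 'Linux\nab\nSolaris\n' | sed_first3_bre
printf 'Linux\nab\nSolaris\n' | sed_first3_ere
```

   Both calls print Lin, ab and Sol. A line shorter than 3 characters does not match the pattern, so sed prints it unchanged, whereas grep -o prints nothing for such a line.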
7. Perl also has a substr function. $_ holds the line just read, and positions in perl are counted from 0, so substr($_,0,3) returns the first 3 characters.
$ perl -lne 'print substr($_,0,3);' file
Lin
Uni
Sol
