Story (Skip it if you don’t have time)
I had a tough time once when I was processing text. I usually work on Linux machines. I got a text file that was made on a Windows machine, and the processing involved adding a few characters at the start of every line. The usual way to do this is to replace ‘\n’ with ‘\nabc’, assuming abc are the characters to be placed at the beginning of every line. This just did not work on the file I received. It was only after I used a tool that displays every single character in the file that I realized that text files made on Windows machines carry something called a carriage return, ‘\r’, right before the line feed character ‘\n’. Thus the need for a post.
TL;DR Start reading from here.
All this goes way back to the age of the typewriter. LF stands for Line Feed. It would just move the paper up by one line and keep the same horizontal position. CR, on the other hand, stands for Carriage Return and moves the carriage back to the start of the current line.
So intuitively it seems that the Windows people have got this right, because CR followed by LF moves the pointer to the start of the next line. But is it necessary in the current age? That is another question. On Linux, only the line feed character is used. On classic Mac OS (before OS X), only the carriage return character was used; modern macOS uses the line feed, like Linux.
The ASCII codes are given below,
CR = \r = ASCII code 13
LF = \n = ASCII code 10
CRLF = \r\n = ASCII code 13 and then ASCII code 10
Keep this in mind when you are processing text files originating from these operating systems.
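To see the difference for yourself, here is a small sketch in shell. It builds a two-line sample with Windows line endings using printf (the sample text and the variable name prefixed are made up for illustration), dumps the bytes with od -c so the \r shows up next to each \n, and then strips the carriage returns with tr before adding the abc prefix. Note that it anchors on the start of each line with sed rather than replacing ‘\n’, which is just one of several ways to do it.

```shell
# Dump every byte of a CRLF sample; each line ends in \r followed by \n.
printf 'first\r\nsecond\r\n' | od -c

# Drop the carriage returns, then put "abc" at the start of every line.
prefixed=$(printf 'first\r\nsecond\r\n' | tr -d '\r' | sed 's/^/abc/')
printf '%s\n' "$prefixed"
```

If you skip the tr step, the \r stays at the end of every line, which is exactly the kind of invisible character that tripped me up.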
Thanks for reading.
If you use bash scripts on a regular basis for file manipulation, you will have come across times when output had to be written to temporary files for further handling. This is quite common. Surprisingly, it was only after years that I came to know about temporary files that the system itself safely creates in the /tmp directory. You do not have to worry about the naming or the location. You can use the mktemp command to create a temporary file; it prints the path of the new file, which you can store in a variable.
Let's look at a very common use. You have some text ready after some processing and you want to write it to a file. What you do beforehand is say,
temp_file=$(mktemp)
This will create a temporary file and store its path in the variable temp_file. Then you will probably want to write some stuff to this file. You can do this by redirecting the output like this,
echo "my output" > "$temp_file"
I am just using echo here for explanation purposes. You can continue appending output, of course, using >>.
You can use this file for further processing in a loop, just like you would any other file. For example, to go through it line by line,
while IFS= read -r line; do echo "$line"; done < "$temp_file"
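This was one really useful tip, because I do not have to worry about naming the file or where it lives. Files under /tmp are typically cleaned up by the system (at reboot or by a periodic cleaner, depending on the distribution), but mktemp itself never deletes anything, so remove the file with rm once you are done if that matters to you.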
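Putting the pieces together, a minimal sketch of the whole workflow could look like this. The trap line is my own addition and not something mktemp does for you: it asks the shell to delete the file when the script exits. The variable count and the sample lines are just for illustration.

```shell
# mktemp creates the file and prints its path, which we keep in a variable.
temp_file=$(mktemp)

# Remove the file when the script exits, whichever way it exits.
trap 'rm -f "$temp_file"' EXIT

# Write the first line, then append a second one.
echo "first line" > "$temp_file"
echo "second line" >> "$temp_file"

# Read the file back line by line.
count=0
while IFS= read -r line; do
    count=$((count + 1))
    echo "line $count: $line"
done < "$temp_file"
```

Quoting "$temp_file" everywhere keeps the script working even if TMPDIR points at a path with spaces in it.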
Thanks for reading.
xclip is a tool for sending text, whether terminal output or the contents of a text file, to the clipboard so that it can be pasted elsewhere. I needed this because copying output from the terminal can be a hassle. For one, clicking and dragging is a pain with all the text going by. For another, copying in the terminal is ctrl + shift + c rather than ctrl + c, and ctrl + c cancels the process currently running in the terminal, which can end really badly sometimes.
Anyways, listed below are some good uses of xclip.
- ls -a | xclip, or any command that prints text on the terminal such as cat, sends the text output to the clipboard.
- xclip -o prints the text on the clipboard, that is, the last text that was copied.
- xclip path/to/textfile sends the contents of the file to the clipboard for pasting elsewhere.
- You could also send the contents of the clipboard to a file by saying xclip -o > somefile.txt. This is of course true of any command that outputs text.
By default xclip works with the X primary selection, which is pasted with a middle click. To paste with ctrl + v, add -selection clipboard, as in ls -a | xclip -selection clipboard.
Pandoc is one of my recent finds, from when I was looking for a good way to take notes while studying. I have always liked LaTeX and the ease with which you can write equations in it. The lack of worry when it comes to alignment is second to none. So I did try using LaTeX for note taking, but it seemed to pull a lot of my concentration away from the actual note taking and towards the details of LaTeX syntax. I blame myself for that. Anyways, I found pandoc when I was going through Stack Exchange conversations, and it stuck. It has this really nice way of converting from a format called Markdown to formats such as HTML and LaTeX. Markdown is really simple. I love the syntax. If you are a Linux user you might have seen readme files with the extension .md. That is Markdown.
The main things that are required for note taking according to me are,
- A good, easy way to write down points, ordered and unordered, at different levels of nesting.
- An easy way to write different levels of headers.
- An easy way to write equations.
- An easy way to embed images.
Markdown had all this, and pandoc gives it an easy way to convert to LaTeX format, in case you want to edit things further in LaTeX. Otherwise it can convert directly to PDF as well.
It could also be very useful if you are making web content and want to create HTML. I did this when I was working on a website recently where the content had a lot of points to list. I wrote all my content in Markdown, converted it to HTML using pandoc, and pasted it into my original HTML file, which had all the other header stuff.
I do not want to repeat Markdown syntax fundamentals here. You can find them pretty easily on Google. These two sites look good. Link 1. Link 2. Take a look at them. Once you are done with all this, cd to the directory where you have the md file and type this,
pandoc source.md -o destination.pdf
This converts a source file in Markdown format to a PDF. By default pandoc produces PDFs through LaTeX, so a LaTeX engine such as pdflatex has to be installed. You could replace the pdf extension with html to get an HTML file instead.
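As a rough sketch of what such a notes file might look like, the script below writes a tiny source.md covering the four things from my list (headers, nested and ordered points, an equation, an image) and converts it to HTML. The note content, the directory variable notes_dir and the image name diagram.png are all made up for illustration, and the conversion only runs if pandoc is actually installed.

```shell
# Work in a throwaway directory so nothing existing gets overwritten.
notes_dir=$(mktemp -d)

# A small Markdown note: headers, lists, an equation and an image.
cat > "$notes_dir/source.md" <<'EOF'
# Lecture 1

## Key points

- First point
    - A nested point
- Second point

1. First step
2. Second step

An inline equation: $E = mc^2$

![A diagram](diagram.png)
EOF

# Convert to HTML only when pandoc is available on this machine.
if command -v pandoc >/dev/null 2>&1; then
    pandoc "$notes_dir/source.md" -o "$notes_dir/destination.html"
fi
```

HTML is used here instead of PDF only because it does not need a LaTeX installation.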
For more info on pandoc, installation and everything, go to pandoc.org. Installation should be pretty simple with your package manager. For Fedora do,
dnf install pandoc
with superuser rights, of course.
Amplifying mp3 files is a piece of cake with lame. All you have to do is,
lame --scale <scaling factor> <infile> <outfile>
And it works wonderfully.
tr: Translate or Delete. This is the first time I am using this command and it was pretty nice. I have used the GUI replace functionality in text editors so far and, recently, sed. But this seems pretty nice.
tr 'abc' 'xyz' replaces every a with x, every b with y and every c with z. It works character by character, so it does not look for the whole string abc; use sed for that. tr takes its input from standard input, meaning that if you run the command, type something in and press enter, you will see the result right below. The common use is to cat a file and pipe the content into it, though.
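Here is a short sketch that shows what tr really does with 'abc' and 'xyz', next to sed for comparison. The sample sentences and variable names are made up. The point to notice is that tr works on single characters, while sed works on the string as a whole.

```shell
# tr maps single characters: a->x, b->y, c->z, wherever they appear.
mapped=$(echo "a cab is back" | tr 'abc' 'xyz')
echo "$mapped"

# sed replaces the string "abc" as a unit and leaves other a, b, c alone.
replaced=$(echo "abc cab" | sed 's/abc/xyz/g')
echo "$replaced"

# The "delete" half of the name: -d removes the listed characters.
deleted=$(echo "a cab is back" | tr -d 'abc')
echo "$deleted"
```

So tr is the right tool for things like changing case or stripping a character, and sed is the one to reach for when a whole word has to change.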
Another personal reference, this time for sed.
Two basic forms of use.
sed [-n] [-e] 'command(s)' files
sed [-n] -f scriptfile files
The first one uses inline commands. The second one uses a script file. A command like delete or print can be executed based on an address (a line number or a line range) or a pattern. Let's look at the commands themselves in no particular order, just to recollect things. All concepts are mixed and mashed.
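p stands for print and it prints the lines that are specified. Use it with -n, otherwise sed also prints every line by default and the selected lines show up twice. '1p' will print the first line.
'1,5 p' will print the lines from one to five.
'2,+3 p' will print line 2 and the next 3 lines as well (GNU sed).
'2~4 p' will print line 2 and every fourth line from there (GNU sed). Meaning, 2, 6, 10, 14, 18, 22… Note that '2,~4 p', with a comma, is a different address: it prints from line 2 up to the next line whose number is a multiple of 4.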
- Address and pattern matching can be used together.
'/hello/,5 p' prints from the first line that has ‘hello’ up to line 5 (if the match comes after line 5, only the matching line is printed). To print the matching line and the five lines that follow it, use '/hello/,+5 p'.
'/hello/,$ p' prints from the line that has the first occurrence of hello till the end.
'/pattern1/,/pattern2/ p' prints from a line that contains pattern 1 up to the next line that contains pattern 2.
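The addressing forms are easier to remember after trying them on numbered input. The sketch below assumes GNU sed, since addr,+N and first~step are GNU extensions, and it uses the 2~4 form (no comma) for every fourth line. The variable names and the tiny sample texts are made up; paste just joins the output lines so each result fits on one line.

```shell
# Lines 1 to 5.
range=$(seq 1 20 | sed -n '1,5p' | paste -sd' ' -)

# Line 2 and the 3 lines after it.
plus=$(seq 1 20 | sed -n '2,+3p' | paste -sd' ' -)

# Every 4th line starting at line 2.
step=$(seq 1 20 | sed -n '2~4p' | paste -sd' ' -)

# From the first line matching a pattern to the end of the input.
to_end=$(printf 'a\nhello\nb\nc\n' | sed -n '/hello/,$p' | paste -sd' ' -)

# From a line matching one pattern to the next line matching another.
between=$(printf 'a\nstart\nb\nend\nc\n' | sed -n '/start/,/end/p' | paste -sd' ' -)

printf '%s\n' "$range" "$plus" "$step" "$to_end" "$between"
```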
d is for deleting lines. It takes addresses and patterns the same way p does.
w can be used to write to a file after filtering some lines, or with no filter at all, which makes it work like a cp command.
'w newfile' will make a copy of the target file into the new file
'2,5 w newfile' will just write the lines 2 to 5 of the target file to the new file. Pattern matching can also be used instead of Address matching.
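A quick way to check how d and w behave is to run them on a throwaway file. This is only a sketch: the directory workdir and the file names target.txt, copy.txt and part.txt are made up for the example.

```shell
# A scratch directory with a five-line target file.
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$workdir/target.txt"

# d drops line 2 from the output; target.txt itself is not modified.
without_two=$(sed '2d' "$workdir/target.txt" | paste -sd' ' -)

# w with no address writes every line, so copy.txt ends up identical.
sed -n "w $workdir/copy.txt" "$workdir/target.txt"
cmp -s "$workdir/target.txt" "$workdir/copy.txt" && same=yes || same=no

# With an address, only lines 2 to 4 are written.
sed -n "2,4w $workdir/part.txt" "$workdir/target.txt"
part=$(paste -sd' ' "$workdir/part.txt")

echo "$without_two / $same / $part"
rm -rf "$workdir"
```

The -n keeps sed from also printing every line to the terminal while it writes the file.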
a can be used to append a line after a given line.
'4 a text to be appended' adds the line ‘text to be appended’ after line 4. The result goes to the output; the file itself only changes if you edit in place with -i.
c can be used for replacing or changing whole lines.
'4 c replacement text' will replace the fourth line with ‘replacement text’. A range of lines can be replaced with a single line as well.
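i stands for insert. It works like a, but puts the new line before the addressed line.
y is for translate. It maps characters one to one, like tr.
- Syntax for y
[address1[,address2]]y/source-chars/dest-chars/ replaces each character in source-chars with the character at the same position in dest-chars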
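[address1[,address2]]l prints the line with hidden characters made visible (for example \t for a tab, and a $ marking the end of the line)
[address]q [value] quits at the address without processing the rest of the input; value is an optional exit code (GNU sed)
[address]r file reads file and inserts its contents after the addressed line
[address1[,address2]]e [command] runs a shell command and prints its output before the addressed line (GNU sed); with no command, the line itself is run as a command
[address1[,address2]]s/pattern/replacement/[flags] substitute command
's/Paulo Coelho/PAULO COELHO/w junk.txt' replaces the name and also writes the lines that were changed to junk.txt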
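To recollect the editing commands quickly, here is a sketch that runs a, i, c, y, q and s with the w flag on a three-line sample. It assumes GNU sed, because the one-line forms like '2a text' are a GNU convenience. The sample words, the variable names and the temporary file changes are made up; paste joins the output lines with | so each result fits on one line.

```shell
sample=$(printf 'alpha\nbeta\ngamma')
changes=$(mktemp)

# a appends after line 2, i inserts before it, c replaces it.
appended=$(printf '%s\n' "$sample" | sed '2a after beta' | paste -sd'|' -)
inserted=$(printf '%s\n' "$sample" | sed '2i before beta' | paste -sd'|' -)
changed=$(printf '%s\n' "$sample" | sed '2c BETA' | paste -sd'|' -)

# y maps characters one to one: a->A, b->B, g->G.
translated=$(printf '%s\n' "$sample" | sed 'y/abg/ABG/' | paste -sd'|' -)

# q stops after line 2.
quit_early=$(printf '%s\n' "$sample" | sed '2q' | paste -sd'|' -)

# s with the w flag: all lines go to stdout, changed lines also go to the file.
substituted=$(printf '%s\n' "$sample" | sed "s/beta/BETA/w $changes" | paste -sd'|' -)
logged=$(cat "$changes")

printf '%s\n' "$appended" "$inserted" "$changed" "$translated" "$quit_early" "$substituted" "$logged"
rm -f "$changes"
```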
More to be added to this post.