LF, CR, CRLF – Pressing Enter

Story (Skip it if you don’t have time)

I had a tough time once when I was processing text. I usually work on Linux machines. I got a text file that was made on a Windows machine and set out to process it. The process involved adding a few characters at the start of every line. The usual way to do this is to replace ‘\n’ with ‘\nabc’, assuming abc are the characters to be placed at the beginning of every line. This just did not work on this particular file. It was only after I used a tool that displays every single character in the file that I realized that text made on Windows machines adds something called a carriage return ‘\r’ along with the line feed character ‘\n’. Thus the need for a post.
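For the curious, here are a couple of commands that will reveal those invisible characters; this assumes a GNU userland, and the file name is just a placeholder.

file windowsfile.txt      # typically reports "ASCII text, with CRLF line terminators" for a Windows file
cat -A windowsfile.txt    # shows each carriage return as ^M and each line end as $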

TL;DR – Start reading from here.

All of this goes back to the age of the typewriter. LF stands for Line Feed: it moves the paper up by one line while keeping the same horizontal position. CR stands for Carriage Return: it moves the carriage back to the start of the same line.

Intuitively, it seems that the Windows people got this right, because CR followed by LF moves the cursor to the start of the next line. Whether that is still necessary in the current age is another question. On Linux, only the line feed character is used. On classic Mac OS, only the carriage return character was used (modern macOS uses the line feed, like Linux).

The ASCII codes are given below:
CR = \r = ASCII code 13
LF = \n = ASCII code 10
CRLF = \r\n = ASCII code 13 and then ASCII code 10

Keep this in mind when you are processing text files originating from these operating systems.
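If you need to get rid of the carriage returns before processing, here is a minimal sketch using standard tools; the file names are just placeholders.

tr -d '\r' < windowsfile.txt > unixfile.txt    # drop every CR, leaving plain LF line endings
sed 's/\r$//' windowsfile.txt > unixfile.txt   # same idea with GNU sed, only touching CRs at line ends

dos2unix does the same job if it happens to be installed on your machine.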

Thanks for reading.


Temp Files – Bash Scripting

If you use bash scripts regularly for file manipulation, you will have come across times where output had to be written to some temporary file, from which further handling could be done. This is quite common. But surprisingly, only after years did I learn about the temporary files that the system itself can safely create in the /tmp directory. You do not have to worry about the naming or the location or anything like that. You can use the mktemp command to create a temporary file and keep its path in a variable.

Let's look at a very common use. You have some text ready after some processing and you want to write it to a file. What you do beforehand is say,
temp_file=$(mktemp)

This will create a temp file and store its path in the variable temp_file. Then you will probably want to write some stuff to this file. You can do this by just redirecting output to it, like this,
echo "my output" > $temp_file

Just using echo here for explanation purposes. You can continue appending your outputs of course, using >>.

You can use this file for further processing just like you would any other file. For example, to read it line by line (a while-read loop is safer than for line in `cat $temp_file`, which splits the file on every space rather than on every line):

while IFS= read -r line; do ...; done < "$temp_file"

This was one really useful tip, because I do not have to worry about naming the file or where to put it; files in /tmp get cleaned up by the system eventually, and you can always remove them yourself when the script is done (see the sketch below).
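Putting it together, here is a minimal sketch of the whole pattern; some_command is only a stand-in for whatever produces your text, and the trap line is the optional housekeeping mentioned above.

#!/bin/bash
temp_file=$(mktemp)                # e.g. /tmp/tmp.XXXXXXXXXX, created safely for us
trap 'rm -f "$temp_file"' EXIT     # remove the temp file when the script exits

some_command > "$temp_file"        # stand-in for whatever produces your text
grep "interesting" "$temp_file"    # ...then use the file like any other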

Thanks for reading.

PDF Page Edits – Linux CLI

PDF page edits can be pretty simple on Linux with two command line applications that ship with most distributions. I have them on my Fedora. I'm talking about pdfseparate and pdfunite. To find out if you already have them installed, head over to your terminal, type in pdf and hit tab; that should trigger auto-completion for installed applications, and you will see the ones that start with pdf, like pdftops, pdfcrop and so on. If you see these two in the list, then you are good to go. I would recommend this for small PDF files, but for huge ones I would recommend something else, mainly because of the way pdfseparate operates. That said, it would still work on large files as well. You will understand why in a bit.

pdfseparate splits your PDF into single pages and writes them to a location you specify, with a name series you specify. Take a look at this example.

pdfseparate summary.pdf summarypages_%d.pdf

What this does is split the PDF named summary.pdf into a series of files named ‘summarypages’ followed by a number. So, for example, if there were three pages in summary.pdf, you will get the following files in the specified location: summarypages_1.pdf, summarypages_2.pdf, summarypages_3.pdf. It is better to write these files into their own folder so that you do not crowd the existing folder, in case you have many other files in it.
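For instance, something like this (the folder name is just my choice):

mkdir -p summary_pages
pdfseparate summary.pdf summary_pages/summarypages_%d.pdf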

Once you have this, you can unite the pages you want, in the order you want, using pdfunite. For example, do something like this

pdfunite summarypages_1.pdf summarypages_3.pdf selectsummary.pdf

to unite pages 1 and 3 into a new PDF named selectsummary.pdf. But yes, of course this is cumbersome; you can shorten it by saying something like

pdfunite summarypages_{1,3}.pdf selectsummary.pdf

This does the same thing, but now you have something short. If you have a big list, you would list the pages inside the braces, say summarypages_{1,2,5,6,9,11,14}.pdf. This is bash brace expansion and does not apply only to pdfunite; you can use it anywhere on the terminal.
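For example, the same expansion works with any command; the file names here are only for illustration.

cp notes.{md,md.bak}              # expands to: cp notes.md notes.md.bak
echo summarypages_{1,3,7}.pdf     # expands to: summarypages_1.pdf summarypages_3.pdf summarypages_7.pdf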

You now understand why I said this would not be suitable for large PDF files: pdfseparate splits everything into individual files, so for a thousand-page document you would end up with 1000 single-page PDFs. Yes, writing them to a different folder will KIND OF solve the clutter. You cannot blame the pdfunite setup though; if you want to unite select pages you would have to select those pages anyway, either by listing them in the command above or by putting them in a separate text file and writing a small script. Remove the split PDF files later when you are done.
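As a rough sketch for the large-file case: the pdfseparate that ships with recent poppler versions accepts -f and -l for the first and last page to extract, so you only split the pages you actually need (check your man page before relying on this; the file names below are placeholders).

pdfseparate -f 120 -l 123 big.pdf part_%d.pdf      # only writes part_120.pdf ... part_123.pdf
pdfunite part_120.pdf part_123.pdf excerpt.pdf     # stitch the pages you want back together
rm part_12?.pdf                                    # remove the intermediate single-page files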

Thanks for reading.

xclip – some good uses

xclip is a tool for sending text to the clipboard, whether it comes from a terminal command's output or from a text file, so that it can be pasted elsewhere. I needed this because copying output from the terminal can be a hassle: clicking and dragging is a pain with all the text flying by, and the shortcut is ctrl + shift + c rather than ctrl + c, which in a terminal cancels the currently running process and can end really badly sometimes.

Anyway, listed below are some good uses of xclip.

  1. ls -a | xclip, or any command that outputs text on the terminal (like cat), sends that output to the clipboard.
  2. xclip -o outputs the text on the clipboard, i.e. the last text that was copied.
  3. xclip path/to/textfile sends the contents of the file to the clipboard for pasting elsewhere.
  4. You could also send the contents of the clipboard to a file by saying xclip -o > somefile.txt. This of course works the same way for any command that outputs text.
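Worth noting: by default xclip talks to the X primary selection (the middle-click paste buffer). If you want the text in the regular ctrl + v clipboard, add -selection clipboard:

ls -a | xclip -selection clipboard      # copy command output to the ctrl + v clipboard
xclip -selection clipboard -o           # print what is currently on that clipboard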

Markdown + Pandoc = Note taking and more

Pandoc is one of my recent findings from when I was looking for a good way to take notes while studying. I have always liked LaTeX and the ease with which you can write equations in it; the lack of worry when it comes to alignment is second to none. So I did try using LaTeX for note taking, but it seemed to take a lot of my concentration away from the actual notes and into the details of LaTeX syntax. I blame myself for that. Anyway, I found pandoc while going through Stack Exchange conversations, and it stuck. It has a really nice way of converting from a format called Markdown to formats such as HTML and LaTeX. Markdown is really simple; I love the syntax. If you are a Linux user you might have seen README files come with the .md extension. That is Markdown.

The main things I require for note taking are,

  1. A good/easy way to write down points, ordered and unordered, with different levels of indentation.
  2. An easy way to write different levels of headers.
  3. An easy way to write equations.
  4. An easy way to embed images.

Markdown has all this, plus an easy way to convert to LaTeX format if you want to keep editing in LaTeX; otherwise pandoc can convert it directly to PDF as well.
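To give a feel for it, here is a tiny made-up note in pandoc's Markdown that covers those four points (pandoc reads dollar-sign math by default; the image path is a placeholder).

# Lecture notes

## A sub-heading

1. An ordered point
    - an unordered sub-point, one level in

Inline math goes between single dollar signs, like $e^{i\pi} + 1 = 0$,
and a displayed equation between double ones:

$$\int_0^1 x^2 \, dx = \frac{1}{3}$$

![A short caption](images/diagram.png)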

It can also be very useful if you are making web content and want to produce HTML. I did this recently while working on a website where the content had a lot of points to list. I wrote all my content in Markdown, converted it to HTML using pandoc and pasted it into my original HTML file, which had all the other header stuff.

I do not want to repeat Markdown syntax fundamentals here; you can find them easily anywhere on Google. These two sites look good: Link 1, Link 2. Take a look at them. Once you are done with all this, cd to the directory where you have the .md file and type this.

pandoc source.md -o destination.pdf

This converts a source file in Markdown format to a PDF. You can replace the pdf extension with html to get an HTML file instead.
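The HTML route has one small wrinkle worth knowing: without -s, pandoc emits just the body fragment (handy when pasting into an existing page, as above), while -s produces a complete standalone page. File names here are placeholders.

pandoc source.md -o fragment.html        # body-only HTML fragment
pandoc -s source.md -o standalone.html   # -s adds the full <html>/<head> wrapper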

For more info on pandoc, installation and everything else, go to pandoc.org. Installation should be pretty simple with your package manager. On Fedora, do dnf install pandoc, with superuser rights of course.