[BlindMath] Turning math_scanner into a text layout checker

Fri Sep 1 03:16:26 UTC 2023

Hello everyone,

so, I've been working on a document where typesetting was quite
important. I had a particular layout in mind that I wanted to achieve,
including spatially arranged text blocks, special alignments, images,
etc.

Typst makes it very easy to design all sorts of things, but however
great the syntax is, it's still possible to make a mistake,
misunderstand something, or simply not be sure how the given content
will arrange itself, so I needed a way to check how my idea turned out.

So, I implemented a few useful layout checking functions in
math_scanner, since it has already shown interesting results when it
comes to working with documents' graphics.

So, right now, after Tesseract processes the input image, you can:

* For any character, check the distance to the left, right, top and
bottom edge of the image, as a percentage

* If you mark the borders of a region of the image, you can check its
size as a percentage of the page (or of the column, if you've made use
of the program's column splitting functionality)

* Check the size of any focused character in pixels. Note this may not
always be accurate, since the size is calculated from the bounding boxes
determined by Tesseract.
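
To make the three measurements above concrete, here is a rough sketch
of the arithmetic behind them. It is not the actual math_scanner code:
the Box class and the function names are made up for illustration, and
it assumes pixel bounding boxes with the origin in the top-left corner
of the image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box in pixels, origin at the top-left corner."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self) -> int:
        # Character (or region) width in pixels.
        return self.right - self.left

    @property
    def height(self) -> int:
        # Character (or region) height in pixels.
        return self.bottom - self.top


def edge_distances_percent(box: Box, image_width: int, image_height: int) -> dict:
    """Distance from each side of the box to the matching image edge, in percent."""
    return {
        "left": 100 * box.left / image_width,
        "right": 100 * (image_width - box.right) / image_width,
        "top": 100 * box.top / image_height,
        "bottom": 100 * (image_height - box.bottom) / image_height,
    }


def region_size_percent(region: Box, image_width: int, image_height: int) -> tuple:
    """Width and height of a marked region as a percentage of the page or column."""
    return (100 * region.width / image_width, 100 * region.height / image_height)


# Example: one character and one marked region on a 1000 x 2000 pixel page.
char = Box(left=100, top=200, right=120, bottom=240)
distances = edge_distances_percent(char, 1000, 2000)
size = region_size_percent(Box(0, 500, 500, 1000), 1000, 2000)
```

The pixel size of a character is simply the width and height of its
box, which is why it can only be as accurate as the boxes themselves.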

Making use of these functions, you can easily check, say, whether a
heading is centered, how much vertical space is left in a column while
writing the text, whether and how paragraphs are aligned, how big
individual text blocks are on the paper, or whether your figures were
aligned correctly, as long as there is text around them you could use to
mark a region (note that just horizontal / vertical borders are
necessary for determining the height / width, respectively).
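
As an illustration of how such checks could be built on the percentage
readings, here is a small sketch. The function names and the tolerance
of one percentage point are my assumptions, not something math_scanner
provides; the inputs are simply the percentages you would read off for
the relevant characters.

```python
def is_centered(left_percent: float, right_percent: float,
                tolerance: float = 1.0) -> bool:
    """True when the left and right margins differ by at most `tolerance` points.

    left_percent:  distance of the heading's first character to the left edge.
    right_percent: distance of the heading's last character to the right edge.
    """
    return abs(left_percent - right_percent) <= tolerance


def is_aligned(edge_percents: list, tolerance: float = 1.0) -> bool:
    """True when all lines start (or end) within `tolerance` points of each other."""
    return max(edge_percents) - min(edge_percents) <= tolerance


# A heading with 30 % free space on the left and 31 % on the right.
heading_ok = is_centered(30.0, 31.0)

# Left-edge distances of the first character of three paragraph lines.
paragraph_ok = is_aligned([12.0, 12.4, 11.8])
```

A tolerance is needed because OCR bounding boxes are rarely exact, so
two margins that are visually equal will usually differ by a fraction
of a percent.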

math_scanner can split the input image into columns, which are
afterwards treated like standalone images (including re-recognition by
Tesseract, which can clear out a lot of clutter).

The new layout checking functions respect this mechanism too, so if you 
have a multi-column document, you can review the layout in each of them 
separately.
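
To show what measuring against a column rather than the whole page
changes, here is a tiny sketch. The function and the column boundaries
are hypothetical and only illustrate the arithmetic; math_scanner
itself gets the same effect by treating each column as its own image.

```python
def percent_from_column_left(x_px: int, column_left_px: int,
                             column_right_px: int) -> float:
    """Horizontal position of x_px as a percentage of the column's width."""
    column_width = column_right_px - column_left_px
    return 100 * (x_px - column_left_px) / column_width


# A 1000 px wide page split into two equal columns at x = 500.
# A character starting at x = 600 sits 60 % from the page's left edge...
page_percent = percent_from_column_left(600, 0, 1000)

# ...but only 20 % from the left edge of the second column.
column_percent = percent_from_column_left(600, 500, 1000)
```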

Of course, it's still a good idea to have your work checked by a sighted
reviewer, but there's a difference between calling someone for a check 5
times and 50 times because something didn't turn out as you expected,
you changed your mind, reworked something, etc.

This particular implementation also has its limitations. The fact that
it's driven by OCR has a few advantages, but also some significant
downsides, like recognition errors and the program's general
unawareness of things like figures in the document.

It would be very interesting to implement something similar working
directly with the information in the PDF, since tools like Typst or
LaTeX tend to include it in a somewhat semantic form, so it may be
possible to get very interesting results.

Right now, however, I don't think I have the time to study the
structure of PDF documents or to build a layout explorer from scratch.
Since math_scanner already had most of the prerequisites, and they were
good enough for my use case, this is the optimal route for me for the
time being.

I'm just letting people know in case someone is interested in my little
experiment.

You can find the new commits in the development branch of math_scanner:

https://github.com/RastislavKish/math_scanner

Still Linux only. The Windows branch is actually complete and
functional, but it has yet to be merged, since I haven't found anybody
on Windows willing/able to install Python and the necessary dependencies
for my little program. :)

Perhaps I should merge it though. The changes are really subtle, there
is very little that can go wrong, and even if something does, it can be
solved when someone notices.

I will take a look into it at some point.

Have fun

Best regards

Rastislav