Converting M4B to MP3 and splitting MP3 files

As I was getting some audio-books ready for listening on the road, using an old car-stereo system, I quickly found out that M4B files would not work, while MP3 content would be recognized, up to 320 kbps.

To convert M4B to MP3, the superb free and open source ffmpeg software did the job at the first attempt, with zero surprises: as in the vast majority of situations, all that was required was to explicitly state the input and output files. I was also explicit about the audio codec (the “-acodec” option), setting it to “libmp3lame”.

Here is the full command-line (CLI) command I used:

ffmpeg -i audiobook.m4b -acodec libmp3lame audiobook.mp3

Some documentation here:
https://trac.ffmpeg.org/wiki/Encode/MP3
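
For a whole folder of audio-books, the same conversion could be scripted. What follows is only a minimal sketch in Python, not the method used above: the function names are hypothetical, and the explicit bitrate option ("-b:a") is an assumption added to stay under the 320 kbps ceiling of the car stereo.

```python
from __future__ import annotations

from pathlib import Path


def ffmpeg_mp3_command(source: Path, bitrate_kbps: int = 128) -> list[str]:
    """Return the ffmpeg argument list that converts one M4B file to MP3."""
    # Assumption: the car stereo's 320 kbps ceiling is used as the upper bound.
    if not 32 <= bitrate_kbps <= 320:
        raise ValueError("bitrate must be between 32 and 320 kbps")
    target = source.with_suffix(".mp3")
    return [
        "ffmpeg",
        "-i", str(source),           # input file
        "-acodec", "libmp3lame",     # same codec as the command above
        "-b:a", f"{bitrate_kbps}k",  # assumption: explicit audio bitrate
        str(target),                 # output file, same name with .mp3
    ]


def commands_for_folder(folder: Path) -> list[list[str]]:
    """Build one command per .m4b file found directly inside the folder."""
    return [ffmpeg_mp3_command(p) for p in sorted(folder.glob("*.m4b"))]
```

Each returned list can be handed to subprocess.run(command, check=True) on a system where ffmpeg is on the PATH.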

But another problem was on the cards: this old car stereo only recognizes FAT32 USB devices and sometimes forgets where in the MP3 file the listener stopped. This can be very inconvenient: if the player loses track of where to resume, it resumes at the very start of the file. Moreover, the MP3 player only provides file-based navigation, which means one can only go to the next/previous file/track, but can NOT seek inside a specific file/track.

This, of course, is unacceptable for long MP3 audio-books, so I had to find a way to split a big original MP3 file into smaller parts, which would be browsable without much frustration, should a player memory glitch happen.

For the splitting, I used another free and open source tool: “MP3SPLT”.
More about it, here:
http://mp3splt.sourceforge.net/mp3splt_page/home.php

Here is the full command-line (CLI) command I used to organize the book in 5-minute chapters (the “-t” option takes the part length as minutes.seconds, so “5.0” means 5 minutes and 0 seconds):

mp3splt.exe audiobook.mp3 -t 5.0
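
To get an idea of how many files such a split produces, the arithmetic can be sketched in Python. This is only an estimate, assuming every part, including a shorter last one, is written as its own file; the function names are hypothetical.

```python
import math


def mp3splt_time_arg(minutes: int, seconds: int = 0) -> str:
    """Format a part length the way the -t option above expects (minutes.seconds)."""
    if minutes < 0 or not 0 <= seconds < 60:
        raise ValueError("minutes must be >= 0 and seconds within 0..59")
    return f"{minutes}.{seconds}"


def count_parts(total_seconds: float, part_minutes: int = 5) -> int:
    """Estimate how many files a fixed-length split of the audio-book yields."""
    part_seconds = part_minutes * 60
    return max(1, math.ceil(total_seconds / part_seconds))
```

For example, a 10-hour audio-book (36000 seconds) split in 5-minute parts gives 120 files, which is the number of "next track" presses needed to reach the end.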

That was it, everything is ready for the road.

My yearly Outlook pattern, short version

“Outlook”, Microsoft’s email client from their “Office” suite, is still an important tool for me. Despite the existence of strong alternatives, namely Mozilla’s Thunderbird, I have hacked my way around Outlook – the 2016 version, to be exact. I use it strictly as an email client: no calendar, no tasks, zero cloud integrations. It has not been trivial: some modern security features are not available; for example, to connect to Gmail accounts, one has to enable “less secure app access” on Google’s side.

For me, it is fundamental to steer away from the poisoned convenience of web mail solutions: I say no to all WWW-based email interface systems. This includes Microsoft’s own “Outlook” cloud mail and Google’s Gmail. Slowly but surely, such systems diminish the user’s email autonomy: if one loses internet access, or wants to programmatically search messages that are a decade or more old, one comes to a halt if the approach is totally cloud-dependent and no local offline backups exist.

“Outlook” stores its messages in “.PST” files. What follows is my usage pattern. If you decide to try it, before proceeding, make sure you know how to configure “Outlook”, namely that you are capable of setting a custom “default data file” to store all the messages. You must also be comfortable with .PST files and know their location. Then close “Outlook”, browse to the location of the *.PST files, and create a backup of all of them.

Here is what I do:

  • keep one .PST file per year; in my case, every .PST file grows to around 10 GB before getting retired;
  • name the default .PST file something year-neutral; for example “current_year.pst”;
  • when the year ends, close “Outlook”, rename “current_year.pst” to an adequate name for the year that gets “retired”, such as “2021.pst”;
  • relaunch “Outlook”; the software will NOT find the “retired” default destination PST data file and will ask to create a new one, with a name and location of your choice. Keep the name configured as the default destination data file (e.g. “current_year.pst”) and keep all the *.PST data files in the same folder.
  • with “Outlook” running, go to File > account settings > account settings > “data files” tab > “add” button, and browse to the “2021.pst” file, to make it available.
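
The file-system part of the steps above (backup, then rename) could be scripted. Here is a minimal Python sketch, to be run only while “Outlook” is closed; the function name and the "backup" sub-folder are assumptions made for illustration, and the Outlook steps themselves remain manual.

```python
import shutil
from pathlib import Path


def retire_pst(folder: Path, year: int, current_name: str = "current_year.pst") -> Path:
    """Back up the default data file, then rename it to '<year>.pst'."""
    current = folder / current_name
    retired = folder / f"{year}.pst"
    if not current.is_file():
        raise FileNotFoundError(current)
    if retired.exists():
        # Never overwrite a year that was already retired.
        raise FileExistsError(retired)
    backup_dir = folder / "backup"  # hypothetical backup location
    backup_dir.mkdir(exist_ok=True)
    shutil.copy2(current, backup_dir / f"{year}_{current_name}")
    current.rename(retired)
    return retired
```

After calling retire_pst(Path(r"C:\mail"), 2021), with the folder path replaced by the real location of the data files, relaunching “Outlook” triggers the prompt for a new default data file.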

That is it – the new year’s messages will arrive in the same, previously configured data file (now empty), without breaking any rule set up, or any email account’s settings. The past year’s messages are all still available in the just added .PST data file. Easy organization. Offline email. Ready for another year.

Keeping DNS records on noip.com

With the growth of the Cloud and micro-services, I found myself needing “fast” domain names that can be instantaneously administered, with features such as trivial proof of ownership and immediate CNAME edits.

My choice for such names is noip.com, which I know from the days I hosted websites at home.

I just renewed my subscription. Because 2021 is ending, the discount code NEWYEAR35 worked – I got 35% off. This is a temporary code, of course.

On the other hand, here is a permanent 5+ USD discount link:
https://www.noip.com?fpr=always-max-discount-on

Trying to opt-out of Microsoft's Viva emails?

The most you can do to opt-out of “Microsoft Viva” emails is to go to

https://cortana.office.com/

and to switch off “Cortana Briefing”.

This is an invasive, unnecessary, never requested feature that harvests your data – namely emails – looking for things like appointments, commitments made to other people, etc.

Odds are that you will stop receiving the “Viva” emails, but your data will keep training the Cortana agent, which is unfortunate.

The extreme disrespect Microsoft has for its users

It is almost unreal that after all these years, on the verge of 2022, Microsoft, with its nearly unlimited material resources, keeps its Windows 10 Edge browser stealing file associations, against the user’s explicit and repeatedly stated choices regarding default apps.

I am so annoyed at having to set my PDF viewer back to my favorite software for the task, nearly every single day, that I had to write it out. For me, Microsoft Edge is “irritation software”, a non-tool, an app that serves no other true purpose than to upload its users’ data to the Microsoft cloud. There is nothing in the Edge software that users cannot find in alternatives, some of them genuinely transparent, obedient, and open source, namely Mozilla’s Firefox, including the support for vertical tabs which – for me – is a must, and is lacking in most Web browsers. It is just horrible.

Some users are to blame. For the convenience of not having to redo their preferences, they stick with, and comply with, Edge’s klepto-dictatorship.

This is one of several annoyances in Windows 10, probably the most frequent, but NOT the most time-consuming one. The problem that has cost me the most time is the FALSE “no Internet connection available” connectivity information.

This problem can arise in a multitude of situations, but the odds of it happening are higher in systems with more than one NIC (Network Interface Card) and/or when users run their own DNS server. In these scenarios, the ridiculous service “Network Location Awareness” (NLA) might fail to properly process what is set in Windows’ registry at

Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NlaSvc\Parameters\Internet

There, Microsoft sets the parameters of its connectivity probes: host names that are regularly resolved via DNS (values “ActiveDnsProbeHost”, “ActiveDnsProbeHostV6”), the addresses they are expected to resolve to (values “ActiveDnsProbeContent”, “ActiveDnsProbeContentV6”), and the data that the regularly contacted web probe is expected to answer (values “ActiveWebProbeContent” and “ActiveWebProbeContentV6”).
This is a miserable, absurdly bad design choice, disconnected from other network configurations.
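
To inspect what is actually configured on a given machine, the values named above can be read with Python's standard "winreg" module. This is a read-only sketch, Windows-only by nature; the function name is hypothetical, and a value that does not exist on the system is simply reported as None.

```python
import sys
from typing import Dict, Optional

KEY_PATH = r"SYSTEM\CurrentControlSet\Services\NlaSvc\Parameters\Internet"
PROBE_VALUE_NAMES = [
    "ActiveDnsProbeHost", "ActiveDnsProbeHostV6",
    "ActiveDnsProbeContent", "ActiveDnsProbeContentV6",
    "ActiveWebProbeContent", "ActiveWebProbeContentV6",
]


def read_nla_probe_settings() -> Dict[str, Optional[str]]:
    """Read the NLA probe values from HKEY_LOCAL_MACHINE (Windows only)."""
    if sys.platform != "win32":
        raise OSError("the Windows registry is only available on Windows")
    import winreg  # imported here because the module exists only on Windows

    settings: Dict[str, Optional[str]] = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        for name in PROBE_VALUE_NAMES:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                value = None  # this value is not set on this system
            settings[name] = value
    return settings
```

Printing the returned dictionary gives a quick view of which hosts and answers the probes depend on, without changing anything in the registry.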

If a NIC is set to a “private” network that uses its own custom DNS resolver – as in my case – the NLA service might consider that the entire system is without Internet connection, even if other NICs do have access. Or, at least, that is what always happens to me. It is oh so frustrating.

To aggravate the ridiculousness, if one searches for solutions, the top results tend to be Microsoft’s own “answers” sites, which answer absolutely nothing. Those Microsoft forums consist of threads that read as nightmares. People respond with mechanically scripted texts that go around the essence of most problems.
Search engines are to blame here too, for ranking such garbage sites so high.

Having to systematically fix the same problems, again and again, is an offense to productivity. It simply should not happen.
For users to accept as justifiable that a company the size of Microsoft lets these situations go on, for 10+ years now, sets an incredibly low, extremely negative standard for everyone else. “Oh, time and data are so precious, but let us push people to waste them, just because we can and it is nice to milk their corresponding data”.

Shame on you Microsoft. It should not be as it is. You are doing many things wrong.

Transistor last had an episode 4 years ago

Transistor was one of the first podcasts I found worth the time. It is about “scientific curiosities” and it ran until 2017-11. This November marks four years since its end.
Now that it is possible for Spotify users to compile playlists which contain episodes of “shows” (the technical name the API gives to this type of content), I wrote some code which helps me better organize my listening experience.
Here is a playlist of all 62 episodes of “Transistor”:
https://open.spotify.com/playlist/58aIPGMiDpLyavGNkB37mK

Notice: the playlist indeed has the entire series, but there is an imposed limit on the number of visible entries, when embedding, so you’ll see only 50 entries. To see the full content, add directly to the Spotify app.
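
One detail that such playlist code has to handle can be sketched in Python: the Spotify Web API accepts at most 100 items per "add items to playlist" request, so longer shows must be sent in batches. This is not the code mentioned above, just an illustration; the function names are hypothetical and no HTTP request is made here.

```python
from typing import Iterator, List

MAX_ITEMS_PER_REQUEST = 100  # limit of one "add items to playlist" call


def episode_uri(episode_id: str) -> str:
    """Build the URI form that playlists use for show episodes."""
    return f"spotify:episode:{episode_id}"


def batches(items: List[str], size: int = MAX_ITEMS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of at most 'size' items, keeping the order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

With 62 episodes a single batch is enough; a show with several hundred episodes would need one request per batch, sent in order.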

Private personal pet project: downloader for "Jornal de Negócios"

bot “bb_jnegocios1_dl_edition.php”

Purpose
This is a tool to download a single date-specific edition of “Jornal de Negócios” – a very good newspaper on business and markets, focused on the Portuguese context – as available from
https://quiosque.cofina.pt/jornal-de-negocios/
or
https://quiosque.cofina.pt/jornal-de-negocios/yyyymmdd

For example:
https://quiosque.cofina.pt/jornal-de-negocios/20210729
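
The URL pattern is simple enough to build from a date. A minimal Python sketch follows, assuming only the "yyyymmdd" pattern shown above; the function name is hypothetical.

```python
from datetime import date

BASE_URL = "https://quiosque.cofina.pt/jornal-de-negocios/"


def edition_url(day: date) -> str:
    """Return the address of the edition published on the given day."""
    # %Y%m%d gives the zero-padded yyyymmdd form used by the site.
    return BASE_URL + day.strftime("%Y%m%d")
```

Calling edition_url(date(2021, 7, 29)) returns the example address shown above.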

This is content only available to subscribers. I am a longtime subscriber, but I very much prefer to have all the content offline and compiled together, for me to consume whenever I want, regardless of internet connection availability. Publishers usually do NOT provide this level of control over the contents, so I have to write my own tools. This post is a glimpse of one of those tools.

Related projects of my own

This depends on my AmConsole class to handle the user’s command-line arguments, using a pattern of my own that forces a certain discipline for default values, validations and descriptions.

The most important code in the project, by far, is the class “QuiosqueCofinaPT”, which does all the impersonation jobs: login, browse to edition, flip the pages, save snapshots, etc.
The class “QuiosqueCofinaPT” depends on another lower-level class of mine named “AmWebDriver”, which directly interfaces with a running instance of Selenium hub, which in turn controls a running web browser. Firefox (ESR) is the browser in use.

There is an external Scrivener book file “bot_jornaldenegocios.scrivx” which captures the evolution of the project that resulted in this new bot for Blogbot.

Example calls

php bb_jnegocios1_dl_edition.php 2021 7 29 4444 #all possible arguments given
php bb_jnegocios1_dl_edition.php 2021 7 29 #omits the Selenium driver port, defaults to 4444
php bb_jnegocios1_dl_edition.php 2021 7 #omits the day and the port; defaults to current day and port 4444
php bb_jnegocios1_dl_edition.php 2021 #omits the month, the day and the port; defaults to current month and day, port 4444
php bb_jnegocios1_dl_edition.php #omits everything; defaults to current date and port 4444
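
The defaulting rule behind these calls (every omitted trailing argument falls back to today's date or to port 4444) can be sketched in a few lines of Python. This only illustrates the rule and is not the AmConsole implementation; the function name is hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple

DEFAULT_PORT = 4444  # default Selenium hub port, as in the calls above


def resolve_arguments(args: List[str], today: Optional[date] = None) -> Tuple[int, ...]:
    """Return (year, month, day, port), filling omitted trailing values."""
    today = today or date.today()
    defaults = [today.year, today.month, today.day, DEFAULT_PORT]
    if len(args) > len(defaults):
        raise ValueError("at most 4 arguments are accepted")
    # User-supplied values arrive as strings; defaults fill the rest.
    values = [int(a) for a in args] + defaults[len(args):]
    if any(v < 1 for v in values):
        raise ValueError("every argument must be an integer >= 1")
    return tuple(values)
```

Note that, as with the calls above, a supplied year and month combined with today's day can produce a date that does not exist, so the date still has to be validated afterwards.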

Source code (of the dl script only, not of the supporting classes)

<?php
require_once  "./vendor/autoload.php";

use am\util\AmDate;
use am\internet\HttpHelper;
use am\internet\QuiosqueCofinaPT;
use am\console\Console;

define ("THIS_HARVESTER_NAME", "BB 'quiosque.cofina.pt/jornal-de-negocios/' [from] daily edition harvester".PHP_EOL);
define ("THIS_HARVESTER_VERSION", "v20210728 2000".PHP_EOL);

echo THIS_HARVESTER_NAME;
echo THIS_HARVESTER_VERSION;

const MIN_NUMBER_OF_ARGUMENTS_THE_USER_MUST_PROVIDE = 0;
const ARGUMENT_YEAR_INDEX_IN_ARGV = 1;
const ARGUMENT_MONTH_INDEX_IN_ARGV = 2;
const ARGUMENT_DAY_INDEX_IN_ARGV = 3;
const ARGUMENT_DRIVER_PORT_INDEX_IN_ARGV = 4;

const DEFAULT_VALUE_FOR_ARGUMENT_DRIVER_PORT = \am\internet\AmChromeDriver::SELENIUM_HUB_DEFAULT_SERVER_PORT; //default for selenium (do not confuse with chromedriver.exe 9515 port)

// to use AmConsole, one must provide a validation function per possible argument
// in this case, all args can be validated by the same function 'validateIsIntegerGTOE1'
$arrayOfValidationFunctions = [
    ARGUMENT_YEAR_INDEX_IN_ARGV => "validateIsIntegerGTOE1",
    ARGUMENT_MONTH_INDEX_IN_ARGV => "validateIsIntegerGTOE1",
    ARGUMENT_DAY_INDEX_IN_ARGV => "validateIsIntegerGTOE1",
    ARGUMENT_DRIVER_PORT_INDEX_IN_ARGV => "validateIsIntegerGTOE1"
];

// to use AmConsole, one must describe every possible argument
$arrayOfDescriptorsOneForEachCommandLineArg = [
    ARGUMENT_YEAR_INDEX_IN_ARGV => "Integer >=1 can be supplied, for year (defaults to system's year).",
    ARGUMENT_MONTH_INDEX_IN_ARGV => "Integer >=1 can be supplied, for month (defaults to system's month).",
    ARGUMENT_DAY_INDEX_IN_ARGV => "Integer >=1 can be supplied, for day (defaults to system's day).",
    ARGUMENT_DRIVER_PORT_INDEX_IN_ARGV => "Integer >=1 expected, for driver port (defaults to 4444).",
];

// to use AmConsole, one must provide default values for every possible argument that the user can omit
$strCurrentDate = date("Y-m-d");
$aCurrentDate = explode("-", $strCurrentDate);
$iYear = intval($aCurrentDate[0]);
$iMonth = intval($aCurrentDate[1]);
$iDay = intval($aCurrentDate[2]);
$arrayOfDefaultValues = [
    0 => __FILE__ //always like this, to state this very same script as one argument
    ,
    ARGUMENT_YEAR_INDEX_IN_ARGV => $iYear
    ,
    ARGUMENT_MONTH_INDEX_IN_ARGV => $iMonth
    ,
    ARGUMENT_DAY_INDEX_IN_ARGV => $iDay
    ,
    ARGUMENT_DRIVER_PORT_INDEX_IN_ARGV => DEFAULT_VALUE_FOR_ARGUMENT_DRIVER_PORT
];

//-------------------- VALIDATORS START --------------------

function validateIsIntegerGTOE1 (
    $pInt
) : bool
{
    $iResult = \am\util\Util::toInteger($pInt);
    return $iResult ? $iResult>=1 : false;
}//validateIsIntegerGTOE1

//-------------------- VALIDATORS END --------------------

//----------- ACTION (PROBLEM SPECIFIC) STARTS------------
function action(
    $pConsole
){
    $y = intval($pConsole->mArgv[ARGUMENT_YEAR_INDEX_IN_ARGV]); //if the values that populated the mArgv object are user supplied they'll be strings
    $m = intval($pConsole->mArgv[ARGUMENT_MONTH_INDEX_IN_ARGV]);
    $d = intval($pConsole->mArgv[ARGUMENT_DAY_INDEX_IN_ARGV]);
    $driverPort = intval($pConsole->mArgv[ARGUMENT_DRIVER_PORT_INDEX_IN_ARGV]);

    $bValidDate = \am\util\DateTools::validDay($y, $m, $d);
    if ($bValidDate){
        echo "Valid date received. Will now download the JN publications.".PHP_EOL;
        /*
         * these secrets can be captured on the PHP LOG FILE!
         * TODO: how to avoid this security risk?
         * https://websec.io/2018/06/14/Keep-Credentials-Secure.html
         */
        $o = new QuiosqueCofinaPT(
            SECRET_QUIOSQUE_COFINA_LOGIN_NAME_1,
            SECRET_QUIOSQUE_COFINA_PASSWORD_1,

            $driverPort,
            HttpHelper::USER_AGENT_STRING_CHROME_70
        );
        $loginRet = $o->actionLogin();
        $startDate = new AmDate($y, $m, $d);
        $bIsSunday = $startDate->isSunday();

        if (!$bIsSunday){
            $o->browseDailyEditionAndSnapshotSaveAllPairsOfPages(
                $startDate->mY,
                $startDate->mM,
                $startDate->mD,
                "dls"
            );
        }//if NOT sunday
    }//if valid date
    else{
        echo "Call aborted - please supply a valid date!".PHP_EOL;
    }//else
}//action

//----------- ACTION (PROBLEM SPECIFIC) ENDS------------

/*
 * the __construct constructor of AmConsole throws an Exception when no command line arguments (including no script name) are received
 * PHPSTORM will signal a warning of "unhandled Exception" for a call without try/catch
 */
try {
    $oConsole = new \am\console\AmConsole(
        $argv,
        MIN_NUMBER_OF_ARGUMENTS_THE_USER_MUST_PROVIDE, //minimum number of arguments
        $arrayOfDefaultValues,
        $arrayOfValidationFunctions,
        $arrayOfDescriptorsOneForEachCommandLineArg
    );
}//try
catch (Exception $e){
    echo $e->getMessage().PHP_EOL;
    exit(1); //without a console object there is nothing left to do
}//catch

echo $oConsole; //a summary of everything received
$c0 = $oConsole->allArgsOK();
if ($c0) action (
    $oConsole
);
else{
    echo "Did NOT call the script, because 1+ argument(s) was not OK.".PHP_EOL;
}
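
The decision taken inside action() (reject impossible dates, skip Sundays, otherwise download) can be restated in a few lines of Python. This is only an illustration of that logic using the standard library, not a port of the supporting classes; the function name and the returned labels are hypothetical.

```python
from datetime import date


def edition_status(year: int, month: int, day: int) -> str:
    """Classify a requested date the way the script above treats it."""
    try:
        requested = date(year, month, day)
    except ValueError:
        return "invalid date"  # e.g. February 30th
    # weekday() counts from Monday = 0, so Sunday is 6.
    if requested.weekday() == 6:
        return "skipped (Sunday)"
    return "download"
```

For the example edition of 2021-07-29, a Thursday, the result is "download".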

Results
In the end, this bot produces files in an automatically created folder, containing snapshots of the pages. Other tools will OCR and compile the contents together.

How to download courses from Coursera, in 2021

To download the COURSERA.ORG courses one subscribes to, either one writes one’s own bot, which will have to solve the authentication challenge and be able to crawl, identify and fetch all the relevant course files, or one learns to use the “COURSERA-DL” free and open source software (FOSS) project, mostly written in Python, available from:
https://github.com/coursera-dl/coursera-dl/

The first option is great for learning the corresponding skills, but it is hard work.

The second option is immediately available and is much more sensible for immediate results, mainly for those who are only focused on getting the course materials, for offline studying.

This post is about installing and using COURSERA-DL. The post assumes “Python” is properly installed. The commands shown were tested on a Python installation on Windows 10.

To install or update COURSERA-DL, the following sequence of commands will work. Enter the commands from any command-line console (CMD.EXE on Windows). Even if COURSERA-DL is already installed, it will remain so, keeping its configuration, and it will only be updated. The commands go a bit beyond COURSERA-DL, because I also care about EDX courses.
One project similar to COURSERA-DL is EDX-DL, for courses at EDX.ORG. Both learning sites have materials on YOUTUBE.COM, so yet another related FOSS is YOUTUBE-DL.

python -m pip install --upgrade pip
pip install --upgrade coursera-dl
pip install --upgrade edx-dl
pip install --upgrade youtube-dl

Once these FOSS solutions are made available on the system, they can be called from the command-line.

To know the technical name of a COURSERA.ORG course, pay attention to its URL, when learning in a browser session. For example, when starting to learn the Coursera course named “Build a Modern Computer From First Principles”, the URL is
https://www.coursera.org/learn/build-a-computer/home/welcome

The technical name is “build-a-computer”, i.e., the string after “https://www.coursera.org/learn/” and before the subsequent forward-slash (“/”). This parsing rule should work for any course.
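
The parsing rule can be written down in Python with the standard library. This is a minimal sketch that assumes course URLs always follow the “/learn/<name>/…” shape described above; the function name is hypothetical.

```python
from urllib.parse import urlparse


def course_slug(url: str) -> str:
    """Extract the technical course name from a Coursera 'learn' URL."""
    # Split the path into its non-empty segments: ['learn', '<name>', ...]
    parts = [segment for segment in urlparse(url).path.split("/") if segment]
    if len(parts) < 2 or parts[0] != "learn":
        raise ValueError(f"not a course URL: {url}")
    return parts[1]
```

For the URL above, course_slug returns "build-a-computer".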

To download a COURSERA.ORG course named “XPTO”, logging-in as “user@email.com”, having password “1234”, in theory, it should suffice to launch a command-line window (CMD.EXE on any Windows) and enter:

coursera-dl -u "user@email.com" -p "1234" "XPTO"

These days, this will probably FAIL, due to the introduction of CAPTCHAs, which defeat many bots.

As of February 2021, COURSERA-DL does NOT defeat the COURSERA CAPTCHA, which asks the user to pick the images that solve some challenge. Defeating CAPTCHAs can be quite a project on its own, so this is understandable. The workaround is easy, but not automatable.

For each COURSERA.ORG course you are subscribed to, when you use a web browser to learn it, a cookie named “CAUTH” for domain “.coursera.org” is created on the local computer. In my case, I always use Firefox and the extension “cookie quick manager”, to see the cookies for domains. Using that extension, or equivalent, just observe, text-select, and copy the string value for the CAUTH cookie, which can be a long string (hundreds of chars).

Then, provide the value of that string upon calling COURSERA-DL:

coursera-dl -u "user@email.com" -p "1234" "XPTO" -ca "hundreds of chars go here"

That is it.
For a better workflow, find the folder where the Python script for coursera-dl is; i.e. search for the local file “coursera-dl.py”.

If you have Python installed at

c:\python

the file will be at

c:\python\scripts

In the scripts folder, create a NEW text file named “coursera.conf”, consisting of the sensitive data and other eventual arguments you can learn about by reading COURSERA-DL’s documentation.

For example:

-u "user@email.com" -p "1234" --subtitle-language en --download-quizzes

The text above is the content inside the text file “coursera.conf”, saved in the same folder that contains the coursera-dl.py script.

Now, to download course “XPTO”, just do:

coursera-dl "XPTO" -ca "hundreds of chars go here"
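
Since the CAUTH value is long and sensitive, one possible refinement is to keep it out of the typed command and let a small Python wrapper assemble the call. This is only a sketch: the environment variable name COURSERA_CAUTH and the function name are assumptions of mine, not something COURSERA-DL defines.

```python
import os
from typing import List, Mapping


def coursera_dl_command(course: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Assemble the coursera-dl call shown above for one course."""
    # Hypothetical variable holding the copied CAUTH cookie value.
    cauth = env.get("COURSERA_CAUTH", "")
    if not cauth:
        raise KeyError("COURSERA_CAUTH is not set")
    return ["coursera-dl", course, "-ca", cauth]
```

The returned list can be passed to subprocess.run(command, check=True); the user name and password are still read from “coursera.conf”.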