The extreme disrespect Microsoft has for its users

It is almost unreal that, after all these years and on the verge of 2022, Microsoft, with its nearly unlimited material resources, still lets the Edge browser in Windows 10 steal file associations, overriding the user’s explicit and repeatedly stated choices of default apps.

I am so annoyed at having to reset my default PDF viewer to my favorite software for the task, nearly every single day, that I had to write about it. For me, Microsoft Edge is “irritation software”, a non-tool, an app that serves no other true purpose than to upload its users’ data to the Microsoft cloud. There is nothing in Edge that users cannot find in alternatives, some of them genuinely transparent, obedient to the user, and open source, namely Mozilla’s Firefox, including support for vertical tabs, which for me is a must and is lacking in most Web browsers. It is just horrible.

Some users are also to blame: for the convenience of not having to redo their preferences, they give in and comply with Edge’s kleptomaniac dictatorship.

This is one of several annoyances in Windows 10, probably the most frequent, but NOT the most time-consuming one. The problem that has cost me the most time is the FALSE “no Internet connection available” connectivity status.

This problem can arise in a multitude of situations, but the odds of it happening are higher in systems with more than one NIC (Network Interface Card) and/or when users run their own DNS server. In these scenarios, the ridiculous service “Network Location Awareness” (NLA) might fail to properly process what is set in Windows’ registry at

Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NlaSvc\Parameters\Internet

There, Microsoft sets the targets of some “active probes” that are regularly contacted: host names to resolve via DNS (keys “ActiveDnsProbeHost”, “ActiveDnsProbeHostV6”) together with the addresses they are expected to resolve to (keys “ActiveDnsProbeContent”, “ActiveDnsProbeContentV6”), and web locations to fetch (keys such as “ActiveWebProbeHost” and “ActiveWebProbePath”) together with the exact content they are expected to answer (keys “ActiveWebProbeContent” and “ActiveWebProbeContentV6”).
This is a miserable, absurdly bad design choice, disconnected from the other network configurations.
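To make the mechanism concrete, here is a simplified PHP model of the two probes those keys describe. It is a sketch, not Windows’ actual algorithm: the function names probeVerdict and runProbes and the three verdict labels are hypothetical, and the default values are assumptions about a stock Windows 10 registry (check your own, they can differ).

```php
<?php
// Simplified, HYPOTHETICAL model of NLA active probing (not Microsoft's code).
// The values below are assumed stock Windows 10 defaults.
const NLA_PROBE_DEFAULTS = [
    'ActiveDnsProbeHost'    => 'dns.msftncsi.com',
    'ActiveDnsProbeContent' => '131.107.255.255',
    'ActiveWebProbeHost'    => 'www.msftconnecttest.com',
    'ActiveWebProbePath'    => 'connecttest.txt',
    'ActiveWebProbeContent' => 'Microsoft Connect Test',
];

/**
 * $resolvedIp: address the DNS probe host resolved to, or null if resolution failed.
 * $webBody:    body returned by the web probe, or null if the request failed.
 */
function probeVerdict(?string $resolvedIp, ?string $webBody, array $cfg = NLA_PROBE_DEFAULTS): string
{
    // Web probe: the exact expected text must come back
    if ($webBody !== null && trim($webBody) === $cfg['ActiveWebProbeContent']) {
        return 'internet';
    }
    // DNS probe: the host must resolve to the expected address
    if ($resolvedIp === $cfg['ActiveDnsProbeContent']) {
        return 'limited'; // DNS looks fine, but the web probe failed
    }
    return 'none'; // both probes failed: "no Internet connection"
}

// Live run (requires network). gethostbyname() returns its input unchanged on failure.
function runProbes(array $cfg = NLA_PROBE_DEFAULTS): string
{
    $host = $cfg['ActiveDnsProbeHost'];
    $ip = gethostbyname($host);
    $url = 'http://' . $cfg['ActiveWebProbeHost'] . '/' . $cfg['ActiveWebProbePath'];
    $body = @file_get_contents($url);
    return probeVerdict($ip === $host ? null : $ip, $body === false ? null : $body, $cfg);
}
```

In this model, if the resolver used for the probes cannot resolve Microsoft’s probe hosts, both checks fail and the verdict is “none”, even when another NIC has a perfectly working route to the Internet.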

If a NIC is on a “private” network that uses its own custom DNS resolver, as in my case, the NLA service might conclude that the entire system has no Internet connection, even if other NICs do have access. At least, that is what always happens to me. It is oh so frustrating.

To aggravate the ridiculousness, if one searches for solutions, the top results tend to be Microsoft’s own “answers” sites, which answer absolutely nothing. Those forums consist of threads that read like nightmares, where people respond with mechanically scripted texts that sidestep the essence of most problems.
Search engines are to blame here too, for ranking such garbage sites so high.

Having to systematically fix the same problems, again and again, is an offense to productivity. It simply should not happen.
For users to accept as justifiable that a company the size of Microsoft lets these situations go on, for 10+ years now, sets an incredibly low, extremely negative standard for everyone else. “Oh, time and data are so precious, but let us push people to waste them, just because we can, and because it is nice to milk their data”.

Shame on you Microsoft. It should not be as it is. You are doing many things wrong.

Transistor last had an episode 4 years ago

Transistor was one of the first podcasts I found worth the time. It is about “scientific curiosities” and it ran until 2017-11. This November marks the 4th anniversary of its end.
Now that Spotify users can compile playlists containing episodes of “shows” (the technical name the API gives to this type of content), I wrote some code that helps me better organize my listening.
Here is a playlist of all 62 episodes of “Transistor”:
https://open.spotify.com/playlist/58aIPGMiDpLyavGNkB37mK

Note: the playlist does contain the entire series, but embedded players limit the number of visible entries, so an embed shows only 50 of them. To see the full content, open the playlist directly in the Spotify app.
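The playlist-building code itself is not shown here. As a hedged sketch of the idea, assuming the Spotify Web API “Add Items to Playlist” endpoint (POST /v1/playlists/{playlist_id}/tracks, which accepts at most 100 item URIs per call and also takes episode URIs), this function turns a list of episode IDs into request payloads. The function name is hypothetical, and authentication and the actual HTTP calls are left out.

```php
<?php
// Assumption: Spotify's "Add Items to Playlist" accepts at most 100 URIs per request.
const SPOTIFY_MAX_URIS_PER_REQUEST = 100;

// Hypothetical helper: one POST payload per batch of up to 100 episode URIs.
// Requests without a "position" append, so the original episode order is kept.
function buildAddItemsRequests(string $playlistId, array $episodeIds): array
{
    $uris = array_map(fn(string $id): string => "spotify:episode:$id", $episodeIds);
    $requests = [];
    foreach (array_chunk($uris, SPOTIFY_MAX_URIS_PER_REQUEST) as $batch) {
        $requests[] = [
            'method' => 'POST',
            'url'    => "https://api.spotify.com/v1/playlists/$playlistId/tracks",
            'body'   => json_encode(['uris' => $batch]),
        ];
    }
    return $requests;
}
```

For a 62-episode series this yields a single request; longer shows are split into ordered batches.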

Private personal pet project: downloader for "Jornal de Negócios"

bot “bb_jnegocios1_dl_edition.php”

Purpose
This is a tool to download a single date-specific edition of “Jornal de Negócios”, a very good newspaper on business and markets, focused on the Portuguese context, as available from
https://quiosque.cofina.pt/jornal-de-negocios/
or
https://quiosque.cofina.pt/jornal-de-negocios/yyyymmdd

For example:
https://quiosque.cofina.pt/jornal-de-negocios/20210729
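The date-specific URL follows the yyyymmdd pattern above. A minimal sketch of building it, with the function name jnEditionUrl being hypothetical (it is not part of the bot’s classes):

```php
<?php
// Hypothetical helper: builds an edition URL following the yyyymmdd pattern.
const QUIOSQUE_JN_BASE = 'https://quiosque.cofina.pt/jornal-de-negocios/';

function jnEditionUrl(int $y, int $m, int $d): ?string
{
    // checkdate() takes (month, day, year) and rejects dates such as 2021-02-30
    if (!checkdate($m, $d, $y)) {
        return null;
    }
    return QUIOSQUE_JN_BASE . sprintf('%04d%02d%02d', $y, $m, $d);
}

// Usage: the edition from the example above
echo jnEditionUrl(2021, 7, 29), PHP_EOL;
```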

This content is only available to subscribers. I am a longtime subscriber, but I very much prefer to have all the content offline and compiled together, to consume whenever I want, regardless of Internet connection availability. Publishers usually do NOT provide this level of control over the contents, so I have to write my own tools. This post is a glimpse of one of these tools.

Related projects of my own

This bot depends on my AmConsole class to handle the user’s command-line arguments, using a pattern of my own that enforces a certain discipline for default values, validations and descriptions.

The most important code in the project, by far, is class “QuiosqueCofinaPT”, which does all the impersonation work: logging in, browsing to the edition, flipping the pages, saving snapshots, etc.
That class depends on another, lower-level class of mine named “AmWebDriver”, which interfaces directly with a running Selenium hub instance, which in turn controls a running web browser. Firefox ESR is the browser in use.

There is an external Scrivener book file “bot_jornaldenegocios.scrivx” which captures the evolution of the project that resulted in this new bot for Blogbot.

Example calls

php bb_jnegocios1_dl_edition.php 2021 7 29 4444 #all possible arguments given
php bb_jnegocios1_dl_edition.php 2021 7 29 #omits the Selenium driver port, which defaults to 4444
php bb_jnegocios1_dl_edition.php 2021 7 #omits the day and the port; defaults to the current day and port 4444
php bb_jnegocios1_dl_edition.php 2021 #omits the month, the day and the port; defaults to the current month and day, and port 4444
php bb_jnegocios1_dl_edition.php #omits everything; defaults to the current date and port 4444
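AmConsole itself is not published here. To illustrate the pattern these calls rely on (omitted trailing arguments fall back to defaults, then every argument is validated), here is a minimal sketch; isIntegerGte1, mergeArgsWithDefaults and allArgsValid are hypothetical names, not the real AmConsole API.

```php
<?php
// HYPOTHETICAL helpers illustrating the defaults/validation pattern; not AmConsole.

// Validator: accepts ints or numeric strings >= 1 (user-supplied argv values are strings)
function isIntegerGte1($value): bool
{
    return filter_var($value, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]) !== false;
}

// Fill omitted positional arguments with defaults; keys are $argv indexes.
// array_replace() keeps integer keys, and user-supplied values win.
function mergeArgsWithDefaults(array $userArgs, array $defaults): array
{
    return array_replace($defaults, $userArgs);
}

// Run each index's validator; $validators maps argv index => callable
function allArgsValid(array $args, array $validators): bool
{
    foreach ($validators as $index => $validator) {
        if (!array_key_exists($index, $args) || !$validator($args[$index])) {
            return false;
        }
    }
    return true;
}
```

With defaults [script, year, month, day, 4444] and a call such as `php bb_jnegocios1_dl_edition.php 2020 12`, the merged result keeps 2020 and 12 and takes the day and port from the defaults.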

Source code (of the dl script only, not of the supporting classes)

<?php
require_once  "./vendor/autoload.php";

use am\util\AmDate;
use am\internet\HttpHelper;
use am\internet\QuiosqueCofinaPT;
use am\console\Console;

define ("THIS_HARVESTER_NAME", "BB 'quiosque.cofina.pt/jornal-de-negocios/' [from] daily edition harvester".PHP_EOL);
define ("THIS_HARVESTER_VERSION", "v20210728 2000".PHP_EOL);

echo THIS_HARVESTER_NAME;
echo THIS_HARVESTER_VERSION;

const MIN_NUMBER_OF_ARGUMENTS_THE_USER_MUST_PROVIDE = 0;
const ARGUMENT_YEAR_INDEX_IN_ARGV = 1;
const ARGUMENT_MONTH_INDEX_IN_ARGV = 2;
const ARGUMENT_DAY_INDEX_IN_ARGV = 3;
const ARGUMENT_DRIVER_PORT_INDEX_IN_ARGV = 4;

const DEFAULT_VALUE_FOR_ARGUMENT_DRIVER_PORT = \am\internet\AmChromeDriver::SELENIUM_HUB_DEFAULT_SERVER_PORT; //default for selenium (do not confuse with chromedriver.exe 9515 port)

// to use AmConsole, one must provide a validation function per possible argument
// in this case, all args can be validated by the same function 'validateIsIntegerGTOE1'
$arrayOfValidationFunctions = [
    ARGUMENT_YEAR_INDEX_IN_ARGV => "validateIsIntegerGTOE1",
    ARGUMENT_MONTH_INDEX_IN_ARGV => "validateIsIntegerGTOE1",
    ARGUMENT_DAY_INDEX_IN_ARGV => "validateIsIntegerGTOE1",
    ARGUMENT_DRIVER_PORT_INDEX_IN_ARGV => "validateIsIntegerGTOE1"
];

// to use AmConsole, one must describe every possible argument
$arrayOfDescriptorsOneForEachCommandLineArg = [
    ARGUMENT_YEAR_INDEX_IN_ARGV => "Integer >=1 can be supplied, for year (defaults to system's year).",
    ARGUMENT_MONTH_INDEX_IN_ARGV => "Integer >=1 can be supplied, for month (defaults to system's month).",
    ARGUMENT_DAY_INDEX_IN_ARGV => "Integer >=1 can be supplied, for day (defaults to system's day).",
    ARGUMENT_DRIVER_PORT_INDEX_IN_ARGV => "Integer >=1 expected, for driver port (defaults to 4444).",
];

// to use AmConsole, one must provide default values for every argument that the user can omit
$strCurrentDate = date("Y-m-d");
$aCurrentDate = explode("-", $strCurrentDate);
$iYear = intval($aCurrentDate[0]);
$iMonth = intval($aCurrentDate[1]);
$iDay = intval($aCurrentDate[2]);
$arrayOfDefaultValues = [
    0 => __FILE__, //index 0 always holds this very script, as in $argv
    ARGUMENT_YEAR_INDEX_IN_ARGV => $iYear,
    ARGUMENT_MONTH_INDEX_IN_ARGV => $iMonth,
    ARGUMENT_DAY_INDEX_IN_ARGV => $iDay,
    ARGUMENT_DRIVER_PORT_INDEX_IN_ARGV => DEFAULT_VALUE_FOR_ARGUMENT_DRIVER_PORT
];

//-------------------- VALIDATORS START --------------------

function validateIsIntegerGTOE1 (
    $pInt
) : bool
{
    $iResult = \am\util\Util::toInteger($pInt);
    return $iResult ? $iResult>=1 : false;
}//validateIsIntegerGTOE1

//-------------------- VALIDATORS END --------------------

//----------- ACTION (PROBLEM SPECIFIC) STARTS------------
function action(
    $pConsole
){
    $y = intval($pConsole->mArgv[ARGUMENT_YEAR_INDEX_IN_ARGV]); //if the values that populated the mArgv object are user supplied they'll be strings
    $m = intval($pConsole->mArgv[ARGUMENT_MONTH_INDEX_IN_ARGV]);
    $d = intval($pConsole->mArgv[ARGUMENT_DAY_INDEX_IN_ARGV]);
    $driverPort = intval($pConsole->mArgv[ARGUMENT_DRIVER_PORT_INDEX_IN_ARGV]);

    $bValidDate = \am\util\DateTools::validDay($y, $m, $d);
    if ($bValidDate){
        echo "Valid date received. Will now download the JN publications.".PHP_EOL;
        /*
         * these secrets can be captured on the PHP LOG FILE!
         * TODO: how to avoid this security risk?
         * https://websec.io/2018/06/14/Keep-Credentials-Secure.html
         */
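        // the SECRET_QUIOSQUE_COFINA_* constants are not defined in this script; they must be defined elsewhere, before this point
        $o = new QuiosqueCofinaPT(
            SECRET_QUIOSQUE_COFINA_LOGIN_NAME_1,
            SECRET_QUIOSQUE_COFINA_PASSWORD_1,
            $driverPort,
            HttpHelper::USER_AGENT_STRING_CHROME_70
        );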
        $loginRet = $o->actionLogin();
        $startDate = new AmDate($y, $m, $d);
        $bIsSunday = $startDate->isSunday();

        if (!$bIsSunday){
            $o->browseDailyEditionAndSnapshotSaveAllPairsOfPages(
                $startDate->mY,
                $startDate->mM,
                $startDate->mD,
                "dls"
            );
        }//if NOT sunday
        else{
            echo "Sunday requested: skipping, no download attempted.".PHP_EOL;
        }//else
    }//if valid date
    else{
        echo "Call aborted - please supply a valid date!".PHP_EOL;
    }//else
}//action

//----------- ACTION (PROBLEM SPECIFIC) ENDS------------

/*
 * the AmConsole constructor throws an Exception when no command-line arguments (not even the script name) are received
 * PhpStorm flags an "unhandled Exception" warning for a call without try/catch
 */
try {
    $oConsole = new \am\console\AmConsole(
        $argv,
        MIN_NUMBER_OF_ARGUMENTS_THE_USER_MUST_PROVIDE, //$pMinNumberOfArguments
        $arrayOfDefaultValues,
        $arrayOfValidationFunctions,
        $arrayOfDescriptorsOneForEachCommandLineArg
    );
}//try
catch (Exception $e){
    echo $e->getMessage().PHP_EOL;
    exit(1); //without a console object there is nothing else to do
}//catch

echo $oConsole; //a summary of everything received
$c0 = $oConsole->allArgsOK();
if ($c0){
    action($oConsole);
}
else{
    echo "Did NOT call the action, because 1+ argument(s) were not OK.".PHP_EOL;
}

Results
In the end, this bot produces files in an automatically created folder, containing snapshots of the pages. Other tools will OCR and compile the contents together.

How to download courses from Coursera, in 2021

To download COURSERA.ORG courses one subscribes to, one either writes one’s own bot, which will have to solve the authentication challenge and be able to crawl, identify and fetch all the relevant course files, or one learns to use “COURSERA-DL”, a free and open source (FOSS) project mostly written in Python, available from:
https://github.com/coursera-dl/coursera-dl/

The first option is great for learning the corresponding skills, but it is hard work.

The second option is immediately available and much more sensible for instant results, mainly for those who are only focused on getting the course materials for offline studying.

This post is about installing and using COURSERA-DL. The post assumes “Python” is properly installed. The commands shown were tested on a Python installation on Windows 10.

To install or update COURSERA-DL, the following sequence of commands will work. Enter the commands from any command-line console (CMD.EXE on Windows). Even if COURSERA-DL is already installed, it will remain so, keeping its configuration, and it will only be updated. The commands go a bit beyond COURSERA-DL, because I also care about EDX courses.
One project similar to COURSERA-DL is EDX-DL, for courses at EDX.ORG. Both learning sites have materials on YOUTUBE.COM, so yet another related FOSS is YOUTUBE-DL.

python -m pip install --upgrade pip
pip install --upgrade coursera-dl
pip install --upgrade edx-dl
pip install --upgrade youtube-dl

Once these FOSS solutions are made available on the system, they can be called from the command-line.

To know the technical name of a COURSERA.ORG course, pay attention to its URL, when learning in a browser session. For example, when starting to learn the Coursera course named “Build a Modern Computer From First Principles”, the URL is
https://www.coursera.org/learn/build-a-computer/home/welcome

The technical name is “build-a-computer”, i.e., the string after “https://www.coursera.org/learn/” and before the subsequent forward slash (“/”). This parsing rule should work for any course.
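The parsing rule above can be sketched in PHP as follows; courseraCourseSlug is a hypothetical helper name, and it returns null for URLs that do not follow the /learn/ pattern.

```php
<?php
// Hypothetical helper: extracts the course's technical name from a /learn/ URL.
function courseraCourseSlug(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH); // e.g. "/learn/build-a-computer/home/welcome"
    if (!is_string($path)) {
        return null; // no path, or a seriously malformed URL
    }
    $parts = explode('/', trim($path, '/'));
    // the technical name is the segment right after "learn"
    if (count($parts) >= 2 && $parts[0] === 'learn' && $parts[1] !== '') {
        return $parts[1];
    }
    return null;
}

echo courseraCourseSlug('https://www.coursera.org/learn/build-a-computer/home/welcome'), PHP_EOL;
```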

To download a COURSERA.ORG course named “XPTO”, logging-in as “user@email.com”, having password “1234”, in theory, it should suffice to launch a command-line window (CMD.EXE on any Windows) and enter:

coursera-dl -u "user@email.com" -p "1234" "XPTO"

These days, this will probably FAIL, due to the introduction of CAPTCHAs, which defeat many bots.

As of February 2021, COURSERA-DL does NOT defeat the COURSERA CAPTCHA, the one that asks the user to pick the images that solve some challenge. Defeating CAPTCHAs can be quite a project on its own, so it is understandable that this is happening. The workaround is easy, but not automatable.

For each COURSERA.ORG course you are subscribed to, when you use a web browser to learn it, a cookie named “CAUTH” for domain “.coursera.org” is created on the local computer. In my case, I always use Firefox and the extension “cookie quick manager”, to see the cookies for domains. Using that extension, or equivalent, just observe, text-select, and copy the string value for the CAUTH cookie, which can be a long string (hundreds of chars).

Then, provide the value of that string upon calling COURSERA-DL:

coursera-dl -u "user@email.com" -p "1234" "XPTO" -ca "hundreds of chars go here"
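If the call is launched from a script rather than typed, each value should be shell-quoted, since the CAUTH string is long and passwords can contain special characters. A minimal sketch, where courseraDlCommand is a hypothetical helper and only the -u, -p and -ca options shown above are used:

```php
<?php
// Hypothetical helper: assembles the coursera-dl call shown above,
// quoting every value with escapeshellarg() so it reaches coursera-dl intact.
function courseraDlCommand(string $user, string $password, string $course, string $cauth): string
{
    return implode(' ', [
        'coursera-dl',
        '-u', escapeshellarg($user),
        '-p', escapeshellarg($password),
        escapeshellarg($course),
        '-ca', escapeshellarg($cauth),
    ]);
}

// Usage (not executed here): passthru(courseraDlCommand('user@email.com', '1234', 'XPTO', $cauth), $exitCode);
```

On Windows, escapeshellarg() replaces percent signs, exclamation marks and double quotes with spaces, so a password containing them is better kept in the configuration file described next.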

That is it.
For a better workflow, find the folder where the Python script for coursera-dl is, i.e., search for the local file “coursera-dl.py”.

If you have Python installed at

c:\python

the file will be at

c:\python\scripts

In the scripts folder, create a NEW text file named “coursera.conf”, containing the sensitive data and any other arguments you may want, which you can learn about by reading COURSERA-DL’s documentation.

For example:

-u "user@email.com" -p "1234" --subtitle-language en --download-quizzes

The text above is the content of the text file “coursera.conf”, saved in the same folder that contains the coursera-dl.py script.

Now, to download course “XPTO”, just do:

coursera-dl "XPTO" -ca "hundreds of chars go here"

The outdoor sky/clouds have joined my plants stream

I decided to add a 5th camera to the live stream of my plants (not) growing. This new camera captures the outdoor sky/clouds, and serves as a natural reference to what time of day it is, since I do not overlay any date or time indication on the sources. As I write, it is dark outside, not the best timing :).

For now, the stream is available on Twitch: https://www.twitch.tv/arturmarques_dot_com.

In the past, instead of a live stream, my option was to build time-lapse videos. To assist in the process, I coded solutions that automatically build time-lapse videos from image datasets, with configurable quality. When using these tools, I usually build 24-hour videos, but I could request a longer or shorter time span; for example, I have enough material to construct months-long files. The key reason why I have not been doing so is that I have moved much of the raw data to the cloud, which is not as instantly readable as local physical volumes. When I started playing with these media and doing these easy, fun observations, one key reward was being able to promptly unveil whatever had happened in the past x hours.
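The tools themselves are not shown here. As an illustration only, assuming ffmpeg with the libx264 encoder (a swapped-in choice, not necessarily what those tools use), this sketch builds a command that turns a folder of JPEG snapshots into a video, with CRF as the configurable quality knob, and computes how long the resulting video runs. Both function names are hypothetical.

```php
<?php
// HYPOTHETICAL sketch: ffmpeg command for a time-lapse from a folder of JPEG snapshots.
// CRF is x264's quality knob: lower means better quality and bigger files.
// Note: "-pattern_type glob" is generally unavailable in Windows builds of ffmpeg.
function timelapseCommand(string $imageDir, string $outFile, int $fps = 30, int $crf = 23): string
{
    $pattern = rtrim($imageDir, '/\\') . '/*.jpg';
    return sprintf(
        'ffmpeg -framerate %d -pattern_type glob -i %s -c:v libx264 -crf %d -pix_fmt yuv420p %s',
        $fps,
        escapeshellarg($pattern),
        $crf,
        escapeshellarg($outFile)
    );
}

// Seconds of video for a capture span: one frame per snapshot, played at $fps.
function timelapseSeconds(int $hours, int $snapshotEverySeconds, int $fps = 30): float
{
    return ($hours * 3600 / $snapshotEverySeconds) / $fps;
}
```

For example, 24 hours of one snapshot per minute gives 1440 frames, i.e. a 48-second video at 30 fps.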

I will adapt my solutions to the new cloud storage and automate the process again. Until then, the live stream should be available with some regularity.

URLs "p1" – 89 resources

I am an avid WWW surfer, with hundreds of websites visited each month, sometimes daily. I bookmark them all, at least for logging purposes. Posts in the "urls" category capture what was in my browser on a specific date. I hope you enjoy some of these shared resources.


Listening to Kelpe – "Ex-Aquarium"

Forget about conventional “power music”. This is it! Contained, yet systematically “growing”, not in beats-per-unit-of-time, nor in a linear fashion, but, overall, in stage size and/or “crafting” of a particular audio ambiance; effectively embracing, even invading, the listener’s attention, sometimes releasing after peaks.
Highly captivating music, combining instruments, as simple as single chords and basic drum plates, with laboriously thought, felt, loved!, musical environments.

Congratulations to “Kelpe”, Kel McKeown! This particular “Ex-Aquarium” (2008) album I am listening to, is a wonderful and intelligent creation. I am glad I found it – pay special attention to track #2, “Whirlwound”.

youtube-dl – an absurd, sad situation

What follows is a very brief introduction to “copyright”, as far as I understand it, followed by my personal thoughts on yesterday’s RIAA-initiated DMCA takedown of the project “youtube-dl” from github.com.

The full request for the takedown of “youtube-dl”, and many of its forks, is at
https://github.com/github/dmca/blob/master/2020/10/2020-10-23-RIAA.md

Intro
“Copyright”, from an economic perspective, is a set of monopolies given to creators, to incentivize “creation”. The rationale for these incentives is that creation is hard and failure-prone, while copying is relatively trivial.
However, if these monopolies were excessive, for example lasting “forever”, then creators, their heirs, or to whom the rights/monopolies were sold, would constitute a permanent bottleneck between the creation and the opportunity for society, as a whole, to benefit from it, with unrestricted freedom. Thus, the monopolies are time-limited – they have an expiration date.

There are also exceptions in the law. In the USA, “fair use” is the chapter to read, to understand exceptions. For example, showing a copyrighted video to a class of students, in an educational context, will likely be valid.
In Europe, exceptions are similar (international treaties signed by most countries have “harmonized” national copyright systems), but they include explicitly enumerated use cases, which avoid the chance of litigation and do not require a judge to decide whether a particular situation is “fair use” or not, namely some learning acts at public libraries.

The creator alone has the exclusive rights to decide who can do what with the creation: if/how it can be modified, and if/how it can be distributed. Societies in the so-called “Knowledge Economy” we are living in will mostly progress fueled by better knowledge, so creators are the professionals that modern societies need, and “Copyright” law must keep evolving to keep the proper balance between the creators’ rewards and the societal benefits.

The “DMCA” (Digital Millennium Copyright Act) is one of the many changes that Copyright law incorporated in the USA. However, it is an ugly one: until the year 2000, clean reverse engineering practices would probably be legit, but since then they might not be, if they bypass certain TPMs (Technological Protection Measures) that can “compromise” the creator.

My thoughts
Clean reverse engineering practices are usually extraordinary innovations and should not be barred. The perverse effect of making certain TPM-defeating processes illegal, even when identified cleanly, with absolutely no access to the source intellectual property, is that knowledge of the available bypasses will rest in the hands of the very few who manage to find them. The chance for improvement is lost and asymmetries intensify, with solutions available only to a few, definitely not to the entire community, leaving most under the false belief that the current fruition model is the only possible one. This has fueled “bug bounty” programs, thus contributing to alternative reward systems.

These are very hard topics to discuss lightly, and this post sure is light. But, right now, I find it very negative, wrong on many levels (economic and intellectual), damaging for all in the long run, and intensely disrespectful to the thinkers, writers and coders involved, that the RIAA is attacking years of hard-labored source code developed by a community of intellectuals.

The “youtube-dl” source code has probably done nothing more than promote the very artists who, allegedly, are being hurt by it. This is truly unfair. Have common sense! Some of the referenced artists themselves should take a good look in the mirror and assess whether these tools are taking food off their tables; what they are indirectly doing is taking the pleasure of creation out of the lives of innocents who simply enjoy creating software. Have some decency. Live and love, and let live and love.

I also tweeted about this:
https://twitter.com/my_dot_com/