Posted: August 12, 2020

CWPK #13: Managing Python Packages and Environments

Keeping Multiple Interacting Parts Current

Early in this series (CWPK #9) of Cooking with Python and KBpedia, I noted the importance of Anaconda as a package and configuration manager for Python. Akin to the design of the Unix and Linux operating systems, Python applications are an ecosystem of scripts and libraries that are shared and invoked across multiple applications and uses. The largest Python repository, PyPI, itself contains more than 230,000 projects. The basic installation of Anaconda contains about 500 packages on its own in its standard configuration.

Since the overwhelming majority of these projects exist independently of the other ones and each progresses on its own schedule of improvements and new version releases, it is not hyperbole to envision the relative stability of a package installer such as Anaconda as masking a bubbling cauldron of constant package changes under the surface. To illustrate this process I will focus on one of the Anaconda packages called Pandoc that figures prominently in the next installment of this CWPK series. Pandoc has been around for about 15 years and is the undisputed king of applications for converting one major text format type into another. Pandoc processes external formats using what it calls ‘readers’, converts that external form into an internal representation, and then uses ‘writers’ to output that internal representation into another form useful to external applications. Generally, a given format has both a reader and a writer, though there are a few strays. In the current version of Pandoc (2.9.2.x) there are 33 readers and 55 writers.

A Python environment is a dedicated directory where specific dependencies can be stored and maintained. Environments have unique names and can be activated when you need them, allowing you to have ultimate control over the libraries that are installed at any given time. You can create as many environments as you want. Because each one is independent, they will not interact or ‘mess up’ the other. Thus, it is common for programmers to create new environments for each project that they work on. Often times, information about your environment can assist you in debugging certain errors. Starting with a clean environment for each project can help you control the number of variables to consider when looking for bugs. When it comes to creating environments, you have two choices:

you can create a virtual environment (venv) using pip to install packages or
create a conda environment with conda installing packages for you. [1]
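To make the first of those two choices concrete, here is a minimal sketch using Python's standard-library venv module. The directory name demo-env is hypothetical, and pip is deliberately left out (with_pip=False) so that the sketch runs quickly and offline; it illustrates the idea of an environment as a dedicated directory, not a recommended project setup.

```python
import tempfile
import venv
from pathlib import Path


def make_demo_env(parent: Path) -> Path:
    """Create a bare virtual environment under `parent` and return its path."""
    env_dir = parent / "demo-env"  # hypothetical name, for illustration only
    # with_pip=False skips bootstrapping pip, so nothing is downloaded or installed.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = make_demo_env(Path(tmp))
    # Every venv is marked by a pyvenv.cfg file at its root.
    has_cfg = (env_dir / "pyvenv.cfg").is_file()
    print("pyvenv.cfg present:", has_cfg)
```

The conda counterpart is created from the command line with conda create --name followed by an environment name, and switched to with conda activate.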

In my own work I tend to author documents either in HTML or LibreOffice, corresponding to the *.html and *.odt formats, respectively. However, the Jupyter Notebook that we will be using for our interactive electronic notebooks represents standard formatted text in the Markdown format (*.md) that it combines with the interactive portions that use embedded JavaScript Object Notation (JSON). The combination of these narrative and interactive portions is represented by the *.ipynb format. Markdown is a plain text superset of HTML that uses character conventions rather than bracketed tags (for example, ‘-’ for marking bullets or ‘#’ for marking headings). We’ll have many occasions to look at Markdown markup throughout this series. Since I was anticipating switching between writing narratives and interacting with code, I wanted to use my standard writing tools for longer explanations as well as to publish interactive notebook pages on static Web sites. I was investigating Pandoc as a means of ‘round-tripping’ between HTML and *.ipynb and to leverage the strengths of each.
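To show how Markdown and JSON combine in an *.ipynb file, the following sketch builds a tiny notebook by hand as a Python dictionary and serializes it. The field names follow the version 4 notebook layout as I understand it; real notebooks written by Jupyter carry more metadata (such as kernel information), so treat this as an illustration of the structure rather than a complete file.

```python
import json

# A hand-built notebook in the version 4 layout: one Markdown cell, one code cell.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A heading\n", "- a bullet\n"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ['print("Hello KBpedia!")'],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Serialize to the JSON text that would be saved with the *.ipynb extension.
text = json.dumps(notebook, indent=1)

# Reading it back shows the narrative and interactive portions side by side.
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```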

A quick look at the Pandoc site showed that, indeed, both formats were supported. Further, the Pandoc documentation also suggested there were ‘switches’ for the readers and writers of these two formats that would likely give me the control I needed to round-trip between formats with few or no errors. So, I downloaded the latest version of Pandoc (updating an earlier version already on my machine), and proceeded to do my set-up work in preparation for the upcoming CWPK installment #14. However, every time I ran the Pandoc command to do the conversion, I repeatedly got the error message of “Unknown output format.”
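For reference, the general shape of such a conversion command can be sketched as below. The file names are hypothetical, and the sketch only builds the command lines rather than running them; html and ipynb are Pandoc's names for the two formats, and none of the format-specific ‘switches’ mentioned above are shown.

```python
from typing import List


def pandoc_command(source: str, target: str, from_format: str, to_format: str) -> List[str]:
    """Build (but do not run) a pandoc command line for one conversion."""
    return ["pandoc", source, "--from", from_format, "--to", to_format, "--output", target]


# Hypothetical file names, one command for each direction of the round trip.
to_notebook = pandoc_command("article.html", "article.ipynb", "html", "ipynb")
to_html = pandoc_command("article.ipynb", "article.html", "ipynb", "html")

print(" ".join(to_notebook))
print(" ".join(to_html))
```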

As I tried to debug this problem I made some discoveries. First, Pandoc was already a package included in Anaconda. Further, while I previously had Pandoc in my environment path, the new path entered when I installed Anaconda was put higher on the list, meaning the Anaconda Pandoc was invoked before the instance I had installed directly. As I investigated the Anaconda packages, I found that it was using Pandoc version 2.2.3.2, which dated from August 7, 2018. In investigating Pandoc releases, I noted that *.ipynb support was not introduced into Pandoc until version 2.6 on January 30, 2019. So, despite what the Web site stated and my own installation of the version from March 23, 2020, the actual Pandoc that was being used in my environment did not support the notebook format!
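A check like the one I ended up doing by hand can be sketched in a few lines. The sketch assumes that the first line of pandoc --version has the form ‘pandoc 2.9.2.1’ (optionally with an .exe suffix on the program name), and it takes the 2.6 threshold from the release history just described. The function check_first_pandoc_on_path is a hypothetical helper that reports on whichever Pandoc the search path finds first.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

IPYNB_MIN_VERSION = (2, 6)  # first release with *.ipynb support, per the text above


def parse_pandoc_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Pull the version tuple out of the first line of `pandoc --version`."""
    lines = version_output.splitlines()
    if not lines:
        return None
    match = re.match(r"pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", lines[0].strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def supports_ipynb(version: Tuple[int, ...]) -> bool:
    """True when the version is at or above the release that added *.ipynb."""
    return version >= IPYNB_MIN_VERSION


def check_first_pandoc_on_path() -> str:
    """Report on the pandoc that the search path resolves first."""
    exe = shutil.which("pandoc")
    if exe is None:
        return "no pandoc found on PATH"
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=False)
    version = parse_pandoc_version(result.stdout)
    if version is None:
        return f"{exe}: could not parse version"
    verdict = "supports" if supports_ipynb(version) else "does not support"
    return f"{exe}: version {'.'.join(map(str, version))} {verdict} ipynb"


# The Anaconda copy versus an example 2.9.2.x release string.
print(supports_ipynb(parse_pandoc_version("pandoc 2.2.3.2")))
print(supports_ipynb(parse_pandoc_version("pandoc 2.9.2.1")))
```

Calling check_first_pandoc_on_path() also prints the full path of the executable, which is what reveals whether the Anaconda copy or a separately installed one is being picked up.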

To sate my curiosity I took a random sample of a dozen packages hosted by Anaconda and compared them to later updates that might be found elsewhere on the Web or directly from the developers. I found Anaconda was up to date in about 10 of these 12 instances. However, in the instance of Pandoc this gap was material. This raises two important points. First, when first installing or when returning to use after a hiatus, it is important to update your existing distribution. For Anaconda, first begin to update that repository:

conda update --all

Invoking this option causes a flurry of activity as multiple packages are checked for currency, dependencies, and then proper load orders. These are the kinds of activities that formerly were painful and subject to many inadvertent conflicts as one package updated a dependency that broke another. This kind of update activity is shown by Figure 1.

Figure 1: Updating the Anaconda Environment

Second, we then need additional ways to find and install Python packages. The most common package installer in Python is pip, a leading method for accessing PyPI, though clearly Anaconda chose to use an alternate approach in conda. The philosophy of conda is to better manage dependencies and interactions between packages than pip historically provided. There are other repositories that have embraced that same philosophy, and one with even greater dependency testing than conda is conda-forge, also a popular repository for data science packages. In all random cases I checked, conda-forge had as recent or more recent packages than conda. conda-forge also had the most recent version of Pandoc. Further, conda-forge can be integrated into the Anaconda package installation environment.

Installing Pandoc from the conda-forge channel can be achieved by adding conda-forge to your channels (in this case, for Anaconda) [2]:

conda config --add channels conda-forge

Once the conda-forge channel has been enabled, Pandoc can be installed with:

conda install pandoc

It is possible, obviously, to add specific packages from conda-forge to your channel using this exact command format. It is also possible to list all of the versions of Pandoc available on your platform with:

conda search pandoc --channel conda-forge

This same approach may be used for any specific package maintained by conda-forge, while keeping dependencies and Anaconda current.
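If you want to work with the search results programmatically, conda search also accepts a --json switch. The sketch below assumes the JSON output is a dictionary keyed by package name, with each entry a list of records carrying a version field; that shape is my understanding rather than something confirmed here, and the sample payload is made up so that the sketch runs without conda or network access. The wrapper search_conda_forge is a hypothetical helper and is defined but not called.

```python
import json
import subprocess
from typing import Dict, List


def version_key(version: str):
    """Sort key: numeric parts compare as numbers, anything else as text."""
    return [(0, int(p)) if p.isdigit() else (1, p) for p in version.split(".")]


def versions_from_search(payload: Dict[str, List[dict]], package: str) -> List[str]:
    """Return the distinct versions listed for `package`, oldest first."""
    records = payload.get(package, [])
    return sorted({rec["version"] for rec in records}, key=version_key)


def search_conda_forge(package: str) -> List[str]:
    """Run the search command from the text with JSON output (needs conda and network)."""
    result = subprocess.run(
        ["conda", "search", package, "--channel", "conda-forge", "--json"],
        capture_output=True, text=True, check=True,
    )
    return versions_from_search(json.loads(result.stdout), package)


# A made-up payload in the assumed shape.
sample = {
    "pandoc": [
        {"version": "2.9.2.1"},
        {"version": "2.2.3.2"},
        {"version": "2.10"},
        {"version": "2.9.2.1"},
    ]
}
print(versions_from_search(sample, "pandoc"))
```

Sorting on numeric parts rather than on the raw strings keeps 2.10 after 2.9.2.1, which a plain text sort would get wrong.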

Here are some resources if you wish to explore Python package management further:

Use Conda Environments to Manage Python Dependencies
Keeping Anaconda Up To Date
Why You Need Python Environments and How to Manage Them with Conda
An Effective Python Environment: Making Yourself at Home
My Anaconda Workflow: Python Environment and Package Management Made Easy – a helpful summary of commands available when working with Anaconda.

We saw from Jupyter Notebook in CWPK #10 that it is not able to access all areas of your computer unless you place it at the root. That is never a good idea for security reasons. It is always best to keep your Python working environment sequestered to some extent. Further, as the sources above indicate, if you are to get serious with Python and engage in multiple projects, it is a good idea to use virtual environments as well as dedicated directories. I do not address the topic of virtual environments further in this series since many just learning for the first time may not need this complexity.

Another truth of such large installations as Anaconda is that it is very tricky — indeed, nearly impossible on a Windows machine — to change the directory in which it was first installed. The safest way to do so is to uninstall Anaconda then re-install it in the new directory. That can be disruptive itself, so is not a step to undertake lightly. It is therefore deserving of some attention to how you organize your directory structures. You are thus best to play a bit with your Python environment, see what is working and what is not in terms of your workflows and file locations, and then make changes if need be before committing to any true work-dependent tasks. I first introduced the question of directory structure in CWPK #9. We will continue this topic in earnest in our next installment.

NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.

Endnotes:

[1] Norris, Will, Jenny Palomino, and Leah Wasser. 2019. “Use Conda Environments to Manage Python Dependencies: Everything That You Need to Know.” Earth Data Science – Earth Lab. https://www.earthdatascience.org/courses/intro-to-earth-data-science/python-code-fundamentals/use-python-packages/introduction-to-python-conda-environments/ (April 10, 2020).

[2] Conda-Forge/Pandoc-Feedstock. 2020. conda-forge. Shell. https://github.com/conda-forge/pandoc-feedstock (April 10, 2020)

Posted: August 11, 2020

CWPK #12: A Brief Pause to Learn Some Python

Let’s Get More Familiar with Python Syntax and Jupyter

Now that we have our basic environment set up and working, it is time to become a bit familiar with Python code and programs. To do so, we will use some online and reference documentation, relying in particular on an interaction environment with our Jupyter Notebook. In fact, from this point forward, all of our installments in the Cooking with Python and KBpedia series will also be accompanied by a working *.ipynb Notebook file that you are free to download and interact with at your leisure. Availability of the interactive notebooks will begin on Monday with CWPK #16.

Here are three interactive guides on how to program in Python using the interactive Jupyter Notebook environment. You may find use from all three (and there are many more on the Web, see especially GitHub and search on ‘jupyter python tutorial’, no quotes). I list these in suggested order for download and investigation if your time is limited:

An Introduction To Scripting in Python 3 – a good starting point with short, crisp lessons [1]
Python 3 Tutorial Using Jupyter Notebook – an update to Python3 from a well-regarded earlier Python2 guide [2]
An Introduction to Python and Programming – a 28-part notebook series. [3]

To download these from GitHub, go to the respective linked sites and pick the Clone or download button (1), followed by Download ZIP (2):

Figure 1: Download Notebook Files

And save the *.zip files to a location of your choosing (see further below). (Recall we are punting on the question of GitHub ‘pull’ requests in this series.) Unzip the downloads to their respective directories. We are now ready to start the Jupyter Notebook.
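If you prefer to script the unzip step, a minimal sketch with Python's standard zipfile module follows. It builds a small stand-in archive first so that it can run without a real download; the archive and file names are made up, while python3-tutorial is one of the directory names I list in the endnotes.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_to(archive: Path, target_dir: Path) -> list:
    """Extract every member of `archive` into `target_dir`; return the notebook files found."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
    return sorted(p.name for p in target_dir.rglob("*.ipynb"))


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    # Build a stand-in archive so the sketch runs without a real download.
    archive = tmp_path / "download.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("01.ipynb", "{}")
        zf.writestr("02.ipynb", "{}")
        zf.writestr("README.md", "stand-in file")
    notebooks = unzip_to(archive, tmp_path / "python3-tutorial")
    print(notebooks)
```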

Recall that we may launch the Jupyter Notebook from either the Anaconda Prompt (CWPK #11) or from the Anaconda Navigator (CWPK #10). If we are launching from the prompt, the command is (base) C:\Users\user>jupyter notebook. Upon launch we see the standard Jupyter Notebook entry screen using the default C:\Users\user\ location used by the program:

Figure 2: Default Jupyter Notebook Entry Screen

If you play around with that directory structure, you will notice that this default directory is the root, and you are unable to navigate above it in your local file system. That poses a problem for me personally. I have for years not accepted Microsoft’s attempt to steer all of my created documents into a directory structure of its choosing. I prefer being able to control where I store files directly. Since we will be downloading and using interactive notebook files throughout this CWPK series, right off the bat I do not like the idea of having to store these files in the Jupyter default directory.

So, I have three choices. One, I could download and place my files into a directory under this default directory. However, that does not solve my initial problem. Two, I could download the files somewhere directly on my local machine, and then use the Upload option (1) in Figure 1 to move them into this default directory. To do so, I invoke the Upload button, which pulls up Windows Explorer for file uploads, where I navigate to the directory in which I had just unzipped the files. I then highlight the unzipped files of interest (note they all have the *.ipynb extension):

Figure 3: Selecting Desired Files from Windows Explorer

When I click the Open button (1), all of the files are selected and I am returned to the main Jupyter screen, as shown in Figure 4:

Figure 4: Uploading Notebook Files to the Default Directory

As I pick Upload for each selected file (2), the file gets copied to the default directory and the Upload row is then deleted. I continue this process until all of the selected files have been copied.

But this approach still ends up storing my notebook files in a location not of my choosing. If I were only going to poke at this application on rare occasions, perhaps that is OK. Yet my plan is to use notebook files aggressively. So that leads me to the third option of changing the location of the default directory.

There is a two-step process to do so. First, we need to go to the default directory and look for the .jupyter sub-directory (note the preceding period!), which should appear as C:\Users\user\.jupyter. We bring up the command window here and enter and run at the prompt:

jupyter notebook --generate-config

This command creates a new file, C:\Users\user\.jupyter\jupyter_notebook_config.py, which is populated with the various command switches that govern the Jupyter Notebook’s behavior. In its initial condition, this file does not exist, and all settings reflect initial defaults. Once this file is generated, it is now possible to overwrite these default settings. BTW, this file can subsequently be deleted, in which case Jupyter Notebook reverts to its factory settings.

Open an editor for the jupyter_notebook_config.py and search for the following line in the file: #c.NotebookApp.notebook_dir = ''. It occurs about line 265 in the config file. Replace this text with your new desired starting root location for Jupyter. The entry should look like:

c.NotebookApp.notebook_dir = '/the/path/to/home/folder/'

When you make this update make sure you: 1) remove the # character at the beginning of the line and leave no space (the # designates the start of a comment line, and is thus not read at start-up time); 2) use forward slashes in your path and do not use the tilde ~ character; and 3) quote the file path in either matching single or double quotes. Figure 5 shows this modified config file for my own installation:

Figure 5: Modify Generated Jupyter Notebook Configuration File
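The same edit can be sketched in Python, which also shows one way to turn a Windows-style path into the forward-slash form. The path C:\notebooks\cwpk is hypothetical, the helper names are my own, and the sketch matches only the exact commented-out line quoted above, so a config file with different spacing would need the pattern adjusted. It works on a text string rather than touching your real config file.

```python
from pathlib import PureWindowsPath

# The commented-out default line, exactly as quoted in the text.
DEFAULT_LINE = "#c.NotebookApp.notebook_dir = ''"


def notebook_dir_line(windows_path: str) -> str:
    """Return the config line for `windows_path`, using forward slashes and single quotes."""
    posix_style = PureWindowsPath(windows_path).as_posix()
    return f"c.NotebookApp.notebook_dir = '{posix_style}'"


def set_notebook_dir(config_text: str, windows_path: str) -> str:
    """Swap the commented-out default line for an active one; leave every other line alone."""
    new_line = notebook_dir_line(windows_path)
    lines = [
        new_line if line.strip() == DEFAULT_LINE else line
        for line in config_text.splitlines()
    ]
    return "\n".join(lines) + "\n"


# A two-line stand-in for the generated config file.
sample_config = "# Configuration file for jupyter-notebook.\n#c.NotebookApp.notebook_dir = ''\n"
print(set_notebook_dir(sample_config, r"C:\notebooks\cwpk"))
```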

Once you complete the edits, save the file, and then re-start the Jupyter Notebook. Here we see the application now starts up at my preferred directory location:

Figure 6: New Jupyter Notebook Entry Location

We can see we have a new directory structure (1). To open a Notebook page, we only need to double-click the entry (2) and we see that it is now running. Navigation through this file tree structure works as normal.

We will now select the second Notebook page on this list, 02.ipynb, and when we double-click it we get a new page in our browser with the new interactive page. Let’s explore some of the conventions of working with a Notebook page in Figure 7:

Figure 7: Interacting with a Notebook Page

First, besides double-clicking, we may open and close Notebook pages using the old-fashioned File option (1). Interactive areas of the Notebook page are shown as cells with the light gray background. The active interaction cell is bounded by the box with the margin highlight (2). To evaluate this active cell we may either use the Run button or press <shift>+<enter> when the cursor is in the active area (3). Also, the text on the page (4) may be entered or edited in the same way. We may either double-click or press <shift>+<enter> when our cursor is in these text areas. Text entry occurs using a simple text formatting form called Markdown. We will be using Markdown aggressively throughout the rest of this series, and will have many occasions to describe its formatting and style options.

Like any document, existing Notebook pages can be loaded, modified, and saved again. This enables you to grab useful starting Notebooks on the Web, bring them into your own environment, modify them to reflect your own needs and circumstances, and then save them for later local use or for publishing to others. We can extend the active areas to include entire programs, with many complicated displays and activities possible in our Notebooks. This format is an excellent one for producing interactive dashboards and demos.

When we are done with our Notebook pages we may pick File → Close and Halt, pick Quit, which will stop the server and end all active pages, or use a Shutdown button for an individual page. As we shut individual pages, we are directed to a main screen that shows all of the Notebook pages currently active, which we may Shutdown directly (1) as Figure 8 indicates:

Figure 8: Shutting Down Jupyter Notebook

I encourage you to work through many of the examples provided in the three sources listed at the top of this article to gain a feel for Python syntax and the kinds of programming you can do with the language. These sources proceed from simpler to more complicated, both within each source and between sources, with the last being the most comprehensive. The author of the last example suggests it takes about 90-120 hours to work through all of its examples [3]. Of course, simply browsing to gain a feel for the scope of the Python language can be done much more quickly.

There are many online resources useful to Python programming. A simple Web search will turn up many Python learning scripts and examples. Here are some additional free ones I have found fairly useful:

Interactive Python Tutorial from LearnPython.org
Digital Ocean’s How to Code in Python3 Series
Jupyter Notebook Tutorial from Google
Jupyter Notebook Tutorial in Python from plotly (all tutorials in notebook form)
Python Tutorial For Beginners – A Complete Guide
Python3 Tutorial (see links in left-hand panel).

Endnotes:

[1] Hans van der Kwast, 2018. An Introduction To Scripting in Python 3 (https://github.com/jvdkwast/Python3_Jupyter_Notebook), IHE Delft Institute for Water Education, retrieved April 6, 2020. Directory: python3-jupyter-notebook

[2] Zain Mustafa, 2018. Python 3 Tutorial Using Jupyter Notebook (https://github.com/ZainUlMustafa/Python-3-Tutorial-Using-Jupyter-Notebook), retrieved April 6, 2020. This is a re-write of general Python 2.7 tutorials by rajathkmp (https://github.com/rajathkmp/Python-Lectures). Directory: python3-tutorial

[3] Alexander Hess, 2020. An Introduction to Python and Programming (https://github.com/webartifex/intro-to-python), retrieved April 6, 2020. Directory: intro-to-python

[4] These instructions are found in an answer at Stack Overflow from October 31, 2018; see https://stackoverflow.com/a/40514875.

Posted: August 10, 2020

CWPK #11: Installing a Python IDE

This Will Also Test if Python is Working Correctly

You will recall from earlier installments in this Cooking with Python and KBpedia series that one design objective was to include a Python integrated development environment. The two leading IDE options within the data science community appear to be PyCharm and Spyder. Both are natively supported by our Anaconda package manager that we discussed in CWPK #9. My initial investigations indicated that PyCharm, the more popular option, was perhaps a bit more Mac oriented, maybe of more interest to coders and hackers, and perhaps less so to data scientists. PyCharm is one of a suite of well-regarded tools and environments from JetBrains. Spyder, on the other hand, draws its name from Scientific PYthon Development EnviRonment, has a cleaner look to my eye, and a paned layout I find a bit more intuitive.

Of course, the real worth of an IDE comes from its use, and some time of familiarity is always required before a judgment of adequacy can be made. Since my intuition steered me to Spyder, I chose to install it first. If it proves productive, there will be no need to veer from it. If I am not fully satisfied, I can always switch out to PyCharm or perhaps another alternative. This flexibility was a major reason for starting this Python process with the Anaconda distribution.

Since in the last installment we took the steps offered by the Anaconda Navigator GUI to install our Jupyter Notebook environment, let’s take the direct command-line approach for Spyder. Fortunately, we know that Spyder installation is already part of the Anaconda installation, so our task is much simplified to merely starting up and using the application.

To begin our Spyder use, invoke the Windows Start menu and call up the Anaconda prompt option:

Figure 1: Pick Anaconda Prompt from Start Menu

That calls up a command window, where we enter at the prompt, (base) C:\users\mike>spyder. (Your directory structure will differ.) That brings up the basic Spyder IDE, as shown in Figure 2:

Figure 2: Main Default Spyder Screen

However, if this is the first install, you might also get a popup window over this telling you that your Spyder version can be updated. It is important to remember that Python applications (of which both Anaconda and Spyder are examples) invoke many components to operate. These components are often shared by other applications and developed by third parties. Numerous releases of various component parts are constantly occurring throughout the Python ecosystem. One important role of package managers is to query for updates, alert you if components are out of date, and then enable you to update your packages. If you are an infrequent user of Python you will likely get update prompts for various components every time you use the system. Even daily use will see update notices quite frequently. If you see this notice that Spyder is out of date, Quit the program.

If you are only slightly out of date, you likely do not have concerning security holes or obsolete apps. You can proceed without updating. However, it is good practice to always respond to update requests. In this instance of the initial install of Spyder, we decide to install (and get rid of the annoying update screen at start-up). We go to File→ Quit on the Spyder screen. Since we started Spyder at the command line, we return to the command window. (If you had started up with the Anaconda Navigator option, you likely will need to go through the Anaconda prompt steps above to bring up the command window.)

We can first update Spyder alone, and see if that leads to a clean start-up. At the command prompt enter (base) C:\users\mike>conda update spyder. The update facility will be invoked and you answer the prompts in the command window. When you re-start Spyder, however, you may again see the need-to-update screen. This notice is likely due to the fact that there are many dependencies across your entire Python environment. The better way to answer update prompts is by updating the entire environment. You invoke this more complete update by entering at the command prompt: (base) C:\users\mike>conda update anaconda. (Your directory structure will differ.) Now, we see many packages needing to be updated and thus a more active update screen, as this Figure 3 example shows:

Figure 3: Anaconda Updater Screen in Progress

When you see the successful completion notice on the update, you can again start the Spyder IDE, now updated and missing the update notice screen.

NOTE: Such updates are a common occurrence when in the Python environment and it is best practice to attend to them whenever seen.

So, now we have an updated, current Spyder and Anaconda environment, with Spyder now invoked. By default at install time, Spyder is configured to use the currently popular dark theme, Spyder Dark. As a personal preference (and it is better for screen captures throughout our CWPK series) I like a lighter theme. To change the theme, we go to Tools → Preferences and then the Appearance option and see a listing of interface specifications:

Figure 4: Changing Spyder’s Appearance

We pick ‘IDLE’ since we know it to be a light one. (There are many loaded configurations you may test as well as create your own.) Once we select and hit the OK button, we are prompted if we want to re-start in order to use the new theme. We accept, and now see a different UI theme as shown by Figure 5:

Figure 5: Main Default Spyder Screen – IDLE Theme

It is always a good idea to start any coding work with a project. We pick the Projects main menu option on the main Spyder screen to do so:

Figure 6: Enter a New Spyder (Coding) Project

We give our project any arbitrary name (1) and indicate the location for that file (2). It is a good idea to set up a known, separate directory location for your Python work. In my instance, I chose a new Results directory at the same level as my Python directory. By following this practice you will be better able to find prior work after absences away from the projects.

As you enter this project information, you may have been sharp-eyed and seen that a new pane has been added to the main Spyder screen at the left (2), as shown in Figure 7, which now gives us a chance to explain the panes and functions of the main Spyder IDE screen:

Figure 7: Explaining the Main Spyder Screen and Panes

At the top of the main Spyder screen we have the main menu (1). It is a good idea to systematically move from menu item to item to see the functionality included in the Spyder IDE. Immediately below the main menu is the Toolbar. Again, you may mouse over each icon to get a tooltip of what functionality it represents. Note the icons are provided in groups, with debugging options provided in the middle, for example.

The leftmost pane (2) is the Projects pane. As your projects grow and you organize them, they are listed here in the traditional directory tree structure. Right-clicking on a project enables you to open, delete, use, etc., the project.

The main code editor is found in pane (3). Syntax highlighting and code completion assists of various natures may be invoked here, similar to any modern code editor. This is a window in which we will spend much time throughout this CWPK series. A general help area is provided in pane (4). You can pull up the general help, and get various prompts or assistance. In context, we will touch on some of these in later installments.

The console (5) is an interactive environment that shares the same IPython core with the Jupyter Notebook. Individual statements can be tested and run using REPL. Also, we can highlight code blocks in the editor (3) and they will run in this pane.

Notice that the horizontal pane separators may be moved. Each pane also has an upper right icon that provides contextual options for that pane, including the standard options of Close or Undock. If you choose the Undock option, the pane becomes floating. To return it to its original position, choose the option to Dock. Note if you close a pane, you may open it up again via the View → Panes options. The ones we have active in this initial installation are shown in Figure 8 below.

You will notice that there are many possible panes in Figure 8 that are not shown as active. But, also notice we do have a History pane that is active that we have not yet discussed. When there are more panes selected for view for which there is not adequate screen space, multiples will appear in a given frame area as sub-tabs. If you look closely at the lower right of Figure 7 you will see this additional History tab (6). We also see some status information at the lower right in Figure 7 (6) that tells us what version of Python is presently loaded, line and character positions in the editor (3), etc.

Figure 8: The View – Panes Options

If you only work occasionally with Python perhaps these options are enough to give you a fast configuration that meets your needs. Much of the stuff behind the scenes in Spyder, however, like other IDEs, is devoted to allowing constant users to tweak the system to exactly how they want it.

OK, so now we have gotten a bit familiar with Spyder and have configured it a bit to our liking. It is now time to write our first program. We will use Figure 9 to illustrate this process:

Figure 9: Writing a Simple Program

As with our previous lesson with Jupyter, we enter the statement print("Hello KBpedia!") (1). Note that as we enter statements, we get some autocomplete suggestions for our print statement. Once we have completed our statement, we can run this cell (3) from the Toolbar, which is another way of saying to run the current line, and its results appear in the console (4). Note we could just as easily have entered the same print statement in (4) and gotten the same result evaluation. The difference, of course, is that statements entered into the editor are part of a conventional Python (*.py) source program as opposed to a single interactive statement. The same Toolbar where we ran the cell also allows us to run code blocks or the entire source program depending on the icon we select.
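To make the distinction between a cell, a code block, and the whole source program a little more tangible, here is a slightly larger sketch of what such a *.py file might look like. The # %% comment lines are the markers Spyder uses to divide a file into cells; the function greeting is a hypothetical example of my own and not part of the figure.

```python
# %% First cell: the statement from the text, wrapped in a function
def greeting(name: str = "KBpedia") -> str:
    """Return the greeting used in this installment for any name."""
    return f"Hello {name}!"


print(greeting())

# %% Second cell: can be run on its own once the first cell has defined greeting()
for name in ("Python", "Spyder"):
    print(greeting(name))
```

Running the first cell prints Hello KBpedia! to the console; running the whole file prints all three greetings in order.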

Now that we have completed our first file, it is time to save it, as shown by Figure 10:

Figure 10: Saving the Code File

The system allows us to save the code file anywhere, but again best practice is to enter the file name (1) under the same directory as our project location (2), which enables it to show (1) in the project pane. We then File → Quit to exit the program.

We will learn much about the Spyder IDE as we move forward. Here are some additional resources to learn about Spyder further:

Posted: August 7, 2020

CWPK #10: Installing a Project Notebook

This Will Also Test if Python is Working Correctly

In the previous installment in this Cooking with Python and KBpedia series, we installed Python via the Anaconda distro and package manager. The GUI portion of that package, Anaconda Navigator, was part of that install. In this current CWPK installment we will use Navigator to launch the Jupyter electronic notebook, and then write a simple program to demonstrate that our Python installation is installed properly and working. In the installment after this one, we will next install a Python IDE using the command line to demonstrate the second way we can interact with this package.

Though there are a variety of electronic notebooks that may work with Python, we have chosen Jupyter (née IPython) because it is the oldest of Python notebooks, the most used, and the most capable. Like all of the packages we utilize in this CWPK series, the Jupyter notebook is open source. Two supplementary sources to this article are the Jupyter instructions on KDnuggets and or installing and testing the notebook.

We can launch Navigator either via the command line or directly from the application. We use the direct approach in this installment. We start Anaconda Navigator via the Windows Start button, and then expand the listing for Anaconda3 and pick Anaconda Navigator from the menu:

Figure 1: Pick Anaconda Navigator from Start Menu

(Alternatively, you could pick ‘Anaconda Prompt’ from this menu and type the command anaconda-navigator.)

The first time you start Navigator, you will see a splash screen introducing the app and asking if you would be willing to share usage data. I picked Yes and then OK, along with ‘don’t show again’:

Figure 2: Initial Anaconda Navigator Splash Screen

We will thus not see this screen again at start-up.

When invoked, we get this main (Home) launch screen for the Navigator:

Figure 3: Main Launch Screen for the Anaconda Navigator

We pick the Jupyter Notebook (1) from this list.

This launch then brings up a new Web page for the Jupyter Notebook application, as shown in Figure 4 below. (Note: if this step works, that means your Python installation from CWPK #9 is working properly.) Jupyter notebooks are presented as interactive HTML pages. This initial entry page is set by default to your user Web page. (We will later discuss how to set this to a different location.) The convention for Jupyter notebook files is *.ipynb. To start a new notebook, pick the dropdown menu labeled ‘New’ (1) at the upper right of this screen. You may then create a new Notebook with the Python version you installed:

Figure 4: Jupyter Notebook Entry Screen
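For readers curious about what sits behind the *.ipynb convention, a notebook file is a JSON document. The sketch below assumes version 4 of the notebook format and uses hypothetical names (make_minimal_notebook, write_notebook, and the file name hello_kbpedia.ipynb) that are not part of this series. It builds a one-cell notebook by hand and writes it to disk. In normal use Jupyter creates and saves these files for you, so treat this only as an illustration of the file layout.

```python
import json
from pathlib import Path


def make_minimal_notebook(source_line):
    """Return a dict laid out like a version-4 notebook with one code cell."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,  # the cell has not been run yet
                "metadata": {},
                "outputs": [],
                "source": [source_line],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def write_notebook(notebook, path):
    """Serialize the notebook dict as JSON text at the given path."""
    Path(path).write_text(json.dumps(notebook, indent=1), encoding="utf-8")


if __name__ == "__main__":
    nb = make_minimal_notebook('print("Hello KBpedia!")')
    write_notebook(nb, "hello_kbpedia.ipynb")  # hypothetical file name
```

Opening the resulting file in a text editor shows the same structure that Jupyter reads when it displays a notebook.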

Here is the new starting Notebook entry screen:

Figure 5: Jupyter Notebook Working Screen

You may then rename your Notebook by clicking on the current name and editing it or by choosing Rename under File (3) in the top menu bar.

Now, in this simple example, we enter the statement (2) of print("Hello KBpedia!"). Pressing the Run button (1) evaluates that statement in real time (via the REPL previously mentioned) and produces the result of ‘Hello KBpedia!’ [shown at bottom of (2)].
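To make the read-eval-print idea a bit more concrete, here is a toy sketch of such a loop. It is not how Jupyter works internally; the function name toy_repl is hypothetical, and the approach (try each line as an expression first, fall back to running it as a statement) is a simplifying assumption chosen only for illustration.

```python
def toy_repl(lines):
    """Read each line, evaluate it, and collect what a REPL would echo back."""
    namespace = {}  # names defined on one line stay visible to later lines
    echoed = []
    for line in lines:
        try:
            # Expressions are compiled in "eval" mode so their value can be echoed.
            code = compile(line, "<toy-repl>", "eval")
        except SyntaxError:
            # Statements (such as assignments) are executed with nothing echoed.
            exec(compile(line, "<toy-repl>", "exec"), namespace)
            continue
        result = eval(code, namespace)
        if result is not None:
            echoed.append(repr(result))
    return echoed


if __name__ == "__main__":
    session = ["greeting = 'Hello KBpedia!'", "greeting", "1 + 2"]
    for shown in toy_repl(session):
        print(shown)
```

Each line is handled as soon as it is read, which is the property that makes notebooks and consoles feel immediate compared with running a whole source file.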

When done with this application, you exit by going to File at the upper left (3) and picking Close and Halt. That will return you to the main screen of Anaconda Navigator, where you may File → Quit to back you out of all programs. Should there be any open applications, you have the choice to close them as well at that time.

We will be working with Jupyter extensively throughout this CWPK series, and will thus have many opportunities to see other aspects of working with these notebooks. If you wish to learn more about notebooks, here are some resources:

Jupyter notebook for beginners
Jupyter notebook documentation
Interactive notebook widgets and ipywidgets
Best of Jupyter for Data Science
JupyterLab is the next-generation user interface for Jupyter.

Posted:August 6, 2020

CWPK #9: Installing Python

Trying to Take a Good First Step

We have spent a week and one-half setting the table and clearing our throats. It is now time to begin putting software into action. The complement to our KBpedia environment is the one for Python. We now switch gears to finding and installing a basic Python ‘starter package’ for this Cooking with Python and KBpedia series. We will approach this question from the standpoints of our local Windows 10 operating environment, the needs for KBpedia and its tools, and our desire to move into data science and machine learning applications.

Though I am an absolute newbie with regard to Python, I have been monitoring it for some years as a possible language to adopt. For quite a few years there was apparently a lengthy and difficult transition from Python 2 to Python 3. (As of this writing, the current version is Python 3.8.5.) Many of the issues we will address in coming installments deal with questions like encodings, management of CSV files, file I/O, and other topics that have apparently been much easier to handle with Python 3 (specifically since Python 3.6 and 3.7). Because of this transition, one may still find tutorials and online guidance offered in both Python 2 and 3 flavors. Fortunately, since all that we are developing in this series is new, we do not need to account for a legacy code base, and thus need not provide duplicate Python 2 and 3 routines. Please note, however, that if you are using Python 2 locally, most of the routines offered during this series will likely not run without an upgrade.
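If you are unsure which Python you are running, a small guard like the following can tell you. This is only a sketch: the 3.6 floor is my assumption drawn from the remark above about Python 3.6 and 3.7, and the names MINIMUM_VERSION and is_supported are hypothetical.

```python
import sys

MINIMUM_VERSION = (3, 6)  # assumption: taken from the 3.6/3.7 remark above


def is_supported(version_info=None, minimum=MINIMUM_VERSION):
    """Return True when the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    running = sys.version.split()[0]
    if is_supported():
        print("Python", running, "should be fine for this series.")
    else:
        print("Python", running, "is older than assumed here; consider upgrading.")
```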

Going back to the days of the LAMP stack on a Windows machine, which I did years ago with the XAMPP package, I am leery about installing languages and complicated stacks on Windows. Unlike Linux packages that can be installed with a single command, and easier methods on Macs as well, my impression is that Windows has never been a particularly friendly environment for installing natively non-Windows systems and applications. Though the official Python site has some impressive documentation and installation kits, including some nice kits from third parties, I decided going in that I wanted to use a more automated approach to handling Python and related package installation and dependencies. The dependencies portion is especially tricky since many data science applications with Python build on other packages, and getting them installed in the correct order with appropriate settings can be a real time waster.

My initial research suggested I wanted a data science focus for my Python installation with machine learning, of course. But, I also thought it might be a good idea to install an integrated development environment (IDE) to aid code completion, language lookup, and debugging. I also had become quite enamored with the REPL (read-eval-print loop) capabilities in our Clojure work, which allow code snippets to be interpreted and run immediately, so I also was quite intrigued with adding an electronic notebook environment as well. My initial research suggested either the PyCharm or Spyder IDEs might be suitable for data science. I had already been following development with the Jupyter notebook (initially IPython) for quite some time and wanted to include that in my platform as well.

So, while I was beginning to zero in on a suite of Python-based tools, I had not worked directly with any of them. I therefore also wanted a Python package installer that was flexible enough to enable me to choose among alternative tools and switch them out if need be with relative ease. Who knows what kind of speedbumps I might encounter as we continue on the journey in this CWPK series? Prior experience with Eclipse and other IDEs warned me that component might especially be one that required some choice and flexibility.

There are a number of guides for installing Python locally that are quite good. One is geared around Jupyter and uses PyCharm as the IDE. I did not really like the lock-in on the IDE and also did not like the suggestion to immediately incorporate GitHub into the workflow. (GitHub is essential for anyone hoping to share code with others or develop an open-source package, but my target audience of the focused newbie may not require this step.) Other guides, however, for example the ones from KDnuggets or DataCamp, kept pointing me to the open-source Anaconda installer package.

Anaconda is a package manager, an environment manager, and a Python distribution that particularly emphasizes data science applications. The environment enables one to quickly download more than 7,500 Python/R data science packages (so, it also supports incorporation of R; see CWPK #13), including the essential machine learning packages of scikit-learn, TensorFlow, and Theano, plus the data analysis tools of Dask, NumPy, pandas, and Numba. Multiple data visualization packages, such as Matplotlib, Bokeh, Datashader, and Holoviews, can also be managed. Anaconda enables one to manage libraries and their dependencies on Windows, macOS and Linux using the Conda package manager. Conda complements the standard pip Python package manager. More than 20 million users worldwide use the Individual Edition version of Anaconda.
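Once an installation is in place, one way to see which of these packages are actually present is to ask Python whether it can locate them. The sketch below uses find_spec from the standard importlib.util module; the CANDIDATES list and the function name available_packages are my own choices for illustration, and which packages show as installed will depend on your particular Anaconda version.

```python
from importlib.util import find_spec

# Import names for some of the packages mentioned above; note that
# scikit-learn is imported under the name "sklearn".
CANDIDATES = ["sklearn", "tensorflow", "dask", "numpy", "pandas",
              "numba", "matplotlib", "bokeh"]


def available_packages(names):
    """Map each top-level import name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in available_packages(CANDIDATES).items():
        print(f"{name:12} {'installed' if found else 'missing'}")
```

Checking with find_spec avoids actually importing each package, which keeps the check quick even for large libraries.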

The final factor that convinced me to use Anaconda was its Navigator graphical user interface that enables one to launch applications or manage packages. Navigator showed me a way to easily choose Jupyter, PyCharm or Spyder, three of the big options at the center of this work, among many other package choices down the road. Thus, while I may prefer working directly in an editor via an IDE when writing programs, having the option to manage apps and dependencies in a GUI is really attractive.

NOTE: To mirror this installation, you will need 64-bit Windows 10, 8 GB of RAM, and at least 7.0 GB of free disk space.

Installation of Anaconda is a breeze, though I do add one important wrinkle to the fully automated path.[1] Begin by going to the Anaconda Individual Edition download page and pick Download:

Figure 1: Initial Anaconda Install Screen

When presented with the alternative approach, choose it. While it is nice to have the Anaconda GUI available for some tasks, we also want to work with Python at the command line without needing to invoke Anaconda. The alternate path approach writes the appropriate path information to the Windows environment variables, meaning we can invoke Python and related applications from any location on our local machine.

Your first actual install screen will ask about Advanced Options. Make sure to pick the ‘Add Anaconda3 to the system PATH environment variable’ option. When you do, you will get a red font notice telling you this option is not recommended. Proceed with it anyway by picking Install:

Figure 2: Pick the ‘Not recommended’ Anaconda Install Option
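After installation, you can confirm what this option did by looking at the PATH variable from Python itself. The following is a minimal sketch; it assumes the install directory has ‘anaconda’ somewhere in its name, which will not hold if you directed the install to a differently named location, and the function name path_entries_matching is hypothetical.

```python
import os


def path_entries_matching(keyword, path_value=None):
    """Return PATH entries whose text contains `keyword`, ignoring case."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # os.pathsep is ";" on Windows and ":" on Linux and macOS.
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return [entry for entry in entries if keyword.lower() in entry.lower()]


if __name__ == "__main__":
    for entry in path_entries_matching("anaconda"):
        print(entry)
```

If nothing is printed, either the PATH option was not selected or the install directory uses a name without that keyword.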

You will then have the choice to install as a single user or for the entire machine. In my case, since it is a dedicated computer not on a business network, I use the entire machine option. Your local circumstance or IT department may mandate the single user option.

You should pause at this point and think through what you want your directory structure to be. You can accept placing all files in standard install locations (assuming a C: drive) of C:\Users\mike\anaconda3 if you install as a single user, or C:\ProgramData\Anaconda3 if you chose to install for all users. However, I find it useful to set up my own directory structures and be able to modify and expand at will. A Python installation with Anaconda will take substantial space, and you may be setting configurations for many different apps as well as projects other than KBpedia or derivatives. For security purposes, we also want to keep our use of Python somewhat fenced and unable to reach the root. (We’ll discuss this again in CWPK #15.) Here is how I am setting up my directory structure:

|-- PythonProject                         # Set up a 'master' directory for all of your Python work; name as you wish 
     |-- Python                           # Create a 'Python' (upper or lower) directory under this root
           |-- [Anaconda3 distribution]   # Direct your Anaconda install to this location
     |-- TBA
           |-- TBA                        # We'll add to this directory structure as we move on
     |-- TBA
           |-- TBA
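If you would rather script the first two levels of this structure than create them by hand, a sketch with the standard pathlib module follows. The function name create_project_skeleton is hypothetical, and the example run builds the directories in a temporary scratch location so nothing permanent is created; substitute your own drive or parent directory. The TBA entries are left out since they have not yet been defined.

```python
import tempfile
from pathlib import Path


def create_project_skeleton(base_dir, master_name="PythonProject"):
    """Create <base_dir>/<master_name>/Python and return the Python directory."""
    python_dir = Path(base_dir) / master_name / "Python"
    # parents=True also builds the master directory; exist_ok=True makes reruns safe.
    python_dir.mkdir(parents=True, exist_ok=True)
    return python_dir


if __name__ == "__main__":
    # Assumption for illustration: a throwaway directory stands in for your drive.
    with tempfile.TemporaryDirectory() as scratch:
        target = create_project_skeleton(scratch)
        print("Point the Anaconda installer at a location like:", target)
```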

The entire install process is automated by following the instructions on each screen. During the actual install process you may click the Details button and watch progress as it proceeds file by file. The longest step is setting up the package cache. Overall the installation takes a few minutes. Please be patient. Once the installation completes and reports back to the screen, and we have proceeded through all steps as presented, we conclude by clicking on Finish.

To check whether the install proceeded properly, call up a command prompt (PowerShell or cmd at the Windows Run option), and see if you get version information for these two commands:

conda --version
python --version

If you get version information echoed to the screen, you are fine. If these commands do not work, see the ‘Add Anaconda to Path (Optional)’ section in DataCamp.
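The same check can be run from Python, which is handy if you later want to fold it into a setup script. This sketch relies only on the standard shutil and subprocess modules; the function name tool_version is hypothetical, and it assumes each tool accepts a --version flag, as the two commands above do.

```python
import shutil
import subprocess


def tool_version(command):
    """Return the output of `<command> --version`, or None if it is not found."""
    executable = shutil.which(command)  # looks the command up on the PATH
    if executable is None:
        return None
    completed = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some tools report their version on stderr rather than stdout.
    return (completed.stdout or completed.stderr).strip()


if __name__ == "__main__":
    for command in ("conda", "python"):
        print(command, "->", tool_version(command) or "not found on PATH")
```

A result of ‘not found on PATH’ points to the same PATH issue discussed in the installation steps above.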

If you have Windows issues, you can inspect the general Python Windows use guide or look into these possible issues after install.

For additional information about Anaconda, see the quick start user guide or the getting started tutorial (requires registration).

The ease of installation and package management with Anaconda does come at the cost of some bloat, as well as higher memory requirements for what gets loaded at start up. For the former problem, it is possible to replace Anaconda at a later date with Miniconda and conda, especially once your environment has stabilized and you have less frequent need for the Navigator package manager. As for memory management, if your machine is only marginally capable, you may want to look into these scripts or these other scripts.

Endnotes:

[1] The installation guide here is a capture of the Anaconda install process. Earlier online guides with slightly different narratives may be found first at DataCamp and then at KDnuggets.

Author: Mike Bergman
