Hardware and Software Codesign, 2020 (a.a. 2019-2020)

This is the page for the "Hardware and Software Codesign" course at Università di Pavia, academic year 2019-2020. It started in March 2020.

Cloning the pages

This tree, excluding the big mp4 files, can be cloned on your system; then you could "git pull" for updates:

   git clone git://hsw2020.gnudd.com/hsw2020-www.git

The topics

The material is now organized in chronological order. Each 2-hour lesson has a sequential number, starting from 01. The introduction is 0. When something is missing, a FIXME appears.

For each lesson, there are slides and/or videos for the first part of the lesson, while the latter half is done in videoconferencing (Jitsi or Big Blue Button). So the material here is about an hour's worth (usually a little less) for every lesson.

The videos are 15-20m each. The specific length is listed below for each of them. They are on Youtube, feel free to share the link, because I (almost) don't swear when I record myself. Please choose "720p" resolution when I show examples on the terminal or editor.

Introduction

This is an introduction to the course. Slides are not published, as I consider them irrelevant. The video is available both as a link to Youtube and as the original file to download.

00-intro: video 16:26
00-intro.mp4: mp4 60MB

Git

The first lesson is about git. Everything I do here is committed, either in the www tree (these pages and small exercises) or in the separate project.

I suggest reading an article I published in 2010; it's still current. I translated it from Italian in a hurry, and it's horrible. I re-read and fixed the first half, I'll complete it sooner or later.

An Italian version of the same (much more polished in language) is on Linux.it in my ancient personal pages, but I have no access to them currently.

Then, three videos. Again, Youtube and original file.

Advanced Git Use: Article
01-1-git: video 18:04
01-1-git.mp4: mp4 66MB
01-2-git: video 16:21
01-2-git.mp4: mp4 73MB
01-3-git: video 18:03
01-3-git.mp4: mp4 84MB
Git slides: as a web page
Git slides: as a white-background pdf file

C Language

We are using the C language. If you want a book I suggest K&R; you'll find more references at the end of my short document linked below.

We are not using floating point nor standard libraries and so on, because the target is microcontrollers. We use the linker massively, so even if you feel confident with C I suggest you take a look here.

For A-C-X and A-C-X-more there is an Italian version on Linux.it.

A-C-X: Short summary of the language
A-C-X-more: Add-on information
Operators: the C operators (ugly ascii)
02-1-clang: video 17:12
02-1-clang.mp4: mp4 59MB
02-2-clang: video 18:25
02-2-clang.mp4: mp4 69MB
02-3-clang: video 20:55
02-3-clang.mp4: mp4 77MB
povacca: the reverse-polish notation example
exercises-1.txt: about git and elf sections

Cross Compilation

This lesson is about cross compilation, but it also includes a description of binutils and inline assembly. The example shows use of inline assembly, multi-arch stuff, weak functions and volatile.
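To give an idea of two of the items above, here is a minimal sketch of a weak function and of a volatile access. It is not the course example: the names board_early_init, fake_status_reg and wait_ready are invented for illustration, the "register" is an ordinary variable so the code also builds on a PC, and the weak attribute is the GCC/Clang spelling.

```c
#include <stdint.h>

/*
 * Hypothetical hook: a weak default that a board-specific file can
 * override simply by defining a normal function with the same name.
 */
__attribute__((weak)) int board_early_init(void)
{
	return 0; /* default: nothing to set up */
}

/*
 * Stand-in for a memory-mapped status register. On a microcontroller
 * this would be a fixed address from the data sheet, not a variable.
 */
static volatile uint32_t fake_status_reg;

#define STATUS_READY (1u << 0)

/*
 * Poll until the READY bit is set or the poll budget is exhausted.
 * "volatile" forces a real load at every iteration, so the compiler
 * cannot hoist the read out of the loop.
 */
int wait_ready(unsigned max_polls)
{
	while (max_polls--) {
		if (fake_status_reg & STATUS_READY)
			return 0;
	}
	return -1;
}
```

If another object file defines a non-weak board_early_init(), the linker silently picks that one; this is how a generic default and a per-architecture override can live in the same tree.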

There also is a package, with documentation, about making your own ARM toolchain for microcontrollers. I used it to build various versions, and it works. You can clone it here:

   git clone git://hsw2020.gnudd.com/arm-toolchain.git

If you have TeX and Texinfo you can build the documentation in doc/. The current pdf output is committed here in the web pages too.

Cross Compiling Slides: as a web page
Cross Compiling Slides: as a white-background pdf file
03-1-crossc: video 19:45
03-1-crossc.mp4: mp4 63MB
03-2-crossc: video 19:51
03-2-crossc.mp4: mp4 81MB
03-3-crossc: video 18:40
03-3-crossc.mp4: mp4 83MB
03-4-crossc: video 16:03
03-4-crossc.mp4: mp4 57MB
crossc: the example (but please see the commits)
arm-toolchain.pdf: Pdf documentation for the toolchain script

Kicad and the TDC

This is about kicad. It is not a tutorial, as there are many out there, but a quick overview and a list of things that I consider useful but are not usually described in tutorials.

Unfortunately, recording stopped one minute before the end, so the last part is missing, but it's not very important, so I won't redo it all.

The second part is about the TDC (time to digital converter) device that we'll use during the course. I designed this at the end of 2018 for a conference, where I had two 45m slots, to discuss hardware and software. Recordings of the conference were never published. I used the material at the "End Summer Camp" of 2019, where I shrunk the most important concepts into 28m. So this is the whole HSW2020 course in half an hour. The video is available, thanks to Federico Sgobbi and the rest of the ESC team. I spoke Italian because there were no foreign attendees at the conference.

The repository of the TDC is out there, but I'd better introduce things one at a time during the course. So there is a local repository, where hardware is committed right away (by cherry-picking from my other repo) and software will be added during the next lessons. Please pick a copy:

   git clone git://hsw2020.gnudd.com/hsw2020.git

Warning: the TDC was made with an old version of kicad. When you run eeschema you must click "Ok" in the "Project Rescue Helper" dialog. There is no incompatibility for pcbnew.

Kicad Slides: as a web page
Kicad Slides: as a white-background pdf file
04-1-kicad: video 17:30
04-2-kicad.mp4: mp4 83MB
TDC Slides: as a web page
TDC Slides: as a white-background pdf file
04-2-tdc: video 14:13
04-2-tdc.mp4: mp4 56MB
TDC at End Summer Camp (in Italian): video 28:27
tdc.pdf: the schematic page for TDC
exercises-2.txt: design challenges

Qemu and VersatilePB

We are now moving to qemu-arm, in order to be able to run some code without the need for real hardware. These videos, with associated commits in the repository, perform this step. You can run the code on your PC, provided you install the cross-compilers and so on.

Code for hell-arm is in the hello subdir of hsw2020-www, code for the versatile example is in the versatile subdir of the same project.

As a reference for this initial approach to code, please consider reading the Thos document (Two Hour Operating System), where these concepts are explained. Thos, however, runs on real hardware, similar to the TDC, so some things will be introduced later on during the course. The thos repository is here:

   git clone https://github.com/rubini/thos.git

There is also a boring video of me writing Thos. 90 minutes.

05-1-qemu: video 13:35
05-1-qemu.mp4: mp4 59MB
05-2-versatile: video 18:53
05-2-versatile.mp4: mp4 71MB
05-3-versatile: video 18:41
05-3-versatile.mp4: mp4 68MB
05-4-jiffies: video 15:37
05-4-jiffies.mp4: mp4 63MB
thos.pdf

Thos: video 90:01
exercises-3.txt: design challenges

The TDC Kit

This is the TDC kit that we'll use during this course:

It includes:

The TDC PCB, with an LPC11U35 microcontroller (64kB flash, 8kB RAM)
A USB-UART adapter, based on a CP210x chip
A mini-USB cable: I prefer mini because of mechanics
The power-supply connector, for those who can feed 5-12V
10 male-female and 10 female-female dupont wires

The green connector says "12V", but anything from 5V upwards works. You may note that I use a 7805 to make 5V, and the component wants at least 7V. In fact, 5V is enough, because the final uC voltage is still within allowed limits. Strictly speaking, we should sell it for 7-12V, but in the lab we know 5-12V is ok. More than 12V risks exceeding the 7805 power budget, because I didn't design anything for heat dissipation.
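The power budget is simple arithmetic: a linear regulator burns the voltage drop times the load current. Here is a sketch in integer math; the 100mA load used in the example is an assumption for illustration, not a measured figure for the TDC, and the regulator's own quiescent current is ignored.

```c
#define REG_OUT_MV 5000 /* nominal 7805 output, in millivolts */

/*
 * Power dissipated in the regulator, in milliwatts, for a given input
 * voltage (mV) and load current (mA). Returns 0 when the input is not
 * above the nominal output.
 */
unsigned reg_dissipation_mw(unsigned vin_mv, unsigned load_ma)
{
	if (vin_mv <= REG_OUT_MV)
		return 0;
	return (vin_mv - REG_OUT_MV) * load_ma / 1000;
}
```

With the assumed 100mA load, reg_dissipation_mw(12000, 100) gives 700mW, while 7V input gives 200mW: the higher the input, the more heat the 7805 must get rid of.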

Please note that if you power the board with the USB cable, it will enter USB-programming mode. We will use it, but later on. To use serial programming you must power the device some other way. If you can't feed the green connector with 5-12V, it's possible to power it through the serial adapter, using the 2-pin strip that lives near the green connector. Pin 1 (double border) is GND, and there are - and + markers nearby, on the silkscreen.

Please be careful about polarity: the positive goes to +, and negative (GND) to -. Otherwise you'll emit smoke (I didn't try, but I wouldn't be surprised).

The UART pin-strip is GND-TX-RX, in this order. In the figure above we don't need to route GND to pin 1 of the UART connector, obviously. If in doubt, please ask me. As a general rule, I always route GND to pin 1, because both pin 1 and GND are the ones that are never missing. The "PRG GND RST" strip is an exception, for mechanical reasons, as explained below. There, for clarity, all three pin names are marked on the silkscreen.

The easiest way to force re-programming is pulsing RST low while keeping PRG shorted to GND with your tweezers, as shown in the next figure; this is why GND is the middle pin.

Running on real hardware

We now move to the real TDC device. We start with the hello program, using the real hardware timer. We load the program to RAM, because it is simpler than loading to flash. The code structure (in the hsw2020 repository) is more complex than what I did in Thos, because we'll need to expand this code base. Thus, the final code layout is already in place from the first commits.

There are a few questions asked, as usual, because I want people to think about their code and their bugs. Please note that "program", in the tools directory, means "prog-ram", as opposed to "prog-rom" (which programs to flash, or rom).

06-1-program: video 18:55
06-1-program.mp4: mp4 83MB
06-2-jiffies: video 11:22
06-2-jiffies.mp4: mp4 46MB
06-3-udelay: video XX:XX
06-3-udelay.mp4: mp4 36MB
exercises-4.txt: food for thought

Moving Code to Flash

The next step is moving our code (the simple "hello" thing) to flash memory. This involves a little knowledge of the ARM processor and its peculiarities (in particular, the fact that Thumb code appears to live at odd addresses).
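The "odd address" is only apparent: instructions are halfword-aligned, and bit 0 of a branch target or vector-table entry is a flag selecting Thumb state. A minimal sketch of the bookkeeping follows; the helper names are mine, invented for illustration.

```c
#include <stdint.h>

#define THUMB_BIT 1u

/* Value to store in a vector table or function pointer for Thumb code. */
static inline uint32_t thumb_entry(uint32_t code_addr)
{
	return code_addr | THUMB_BIT;
}

/* Address where the instructions actually live (always even). */
static inline uint32_t thumb_code_addr(uint32_t entry)
{
	return entry & ~THUMB_BIT;
}

/* Non-zero if the entry selects Thumb state. */
static inline int is_thumb_entry(uint32_t entry)
{
	return (entry & THUMB_BIT) != 0;
}
```

So a function that objdump shows at 0xc0 appears as 0xc1 in the vector table or in a function pointer, and this is expected rather than a bug.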

After learning a little about our CPU, we move to flash memory, writing a new linker script, a new boot.S and (unfortunately) also the code for UART initialization. When we run from RAM, we rely on the internal boot loader for setup of the serial port (which is used to upload code). When running from flash we are all alone and must deal with all the fine details.

In the third video of this lesson, we activate the system PLL to run the CPU at 48MHz, with the help of the external oscillator. The code is not especially interesting; what I'm interested in, as usual, is the underlying ideas and problems, not the implementation.

Finally, we add Kconfig to the software project, in order to be able to build for either flash or RAM and to choose our CPU frequency (sometimes you want it faster, sometimes you are more concerned about power consumption and want to run slower). This also allows adding optional features later on.
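On the C side, Kconfig choices show up as CONFIG_ macros. The sketch below shows how code may depend on them; the symbol names (CONFIG_CPU_FREQ_HZ, CONFIG_BAUD, CONFIG_RUN_FROM_FLASH) are hypothetical, not the ones in the repository, and the divisor formula assumes a generic 16x-oversampling UART clocked at the CPU frequency, ignoring any fractional divider.

```c
/* Defaults, so the sketch builds even without a generated config header. */
#ifndef CONFIG_CPU_FREQ_HZ
#define CONFIG_CPU_FREQ_HZ 48000000
#endif
#ifndef CONFIG_BAUD
#define CONFIG_BAUD 115200
#endif

/* Refuse impossible configurations at build time, not at run time. */
#if CONFIG_CPU_FREQ_HZ < 16 * CONFIG_BAUD
#error "CPU clock too slow for the selected baud rate"
#endif

/* Divisor for a 16x-oversampling UART, rounded to the nearest integer. */
static inline unsigned uart_divisor(unsigned long clk_hz, unsigned long baud)
{
	return (clk_hz + 8 * baud) / (16 * baud);
}

/* The same source builds for either medium, selected by configuration. */
const char *boot_medium(void)
{
#ifdef CONFIG_RUN_FROM_FLASH
	return "flash";
#else
	return "ram";
#endif
}
```

The point is that the UART setup code asks for uart_divisor(CONFIG_CPU_FREQ_HZ, CONFIG_BAUD) and keeps working when the frequency choice changes.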

ARM slides: as a web page
ARM slides: as a white-background pdf file
07-1-arm: video 19:24
07-1-arm.mp4: mp4 80MB
07-2-flash: video 19:35
07-2-flash.mp4: mp4 115MB
07-3-pll: video 11:19
07-3-pll.mp4: mp4 65MB
07-4-kconfig: video 14:15
07-4-kconfig.avi: avi 56MB
exercises-5.txt: please evaluate

GPIO and printf

We are now adding the GPIO API to our system, and a GPIO example that blinks the LEDs. The example is not described but is part of the repository. The GPIO subsystem is a basic block that must be simple to use and self-explanatory, without requiring the user to know register names.
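As an idea of what "no register names" means for the caller, here is a sketch of such an API. It is not the repository code: function names, the GPIO_NR numbering and the two-port layout are assumptions, and the registers are plain arrays so the sketch also runs on a PC.

```c
#include <stdint.h>

/* A gpio is one integer: port number times 32, plus the bit number. */
#define GPIO_NR(port, bit)  ((port) * 32 + (bit))
#define GPIO_PORT(gpio)     ((gpio) / 32)
#define GPIO_BIT(gpio)      ((gpio) % 32)
#define GPIO_MAX            64

enum { GPIO_DIR_IN = 0, GPIO_DIR_OUT = 1 };

/* Stand-ins for the direction and data registers of two ports. */
static volatile uint32_t fake_dir[2];
static volatile uint32_t fake_pin[2];

static int gpio_valid(int gpio)
{
	return gpio >= 0 && gpio < GPIO_MAX;
}

/* Set the direction of one pin; returns 0 or -1 for a bad number. */
int gpio_dir(int gpio, int dir)
{
	if (!gpio_valid(gpio))
		return -1;
	if (dir == GPIO_DIR_OUT)
		fake_dir[GPIO_PORT(gpio)] |= 1u << GPIO_BIT(gpio);
	else
		fake_dir[GPIO_PORT(gpio)] &= ~(1u << GPIO_BIT(gpio));
	return 0;
}

/* Drive one pin high (value != 0) or low. */
int gpio_set(int gpio, int value)
{
	if (!gpio_valid(gpio))
		return -1;
	if (value)
		fake_pin[GPIO_PORT(gpio)] |= 1u << GPIO_BIT(gpio);
	else
		fake_pin[GPIO_PORT(gpio)] &= ~(1u << GPIO_BIT(gpio));
	return 0;
}

/* Read one pin: 0, 1, or -1 for a bad number. */
int gpio_get(int gpio)
{
	if (!gpio_valid(gpio))
		return -1;
	return (fake_pin[GPIO_PORT(gpio)] >> GPIO_BIT(gpio)) & 1;
}
```

The user writes gpio_set(GPIO_NR(1, 5), 1) and never sees a register; porting to another chip means rewriting these few functions only.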

Then printf, another basic building block that we can't live without. We pick an external printf, one that I wrote specifically for an embedded project that was in dire need of a good printf implementation (actually, it's not really mine, I got it from smarter people). What is especially interesting in pp_printf is that it is modular: you can choose at build time your tradeoff between features and size. We currently just pick the most featureful.

The printf video also deals with how to pick an external package into your own project in the most maintainable way (short of using submodules, a more difficult topic we won't cover).

With this in place, we will find that all example programs became way too big (almost 4kB), and we'll be looking for the solution. This is not in a video, though: we discuss it at lesson time and you will find the associated commits as the last ones before the "lesson-08" tag.

GPIO slides: as a web page
GPIO slides: as a white-background pdf file
08-1-gpio.mp4: mp4 40MB
08-1-gpio: video 14:15
08-2-printf.mp4: mp4 37MB
08-2-printf: video 12:47
exercises-6.txt: as usual: some easy some not

Processes

Now we talk about processes. First a minimum of the standard scheduling theory, then the simplest implementation ever, which we use in our repository. As usual, some questions are asked. And I'm serious about those, as usual.

Note: The scheduler as shown in the slides calls t->job instead of best->job. This is now fixed, and it was not the bug I referred to, which is more subtle.
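To show where best->job comes into play, here is a sketch of a minimal periodic scheduler in the same spirit. It is not the repository code: the structure fields and function names are assumptions, and time is passed in as a variable instead of being read from the hardware counter.

```c
struct task {
	const char *name;
	void *(*job)(void *arg);
	void *arg;
	unsigned long period;   /* in ticks */
	unsigned long release;  /* next activation time, in ticks */
};

/*
 * Return the task with the earliest release time. The signed
 * difference keeps the comparison valid when the counter wraps.
 */
struct task *sched_pick(struct task *tasks, int n)
{
	struct task *t, *best = tasks;

	for (t = tasks + 1; t < tasks + n; t++)
		if ((long)(t->release - best->release) < 0)
			best = t;
	return best;
}

/* One scheduler step: wait for the release time, run, re-arm. */
struct task *sched_run_one(struct task *tasks, int n, unsigned long *now)
{
	struct task *best = sched_pick(tasks, n);

	if ((long)(best->release - *now) > 0)
		*now = best->release; /* on hardware: wait for the counter */
	best->arg = best->job(best->arg); /* best, not the loop variable */
	best->release += best->period;
	return best;
}

/* Sample job for experiments: increment the integer it is given. */
void *count_job(void *arg)
{
	++*(int *)arg;
	return arg;
}
```

After the search loop the loop variable no longer points to the selected task, which is why calling the job through it runs the wrong one.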

The third video of this set is about the implementation of udelay, in a way that is much better than counting microseconds. This is actually the state of the art -- but I may leave some subtle bugs at this point (nothing blocking, just minor details). If the thing is not completely clear at a first view, please consider relying on the commit message rather than the video.

Slides about processes: as a web page
Slides about processes: as a white-background pdf file
09-1-processes.mp4: mp4 45MB
09-1-processes: video 19:17
09-2-processes.mp4: mp4 35MB
09-2-processes: video 14:59
09-3-udelay.mp4: mp4 32MB
09-3-udelay: video 12:37
exercises-7.txt: as usual: some easy some not

I2C and SPI

The repository now includes a number of additional features: the timestamp_get functionality, sscanf and an interactive shell. So we can finally play with hardware (the shell can read and write all machine registers, and the gpio command allows reading and writing bits).

So we now move to I2C and SPI, the most important peripheral busses in the microcontroller world.

Slides about I2C and SPI: as a web page
Slides about I2C and SPI: as a white-background pdf file
10-1-i2c.mp4: mp4 48MB
10-1-i2c: video 16:06
10-2-spi.mp4: mp4 55MB
10-2-spi: video 19:58

Interrupts

We are now introducing interrupts. A single-interrupt TDC implementation is already pushed in the project's repository.

Slides about interrupts: as a web page
Slides about interrupts: as a white-background pdf file
11-1-irq.mp4: mp4 44MB
11-1-irq: video 16:09
11-2-irq.mp4: mp4 36MB
11-2-irq: video 14:28
11-3-irq.mp4: mp4 75MB
11-3-irq: video 18:50
exercises-8.txt: related to interrupts

USB and OneWire

We now cover two more communication busses: USB and OneWire. The former is known to most, and is supported by our TDC board, which can act as a USB slave device. The latter cannot be demonstrated on real hardware (unless we plug a thermometer in the UEXT connector) but is interesting nonetheless.

Slides about USB: as a web page
Slides about USB: as a white-background pdf file
Slides about OneWire: as a web page
Slides about OneWire: as a white-background pdf file
15-1-usb: mp4 49MB
15-1-usb: video 16:28
15-2-usb: mp4 46MB
15-2-usb: video 16:11
15-3-w1: mp4 40MB
15-3-w1: video 16:38

Data Structures

It's high time to introduce lists and trees. I'm more interested in lists than in trees, considering the small environment of the microcontroller world.

We have slides and two videos, but first of all the article about lists and trees (in Italian, I'm sorry) and the related source code. The subject matter is the Linux kernel, but the target is picking lists (and trees) from there to use them elsewhere. Please rest assured, I'm not the only one who did it. This approach to data structures is already in use by some real programmers.
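For those who never met kernel-style lists, the idea is that the list node lives inside your own structure. What follows is a simplified re-implementation written for this page, not the kernel's <list.h>; struct sample and sample_sum are invented for the example.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head that points to itself. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* From a pointer to the embedded node, get the containing structure. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

/* Insert item just before head, i.e. at the end of the list. */
static inline void list_add_tail(struct list_head *item,
				 struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Unlink item and leave it pointing to itself. */
static inline void list_del(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	item->next = item->prev = item;
}

/* Example payload: any structure can embed a list_head. */
struct sample {
	int value;
	struct list_head list;
};

/* Walk the list and sum the values of all samples. */
int sample_sum(struct list_head *head)
{
	struct list_head *pos;
	int sum = 0;

	list_for_each(pos, head)
		sum += container_of(pos, struct sample, list)->value;
	return sum;
}
```

There is no allocation and no type-specific list code, which is why this fits the small environment of a microcontroller.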

Data Structures in the Kernel (article, Italian)
Code for article and slides
A set of malloc implementations with lists and trees
Slides about lists+trees: as a web page
Slides about lists+trees: as a white-background pdf file
16-1-list: mp4 39MB
16-1-list: video 14:24
16-2-rbtree: mp4 45MB
16-2-rbtree: video XX:XX
rbmalloc.tar.gz: RB-malloc sample code

Here is a playlist with the two videos for this lesson.

Preemption

The three videos basically describe the commits that lead to the "lesson-17" tag in the hsw2020 repository. And there is a new set of questions.

17-1-preempt: mp4 80MB
17-1-preempt: video 20:10
17-2-irq-thread: mp4 75MB
17-2-irq-thread: video 19:53
17-3-irq-thread: mp4 30MB
17-3-irq-thread: video 8:35
exercises-9.txt

Here below you find a playlist with the three videos of lesson 17. The size of the frame is exactly the size of the virtual screen I recorded, so it should be "as good as possible". Blurring is due to compression.

Kconfig and bugs

When and how to configure, and when and how to error out. Some of this was already hinted at previously; it's now all together. Part of this is opinion: you are allowed to disagree and argue about it.

Slides about config/bugs: as a web page
Slides about config/bugs: as a white-background pdf file
18-1-config: mp4 50MB
18-1-config: video 18:35
18-2-config: mp4 45MB
18-2-config: video 16:19

Here below you find a playlist with the two videos.

Initcalls and BUG

We are now talking about initcalls, something I inherited from the Linux kernel. It was a long-standing issue with our setup.c file, now solved. Unfortunately, I first had to reshape the build system of the code, so the second video revisits the linker to explain the build options. I also offer an implementation of "BUG" (again, idea and code hints from the Linux kernel), and this is described in the second video too.
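The mechanism behind initcalls can be sketched as follows: each file drops a function pointer in a dedicated section, and startup code walks that section. This is a PC-side sketch, not the repository code: the macro and section names are invented, and it relies on GCC attributes and on GNU ld providing __start_ and __stop_ symbols for ELF sections whose name is a valid C identifier. On the microcontroller the linker script would define the boundaries instead.

```c
typedef int (*initcall_t)(void);

/* Place a pointer to fn in the "demo_initcalls" section. */
#define demo_initcall(fn) \
	static initcall_t initcall_##fn \
	__attribute__((used, section("demo_initcalls"))) = fn

/* Section boundaries, provided by the linker. */
extern initcall_t __start_demo_initcalls[], __stop_demo_initcalls[];

static int uart_ready, timer_ready;

/* Two fake subsystems, each registering its own setup function. */
static int uart_setup(void)
{
	uart_ready = 1;
	return 0;
}
demo_initcall(uart_setup);

static int timer_setup(void)
{
	timer_ready = 1;
	return 0;
}
demo_initcall(timer_setup);

/* Call every registered function; return how many reported an error. */
int run_initcalls(void)
{
	initcall_t *p;
	int errors = 0;

	for (p = __start_demo_initcalls; p < __stop_demo_initcalls; p++)
		if ((*p)() != 0)
			errors++;
	return errors;
}
```

The benefit is that adding a subsystem means adding a file: no central setup file must list every init function.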

Slides about initcalls, linker, BUG: as a web page
Slides about initcalls, linker, BUG: as a white-background pdf file
19-1-initcalls: mp4 39MB
19-1-initcalls: video 14:08
19-2-linker-bug: mp4 39MB
19-2-linker-bug: video 13:37

Malloc and Random

The first thing here is a description of the commits that lead to a simple fix in the initcall implementation. The main point is how to set up your commits so the users are happy. The fix itself is trivial, once the preparatory commits are there.

Then we talk about malloc. I am always against having malloc/free in microcontroller systems, because they bring in the "out of memory" risk. Still, there are some corner cases where it might be useful, so I take it as an excuse to show use of <list.h>.

Unfortunately, list insertion and removal are not atomic, so we need a mutual-exclusion primitive if the system allows preemption (and ours does). So first we need to introduce a mutex, and this is the simplest possible, with the downside that a task can only take one mutex (expanding the code to support several mutexes is left as an exercise for the reader).

The resulting malloc code is 720 bytes, including initialization and some extra checks that might be removed, so it might make sense to use, sometimes. Performance-wise, on this 48MHz microcontroller, it takes something like 12 microseconds to allocate and 5 to free.

Finally, we introduce the subject of random and pseudo-random numbers, an important topic that is often ignored. No, I'm not a cryptologist; the implementation here is really basic, but sufficient for most uses. We even have a hardware random number generator that can be added to all microcontrollers that lack one, at the cost of one cent.
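As an idea of how small a pseudo-random generator can be, here is Marsaglia's 32-bit xorshift. This is a well-known generator chosen for illustration, not necessarily the one in the repository; the function names are mine. It is not suitable for cryptography.

```c
#include <stdint.h>

static uint32_t prng_state = 1; /* must never be zero */

/* Set the starting point; zero is replaced, as it would stick forever. */
void prng_seed(uint32_t seed)
{
	prng_state = seed ? seed : 1;
}

/* Return the next 32-bit value of the sequence (shifts 13, 17, 5). */
uint32_t prng_next(void)
{
	uint32_t x = prng_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	prng_state = x;
	return x;
}
```

The sequence is fully determined by the seed, so the real problem is where the seed comes from: this is where a hardware source of randomness helps.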

No slides this time, only commits and a guy discussing them.

20-1-git-initcall: mp4 51MB
20-1-git-initcall: video 13:36
20-2-malloc: mp4 88MB
20-2-malloc: video 22:19
20-3-random: mp4 45MB
20-3-random: video 11:11

Function pointers

Lesson 21 was mainly a review of function pointers, with no slides nor videos: we did it as an interactive-only session.

We reviewed the irq_request() implementation, which is one of the typical use cases of function pointers. There, we have a set of possible implementations, and the right one is called through an array of function pointers. The same approach is sometimes useful to untangle a switch() construct, for example in the back-end of a state machine.
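A sketch of the idea follows: handlers are registered in a table and the dispatcher indexes it. This is not the repository's irq_request(): the demo_ prefix, the signature, the error codes and NR_IRQS are all assumptions made for the example.

```c
#define NR_IRQS 32 /* hypothetical number of interrupt lines */

typedef void (*irq_handler_t)(void *data);

/* One slot per interrupt line: the handler and its private data. */
static struct {
	irq_handler_t handler;
	void *data;
} irq_table[NR_IRQS];

/* Register a handler: 0 on success, -1 bad argument, -2 already taken. */
int demo_irq_request(int irq, irq_handler_t handler, void *data)
{
	if (irq < 0 || irq >= NR_IRQS || !handler)
		return -1;
	if (irq_table[irq].handler)
		return -2;
	irq_table[irq].handler = handler;
	irq_table[irq].data = data;
	return 0;
}

/* Called by the low-level entry code with the active irq number. */
int demo_irq_dispatch(int irq)
{
	if (irq < 0 || irq >= NR_IRQS || !irq_table[irq].handler)
		return -1;
	irq_table[irq].handler(irq_table[irq].data);
	return 0;
}

/* Sample handler: count how many times it was invoked. */
void count_irq(void *data)
{
	++*(int *)data;
}
```

The dispatcher has no switch() at all: adding a new interrupt user never touches this file.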

We then looked at how we could implement the concept of alternate device drivers by using an "operations" structure, where the function pointers are, in a way, the methods that act on the object.

The commits in branch "lesson-21" (which will become a tag within master) first add two different implementations of a bit-banging I2C driver, and then in the last commit show use of a set of i2c_operations to unify the higher level interface and allow the application a choice of which driver to use. Adding a driver for the microcontroller-specific logic cell is now trivial.
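The shape of such an interface can be sketched as follows. The real i2c_operations in the repository may differ: field names and signatures here are assumptions, and the two "drivers" are fakes (one stores bytes in memory, one never acknowledges) so the sketch runs without hardware.

```c
#include <string.h>

struct i2c_dev;

/* The "methods": every driver fills this structure with its functions. */
struct i2c_operations {
	int (*write)(struct i2c_dev *dev, int addr,
		     const unsigned char *buf, int len);
	int (*read)(struct i2c_dev *dev, int addr,
		    unsigned char *buf, int len);
};

/* The "object": it carries its own operations and private state. */
struct i2c_dev {
	const struct i2c_operations *ops;
	unsigned char mem[16]; /* used by the fake loopback driver */
};

/* Fake driver 1: remember written bytes and return them on read. */
static int loop_write(struct i2c_dev *dev, int addr,
		      const unsigned char *buf, int len)
{
	(void)addr;
	if (len < 0 || len > (int)sizeof(dev->mem))
		return -1;
	memcpy(dev->mem, buf, len);
	return len;
}

static int loop_read(struct i2c_dev *dev, int addr,
		     unsigned char *buf, int len)
{
	(void)addr;
	if (len < 0 || len > (int)sizeof(dev->mem))
		return -1;
	memcpy(buf, dev->mem, len);
	return len;
}

/* Fake driver 2: an empty bus, where nobody acknowledges. */
static int null_write(struct i2c_dev *dev, int addr,
		      const unsigned char *buf, int len)
{
	(void)dev; (void)addr; (void)buf; (void)len;
	return -1;
}

static int null_read(struct i2c_dev *dev, int addr,
		     unsigned char *buf, int len)
{
	(void)dev; (void)addr; (void)buf; (void)len;
	return -1;
}

const struct i2c_operations i2c_loop_ops = {
	.write = loop_write, .read = loop_read,
};
const struct i2c_operations i2c_null_ops = {
	.write = null_write, .read = null_read,
};

/* Higher-level interface: the application never names a driver. */
int i2c_write(struct i2c_dev *dev, int addr,
	      const unsigned char *buf, int len)
{
	return dev->ops->write(dev, addr, buf, len);
}

int i2c_read(struct i2c_dev *dev, int addr, unsigned char *buf, int len)
{
	return dev->ops->read(dev, addr, buf, len);
}
```

Choosing a driver is just a matter of which ops pointer is stored in the device, so a third implementation only needs a new i2c_operations instance.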

Storage and filesystem

This lesson is about storage and filesystem. I offer slides and two videos of the usual duration. Additionally, here you find a copy of the SDB specification, and a set of exercises.

Filesystem slides: as a web page
Filesystem slides: as a white-background pdf file
22-1-fs: mp4 37MB
22-1-fs: video 13:45
22-2-fs: mp4 40MB
22-2-fs: video 13:43
SDB Specification
exercises-10.txt

Network support

Networking. The code is committed, and here is some description of the reasoning behind it.

23-1-network: mp4 46MB
23-1-network: video 16:20
23-2-network: mp4 62MB
23-2-network: video 16:41

Lesson 24: Power

We are mainly talking about hardware here. I also covered licensing in the lesson.

No videos (or not yet, I must think about it). Slides only.

Power slides: as a web page
Power slides: as a white-background pdf

Lesson 25: Debugging

This is the last lesson, and we talk about debugging. In the code repository you find a working gdb stub, and the commit messages describe what works and what does not work. I find it quite satisfactory, considering the time it cost.

The slides describe the various aspects of debugging (well, more can be said, but time is limited). And the videos cover the slides and the gdb stub example.

Debug slides: as a web page
Debug slides: as a white-background pdf
25-1-debug: mp4 38MB
25-1-debug: video 15:10
25-2-debug: mp4 54MB
25-2-debug: video 18:50
25-3-debug: mp4 46MB
25-3-debug: video 16:14

All exercises in a single html file

exercises.html

Exam

The exam happens on the official dates (time TBD). The exam session lasts 2 hours, and you are asked to work on 4 questions and 2 exercises. You'll be able to choose your questions and exercises among a set of them.

In the questions, you are expected to describe with your own words one technical point. Answers must be as tidy and to the point as possible (i.e., no handwaving). You choose your questions, you must know the answer. Expected time span: 5-10m per question.

The technical exercises are serious problems. You must make your design choices and write the code. The complete, "perfect", solution would take 4-12 hours, while you have 40-50 minutes for each exercise. If you think it's done in 20 minutes, it is not, please proceed further. You are expected to split your time between the two exercises and use it all to deliver the best code and documentation you can.

If what you deliver does not work, do not despair. The problems take more than 40m each to be solved, and I know it. This is what I value in the deliverable (nothing new, if you attended my classes):

Clarity of the design choice. A short README (a few lines) may help;
Clean and maintainable code;
Good error management;
Informative, but not over-verbose, comments;
Attention to both CPU time and code size;
If it's part of the problem, a simple and solid circuit schematic (this can be drafted on paper and then attached as a photograph);
Informative and to-the-point commit messages;
A sane split of the solution in several commits.

You are asked to keep the camera on so I see you: you are not expected to talk. No other surveillance is there (I do not spy your screen). Yes, you can look for information whenever you want; you are just not expected to talk. And no, you are not expected to look at the conference tab in the browser. If I have something to communicate, I'll talk to all of you in the conference (or write something and then get your attention by talking).

If anything is unclear, or if you encounter problems during development, please make a phone call and we'll sort it out as quickly as possible. Yes, you are expected to talk during the phone call with me.

If I suspect your exam is not yours, we'll have an oral session. It only happened once in past years, I'm confident it won't be needed.

To get the minimum positive grade, you must give almost-right answers to all the questions you choose and show you are able to approach both exercises you choose. From that to the maximum grade, the evaluation is split equally between questions and exercises. To get "cum laude", you must pair a good exam session with a good project work.

Project Work for the Exam

As already hinted since time 0, in addition to the exam on the official dates, you are expected to deliver a programming work (possibly with some simple circuit design too). I expect to be queried during development, so please do not follow wrong or uncertain paths without asking: your time is as valuable as mine, and you should not waste a day because you are too shy to make a phone call or write me your doubts. If my phone is not responding, please email or send a message to my data number, and I'll get back to you.

This is a list of suggestions for the project work. You can pick one or make your own proposal: I'll most likely accept it.

Sampling a thermocouple (see exercise 10.1)
Drive a string of Neopixels using SPI
Write the I2C driver for the LPC logic cell

Alessandro Rubini

Last modified: May 2020