Sunday 2 July 2023

Exploring Python and OpenAI: Unveiling Strengths, Limitations, and Collaborative Potential

Disclaimer: The following code and text were written in part by AI, OpenAI to be precise.

Introduction:

In this blog post, we delve into the realms of Python programming and the intriguing capabilities of OpenAI as a programming companion. Our objective is to understand the effectiveness of OpenAI in assisting with Python programming while also uncovering its limitations. By exploring this intersection between human programming skills and AI-powered assistance, we aim to gain insights into the collaborative potential of AI in programming.

Learning Python and Assessing AI's Capabilities:

The primary goal of this exercise was to gain proficiency in Python, a widely-used and versatile programming language. Concurrently, we sought to evaluate the effectiveness of OpenAI as a tool for programming guidance. With an open mind and an eagerness to understand the interplay between AI and programming, we embarked on this exploration.

Understanding OpenAI's Strengths and Limitations:

As we engaged with OpenAI, we discovered its value in providing general assistance and suggestions, particularly for beginners. However, we also encountered limitations when faced with more complex coding challenges. These instances highlighted the need for human intervention and expertise, underlining the fact that OpenAI cannot fully replace the role of human programmers.

The Collaborative Potential:

Despite its limitations, OpenAI showcased its potential as a collaborative tool. We completed coding challenges more efficiently compared to relying solely on conventional internet-based resources. This experience led us to recognize the value of combining human programming skills with AI assistance, harnessing the strengths of both to achieve optimal results.

As we conclude our exploration, we remain intrigued by the future possibilities of AI in programming. Tools like OpenAI and GitHub Copilot hint at the promising direction of collaboration between humans and AI. However, it is crucial to acknowledge that AI should be seen as a tool to enhance programming skills, rather than a substitute for foundational knowledge and experience.

Through this exploration, we hope to provide valuable insights for those curious about the capabilities of AI in programming, as well as the importance of human expertise in this evolving landscape.

The real conclusion

It's Damo again, no it really is, but I guess you can't entirely trust that given I've just used AI to help me write a blog post, and prior to that I used it to help complete a coding challenge. The coding challenge is here: https://github.com/DamianStanger/PythonPlayground (it's a simple calculator). The ChatGPT session on OpenAI can be found here: https://chat.openai.com/share/fea96f69-70ab-4e11-822c-b79ec8503547 and the session I used to write the above blog post is here: https://chat.openai.com/share/806903cf-c8d1-40f3-842c-362e5af8499d

How long these sessions will live on the internet I don't know; I'm sure they will eventually get deleted. Please don't complain when they do, as I'm not going to keep a copy of them.

 

This is the first time I've used AI to help either with coding or with writing a blog post, and all in all it's been a positive experience. The coding was quicker because my knowledge of Python is not up to much. I have read some chapters from books and watched some videos targeted at experienced developers learning Python, but I've never actually written any, although I have reviewed others' code (a strange concept in a way given I've never written it).

Like I said, I found the help in writing the code really useful and it certainly sped me up in that coding challenge; it took about 90 minutes to do with help. I had installed and set up Python prior to starting, so that's 90 minutes just to complete the code. But the writing of the blog post would have been better and quicker if I had just done it myself. I found it hard to get it to sound authentic and to say what I wanted, how I wanted it said; for a short blog post it would have been quicker to write it myself.

 

I will use AI in the future to help me write code, and I am keen to try out Copilot, but it's worth emphasising again that I see this as another aid to learning a new language, not a way to learn programming. Other than code correctness, my main concern is that I'm writing idiomatic Python, using best practices, and I don't want to learn from the worst or the average of the internet. Imagine learning to code solely by reading answers on Stack Overflow or GitHub snippets; no one does that, right?

Sunday 25 March 2018

Understanding the node stream API, through examples and practice

We utilise Node streams extensively in my day job, making extensive use of the competing consumers pattern to process messages from AWS queues. We pull messages off SQS, then using the Node stream API we process the messages through various stages until we write to a MongoDB and/or issue events that land on other SQS queues, usually via SNS.

I've been looking into how we can gain more performance out of our existing codebase, looking into what can be tweaked and what needs rewriting. It's forced me to become quite intimate with Node's stream API, how buffers really work and what effect the highWaterMark has on them. I've also looked at how we could use event emitters rather than streams to make better use of Node's asynchronous nature.

In this post I aim to impart some of my recent knowledge, partly for my own future reference but also because, having polled many devs on the subject, I've found there are a number of misconceptions about how streams in Node actually work.

Analogies

Analogies are useful. I have always thought of streams as a pipe that data flows through, like a hosepipe with water. As with most analogies it is useful but will only take you so far.

__-0_
_____==========|====================|==========
Tap   Reader       Transformer        Writer


A hose has a tap connected at one end (a reader), some length of pipe in the middle (transformers) and an opening at the other end to let the water out (a writer). A hosepipe will fill with water when you turn on the tap until it is full, and then it will start to issue water from the open end. If a real tap is switched off, the water stops flowing at the open end with water still left in the pipe. So instead imagine a hose that is dangling vertically, still attached to the tap, where switching the water off lets all the water in the pipe drain out due to gravity; this is more like a stream in Node.

Buffers

__-0_                      _
_____==========|==========|x|==========|==========
Tap   Reader         Transformer        Writer


But the water in the hose is not all being processed at the same time. Imagine 3 parts to the hose all connected together, each one capable of holding a certain amount of water; these are analogous to buffers. In the centre of the middle pipe (the transformer) you have a box: a unit of water flows into the box, then out, but only one unit of water can pass through at once. This is in fact how a transform stream works. Take a compression stream for example: one piece of data goes in, it's compressed, and the compressed data comes out ready to be written by the writer.

So you now have 4 sections of pipe, each capable of containing some water; these are the buffers. The reader has a buffer on its outgoing side, the transformer has 2 buffers, one read buffer and one write buffer, and the writer has one incoming buffer. If the highWaterMark is set to 5 for all streams you have room for 20 units of water (this is not the whole story, as I will explain later, but like I said it's a useful analogy).

The stream's highWaterMark setting governs the desired amount of data to be in the buffer at any one time. It is not a limit, as we will show later, but more a guide to tell the stream when to pause the streaming of data into the buffer.

Classes and functions of the stream API


 _                       _                       _
|R|==========|==========|T|==========|==========|W|
    Reader         Transformer         Writer


When you implement streams you need to implement a function in each of the Reader, Transformer and Writer. The function names are quite self-explanatory: _read, _transform and _write respectively. These functions are called as data flows through the pipe, and it all happens once the streams are piped together thus:

readStream.pipe(transformStream).pipe(writeStream);

Once the streams are piped together, any data flowing into the read stream will be pulled down the pipe by the stream API, which will handle any buffering needed. What is key right now is that only one piece of information can flow through any given function at once; they are NOT asynchronous out of the box. This makes sense, as you would not want a transform stream that is compressing or encrypting data to process multiple chunks simultaneously and potentially get the zipped chunks mixed up in the write buffer.
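
To make that concrete, here is a minimal sketch of three object mode streams piped together. This is just an illustration using the standard stream classes, not the code from the repo; the _read, _transform and _write implementations, the highWaterMark option, push(null) to signal the end of the data, and the _final hook are all part of the standard Node stream API.

const { Readable, Transform, Writable } = require('stream');

let counter = 0;

const readStream = new Readable({
  objectMode: true,
  highWaterMark: 2,
  read() {
    counter += 1;
    // push one message per _read call, then null to say there is no more data
    this.push(counter <= 10 ? { id: counter } : null);
  }
});

const transformStream = new Transform({
  objectMode: true,
  highWaterMark: 2,
  transform(message, encoding, callback) {
    // only one message is in here at a time
    callback(null, { ...message, transformed: true });
  }
});

const writeStream = new Writable({
  objectMode: true,
  highWaterMark: 2,
  write(message, encoding, callback) {
    console.log('wrote', message.id);
    callback(); // tell the stream API we are ready for the next message
  },
  final(callback) {
    console.log('all messages written');
    callback();
  }
});

readStream.pipe(transformStream).pipe(writeStream);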

Now, because there can only be one piece of data flowing through the functions at once, the buffers will fill up if the rate of data coming into the pipe is greater than the rate of data flowing out of it. Usually this is the case; for example, reading from one file is generally quicker than writing to another.

Data flow


If we look at what happens when a single message is inserted into the pipe, you will see it do this:
 _                     _                    _
|1|=========|=========| |=========|========| |
 _                     _                    _
| |=========|=========|1|=========|========| |
 _                     _                    _
| |=========|=========| |=========|========|1|


If the write is slow and 1 more message were to be read, this is what would happen (for the sake of this example the buffer (highWaterMark) is set to 1 for all streams).

 _                     _                   _
| |====-====|====-====| |====2====|===x===|1|


Notice how the writer's buffer and the transformer's write buffer are both full (the writer as a whole can only have one message within it; the buffer was full but it's now processing that message). If a 3rd message were to go down the pipe it would get stuck in the transformer's read buffer; it has nowhere to go, so it will not go into the _transform function until there is room in the next buffer.

 _                     _                   _
| |====-====|====3====| |====2====|===x===|1|


Finally, if the reader pumped a 4th message into the pipe it would go into the reader's buffer.

 _                     _                   _
| |====4====|====3====| |====2====|===x===|1|


In this state nothing can flow down the pipe until the writer has finished processing message number one. This is called backpressure: messages back up behind the slowest stream (in this case the writer, but it could equally be a transformer).

Backpressure, drain and resume


That was a bit of a simplistic example, so let's say the buffers are set to a highWaterMark of 2 rather than 1. A slow write would result in the following situation (again note that the write stream can only have 2 messages in it as a whole, with only one in its buffer).

 _                       _                      _
| |===8==7===|===6==5===| |===4==3===|====2====|1|


At this point technically speaking the transform stream and the read stream are paused and will not resume until their buffers are emptied when a drain event occurs. Let's see what happens when the writer finishes writing [1]

 _                       _                      _
| |===8==7===|===6==5===| |===4==3===|====-====|2|


Notice that more data is not pulled from the transform write buffer yet.

Only when [2] finishes processing will a drain event on the write stream occur, causing it to fill again. There is enough room for 2 pieces of data in the writer: 1 in the buffer and one in the slow write operation. So as soon as [2] finishes being written, 3 will be pulled into the write stream.

 _                       _                      _
| |===8==7===|===6==5===| |===-==4===|====3====| |


There is now room in the transform's write buffer, so 5 will get pulled through the transform.

 _                       _                      _
| |===8==7===|===-==6===|5|===-==4===|====3====| |


3 will then go into the writer and 4 will go into the writer's buffer.
 _                       _                      _
| |===8==7===|===-==6===|5|===-==-===|====4====|3|


When 5 finishes its transform, 6 will also go through the transform, causing the transform's read buffer to drain.
 _                       _                      _
| |===8==7===|===-==-===| |===6==5===|====4====|3|

7 and 8 will then flow into the transform's read buffer.
 _                       _                      _
| |===-==-===|===8==7===| |===6==5===|====4====|3|

This causes the reader's buffer to drain, which will then pull 2 more messages in through the reader.
 _                       _                      _
| |==10==9===|===8==7===| |===6==5===|====4====|3|

And we are back to the state we had originally; we are now waiting for the slow write operation to occur on message #3.

Buffering

So what you can see from the above example is a pulsing action where messages get pulled through in pairs (the current highWaterMark) once 2 slow writes have occurred. This is due to the way that the backpressure gets released and the buffers fully empty before fully filling up again. Obviously this gets more complex if you have transformers that are also slow, but you will always see messages flowing through the buffers in pairs (when all the streams' highWaterMarks are set to 2).


Some demos

Firstly I'd like to introduce you to the repo that contains the examples. You can see there are a number of index files; each one is designed to be run with node indexFileName.js and will give a coloured output to the terminal.

https://github.com/DamianStanger/streams101

I aim to produce and demo some simple code that you can then play with, changing the highWaterMark settings and the numbers of messages, with a view to allowing you to gain a better understanding of how streams work yourself.

Object mode

First, a quick note that all the examples given below use streams in object mode. This is because in my case I'm investigating how I can get more performance out of my streams when pulling messages off an SQS queue, so using object mode makes sense, but all the concepts will be similar if you're dealing with chunked streams. The exception here is writing in an asynchronous manner; for example, you would not want to write multiple chunks to a file stream asynchronously as your chunks would get all mixed up in the resultant file.

Demo - Fast reading, fast writing

Execute: node 01_indexSync.js
Example results: ./results/01_indexSync.log

This first simple demo shows what happens when the reader and the writer are not IO bound; they are totally synchronous and there are no callbacks or promises involved. The results can be seen in indexSync.log.

Even though the highWaterMark is set to 2 in each stream, there is only ever one message in any of the streams at once. This is because the processing of the message within each stream is CPU bound and so gives the event loop no time to do any other work, including pulling the next message into a buffer.

Notice that it's the reader that ends the process by sending a null message through the system when it has finished the set number of reads. If you were to comment the push(null) statement out, the transform and write streams would not know when to finish and so would not execute the _final method.

Demo - Back pressure

Execute: node 02_indexBackPressure.js
Example results: ./results/02_indexBackPressure.log

Here the writer has not been wired into the pipe, which gives us the opportunity to see what occurs when the reader reads until all the buffers are full. The highWaterMark is set to 5 for each stream, so you can see that 15 messages are read, 5 of which are transformed to fill the write buffer of the transform stream. 5 will be stuck in the read buffer of the transform stream and 5 will be in the outgoing buffer of the read stream. At this point all the streams are paused, and since there is no writer to trigger a drain there are no more callbacks to process and the process ends.

A selection of the log is shown below with a little removed to make for a clearer example; see the full log in the repo. Notice how on lines 23 and 33 a zero is returned, indicating that the stream is going to pause.

++ 1 Read        <----+
++ 2 Read             |
++++ 1 Transform      |
++ 3 Read             | These 5 messages went straight
++++ 2 Transform      | through the _transform method
++ 4 Read             | and into the write buffer of
++++ 3 Transform      | the transform stream
++ 5 Read             |
++++ 4 Transform      |
++ 6 Read          <----+
++++ 5 Transform <----+ |
++ 7 Read               |
++ 8 Read               | These 5 messages are in the
++ 9 Read               | Transform streams read buffer
++ 10 Read       <------+
++ 11 Read       <----+
++ 12 Read            | These messages are in the
++ 13 Read            | read streams outgoing buffer
++ 14 Read            |
++ 15 Read       <----+


Demo - Slow writing

Execute: node 03_indexSyncSlow.js
Example results: ./results/03_indexSyncSlow.log

Here I have added a slow write operation to the writer; this now operates in the same manner as the walk-through I did above titled 'Backpressure, drain and resume'. The highWaterMark is set to 2 for all streams, and we are processing 10 messages. The results of running this can be seen below (again I've removed some log lines for brevity).

++++++++++ READ Start - Tue Mar 06 2018 18:00:46 GMT+0000 (STD)
++ 1 Read
++ 2 Read
++++ 1 Transform
++++++ 1 Write   <-- 1st message goes into the _write method and starts slow processing
++ 3 Read
++++ 2 Transform <-- 2nd message goes into the write streams buffer
++ 4 Read
++++ 3 Transform <-- 3rd message goes into the transforms stream write buffer
++ 5 Read        <-- 5th message goes into the transform stream read buffer
++++ 4 Transform <-- 4th message goes into the transforms stream write buffer
++ 6 Read        <-- 6th message goes into the transform stream read buffer
++ 7 Read               <-- Stuck in the read stream's outgoing buffer
++ 8 Read               <-- Stuck in the read stream's outgoing buffer

 At this point all the buffers are full until the writer finishes

------ 1 Write finished
++++++ 2 Write
------ 2 Write finished <-- Write stream is now empty, there is room for 2
                            messages in its buffer. Calling drain will result
                            in 2 messages pushed in from the transform buffer.
++++ 5 Transform        <-- The transform streams write buffer is now empty, so
                            a drain event is called causing 2 messages to be pulled
                            through the _transform method
++++++ 3 Write
++++ 6 Transform <-- Both the messages that were in the transform stream read buffer have
                     been processed, drain causes the 2 messages in the read stream
                     buffer to be pulled into the transform read buffer.
++ 9 Read
++ 10 Read
------ 3 Write finished <-- From here the whole process will start again
++++++ 4 Write
------ 4 Write finished
...
...


Demo - Batch reading

Execute: node 04_indexBatchRead.js
Example results: ./results/04_indexBatchRead.log

I've been implying that the buffers in the streams are finite in nature, that they can only hold a given number of messages. This is not correct, and it is where the hosepipe analogy starts to break down. In example 03_ a highWaterMark of 2 was used; when the buffer becomes full, any message pushed into the stream results in false being returned to the origin that is pushing data into the stream, and this backpressure resulted in exactly the 'correct' number of messages in each of the buffers.
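
Demo 04 instead pushes a whole batch of messages per _read call. A rough sketch of that batching idea (an illustration of the technique, not the exact code from the repo) looks something like this; note how push() returns false once the highWaterMark is exceeded, yet nothing stops us pushing more.

const { Readable } = require('stream');

let batch = 0;

const readStream = new Readable({
  objectMode: true,
  highWaterMark: 4,
  read() {
    batch += 1;
    // push a whole batch of 5 messages per _read call
    for (let i = 1; i <= 5; i += 1) {
      const ok = this.push({ id: `${batch}.0${i}` });
      // push() returns false once the buffer is over the highWaterMark,
      // but it still accepts the message
      console.log(`++ ${batch}.0${i} Read Batch push:${ok ? 1 : 0}`);
    }
    if (batch >= 4) {
      this.push(null); // nothing left to read
    }
  }
});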

When you run this example you will see a long output, which I have included below with many parts cut out. The interesting parts are denoted with a number and referenced below:

++ 1 Read                        <---- #001
++ 1.01 Read Batch push:1
++ 1.02 Read Batch push:1
++ 1.03 Read Batch push:1
++ 1.04 Read Batch push:0
++ 1.05 Read Batch push:0
++ 2 Read                        <---- #002
++ 2.01 Read Batch push:0
...
++ 2.05 Read Batch push:0
++++ 1.01 Transform              <---- #003
++++++ 1.01 Write - SLOW         <---- #004
++++ 1.01 Transform - push:1
++++ 1.01 Transform - next:0
...
++++ 2.01 Transform - next:0     <---- #005
++ 3 Read                        <---- #006
++ 3.01 Read Batch push:0
...
++ 3.05 Read Batch push:0
++++ 2.02 Transform              <---- #007
++++ 2.03 Transform
++ 4 Read                        <---- #008
++ 4.01 Read Batch push:0
...
++ 4.05 Read Batch push:0
------ 1.01 - 1:1 Write finished <---- #009
++++++ 1.02 Write
...
------ 1.04 - 1:4 Write finished
++++ 2.04 Transform
++++++ 1.05 Write
++++ 2.05 Transform
++++ 3.01 Transform
++++ 3.02 Transform
------ 1.05 - 1:5 Write finished
...
------ 2.03 - 1:8 Write finished
++++ 3.03 Transform
++++++ 2.04 Write
++++ 3.04 Transform
++++ 3.05 Transform
++++ 4.01 Transform
++ 5 Read
++++++++++ READ End - Wed Mar 07 2018 08:06:17 GMT+0000 (STD)
++++ 4.01 Transform - next:0
------ 2.04 - 1:9 Write finished
...
++++ 4.05 Transform
++++++++++ TRANSFORM Final - Wed Mar 07 2018 08:06:19 GMT+0000 (STD)
------ 3.03 - 1:13 Write finished
...
------ 4.05 - 1:20 Write finished
++++++++++ WRITE Final - Wed Mar 07 2018 08:06:28 GMT+0000 (STD)


#001
When the program starts a read event is fired, causing the first batch of 5 messages to be sent into the read buffer; 4 of these immediately go into the transform stream's read buffer and the next batch of 5 is read #002. Now 7 messages are in the read buffer.
#003
Next the transform stream processes one message which flows immediately into the slow write operation #004.
#005
Then 5 messages flow through the transform stream, filling the write buffer and partly filling the transform stream's outgoing buffer (remember the buffers have a highWaterMark set to 4). So now we have 4 messages in the write stream (of which one is currently processing) and 1 in the transform stream's out buffer. In the meantime all the remaining 4 messages in the read buffer have now moved to the transform stream's read buffer. The outgoing buffer in the read stream is now empty.
#006
Another 5 messages are sent into the read stream's buffer.
#007
Another 2 go through the transform stream. This now makes 4 messages in the write stream, 4 in the out buffer of the transform stream, 4 messages in the input buffer of the transform stream and 3 in the read stream's buffer, making 15 messages in total.
#008
Given that the writer is taking a long time and all buffers are almost full, the only thing left to do is fill the read stream's outgoing buffer, so another batch of 5 is pulled in, making 8 in the read stream buffer. There are now 20 messages in the system but not one has been written yet. And remember the highWaterMarks are set to 4; there are 4 buffers, so technically room for 16 messages.
#009
Finally the slow write finishes and messages flow through the system. The write buffer is the first to be emptied, one message at a time; only then do messages move between the queues.

I will leave it as an exercise for the reader to walk through the remaining log, but do note the position of the 'READ End' log message. It is a long way down the list, only appearing after all the messages have been drained from the read queue. Message 4.01 is the 16th message, so the transform read buffer now has 4 messages in it. The read stream buffer is empty and so it tries to read again, but the code sends in a null message indicating there is nothing left to read, and so the read stream ends.

Demo - Asynchronous writing

Alright, let's finally get to the point: how do we make our synchronous stream processing faster? Well, for this exercise we have a slow write, so let's look at parallelising that by allowing multiple messages to be processed at any one time.

Execute: node 05_indexAsync.js
Example results: ./results/05_indexAsync.log

In this code base we are going to send 20 messages through the system; the buffers are all set to 2 but the maximum concurrency of the write operations is set to 10. If you look at the code inside the file writeItAsync.js you will see that for every write that is performed a callback is set up on a given interval. This callback ultimately calls nextCallBack(), which decides, based on a closure over a nextUsed flag and the maxConcurrency value, whether or not next should be called yet. Next is only called if there are currently fewer than 10 callbacks waiting on the event loop.

Once there are 10 on the loop, only the completion of the last callback will unblock the pipe and allow more messages in. This is because for any given invocation of _write you can only call next() once.
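
The shape of that idea, as a rough sketch rather than the actual writeItAsync.js code, is something like this (the setTimeout stands in for whatever slow asynchronous work the real writer does):

const { Writable } = require('stream');

function createAsyncWriter(maxConcurrency) {
  let inFlight = 0;
  return new Writable({
    objectMode: true,
    highWaterMark: 2,
    write(message, encoding, next) {
      inFlight += 1;
      let nextUsed = false;
      // if we are still under the concurrency limit, release the pipe
      // immediately so the next message can be pulled into _write
      if (inFlight < maxConcurrency) {
        nextUsed = true;
        next();
      }
      // simulate a slow asynchronous write
      setTimeout(() => {
        inFlight -= 1;
        console.log(`finished writing ${message.id}, ${inFlight} still in flight`);
        // only the message that was held back (the last one in) calls next here,
        // which is what lets more messages flow into _write
        if (!nextUsed) {
          nextUsed = true;
          next();
        }
      }, Math.random() * 1000);
    }
  });
}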

I appreciate that this can look a bit complicated in writing, so let's look at the log.

++ 1 Read push:1                                            <---- #001
++ 2 Read push:0
++++ 1 Transform
++++++ 1 Write
      1 - 1:0 Write, Calling next()
++ 3 Read push:0
++++ 2 Transform
...
++++ 9 Transform
++++++ 9 Write
      9 - 9:0 Write, Calling next()
++ 11 Read push:0
++++ 10 Transform
++++++ 10 Write
      10 - 10:0 Write, Max concurrency reached             <---- #002
++ 12 Read push:0
++++ 11 Transform
++ 13 Read push:0
++++ 12 Transform
++ 14 Read push:0
++++ 13 Transform
++ 15 Read push:0
++ 16 Read push:0
++ 17 Read push:0                                           <---- #003
------ 3 - 9:1 Write finished in 176                        <---- #004
      3 - 9:1 Write, Next used
...                                                         <---- #005
------ 10 - 5:5 Write finished in 660
      10 - 5:5 Write, Calling next()                       <---- #006
++++++ 11 Write
      11 - 6:5 Write, Calling next()
++++ 14 Transform
++++++ 12 Write
      12 - 7:5 Write, Calling next()
++++ 15 Transform
++ 18 Read push:0
++ 19 Read push:0
++++++ 13 Write - SLOW
      13 - 8:5 Write, Calling next()
++++ 16 Transform
++++++ 14 Write
      14 - 9:5 Write, Calling next()
++++ 17 Transform
++ 20 Read push:0
++++++++++ READ End - Thu Mar 08 2018 08:40:25 GMT+0000 (STD)
++++++ 15 Write
      15 - 10:5 Write, Max concurrency reached            <---- #007
++++ 18 Transform
------ 7 - 9:6 Write finished in 699                       <---- #008
...
      14 - 3:12 Write, Next used
------ 15 - 2:13 Write finished in 808                     <---- #009
      15 - 2:13 Write, Calling next()
++++++ 16 Write
      16 - 3:13 Write, Calling next()
...
++++++++++ TRANSFORM Final - Thu Mar 08 2018 08:40:26 GMT+0000 (STD)
++++++++++ WRITE Final - Thu Mar 08 2018 08:40:26 GMT+0000 (STD)
------ 6 - 6:14 Write finished in 1518
...


#001 - #002
In the first part of the log, messages are read from the queue and flow straight through the transformer into the writer. 10 messages make this journey before the max concurrency limit in the writer is hit, at which point we have 10 messages in the writer all being processed.

#002 - #003
Message 11 will flow into the write stream's incoming buffer (message 10 is the one currently being 'tracked' by the writer; it's message 10 that is occupying the other position in the buffer).
Messages 12 and 13 go through the transformer and sit in the outgoing queue of the transform stream.
Messages 14 and 15 move into the transform stream's incoming buffer, and finally messages 16 and 17 are in the outgoing buffer of the read stream.

#004
The process now waits until a write completes; in this case it could be any of the 10 messages currently being processed, but it is message 3 that finishes first. Notice that this message completing does not cause any movement in the buffers, and does not cause a new message to be pulled into the _write method of the write stream.

#005 - #006
In this particular run 4 messages complete before message 10 does, leaving 5 in the event loop (check the full logs for details). This is because of the random processing time in the writer's async processing. When message 10 finishes, the next function is called #006. Remember that the processing of each message can only call next once, so only on completion of the last message into the queue can next be called, which causes more messages to be pulled into the _write function.

#007
Between #006 and #007 5 more messages are pulled through the buffers and sent for processing in the write stream. You can see messages 14, 15, 16 and 17 are pulled through the transform stream. Messages 10, 11, 12, 13, 14, 15 are now processing in the write stream. Messages 18, 19 and 20 are all read from the read stream.

At this point we have the following situation (you will need to look at the full logs to see this in detail):
  • Finished processing: 3, 1, 5, 9, 10
  • Currently Processing in the Write stream: 2, 4, 6, 7, 8, 11, 12, 13, 14, 15
  • Write stream incoming buffer: 16
  • Transform stream outgoing buffer: 17, 18
  • Transform stream incoming buffer: 19, 20
  • Read stream outgoing buffer: <empty>
This can be seen by the 2 log lines for 17 and 18 showing that they have gone through the transform stream.

#008
We can see it's message 7 that completes next, but again this does not cause more messages to move; we need to wait for message 15 to complete for next to be called, as that is the last one into the write stream.

#008 - #009
8 messages then finish processing, with message 15 being the last. You can see that when message 15 completes, next() is called, causing more messages to go into the write stream for processing.

Analysis

This is better than the previous situations; we have multiple messages being processed at the same time here, but there is a strange pulsing to the messages going through the writer. Only when the last message into the writer completes do more messages start to be processed by the writer. Using this technique we will have 5 or fewer messages currently being processed, but if that last message into the writer takes a long time to complete, the overall throughput will slow until that message completes.

This method was the best I could manage whilst still using the class-based stream API to pipe messages into the slow writer. But is there another way? Yes, there is.

Demo - Event emitters in the writer

The problem with the previous async model was that it only allowed a certain number of messages to be pulled into the writer from the transformer, and only when the last message completes can more messages be pulled in. This pull system has limitations, as we have seen. There is another way: rather than pull messages, let them be pushed as they become available.

This leads us to utilise the built-in capabilities of event emitters. We will listen to the 'data' event of the transformer, then kick off an async operation every time data is available. But we only want to process so much at once, otherwise, because the writer is slower than the reader (which it is), resource consumption will go through the roof and eventually overwhelm the limits on the process (ask us how we know). To do this we will pause and resume the transform stream as the limits inside the writer are met.
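
As a rough sketch of that push model, assuming the readStream and transformStream from earlier and a slowAsyncWrite(message, done) helper that stands in for the real asynchronous write (that function name is just for illustration), it looks something like this:

const maxConcurrency = 10;
let inFlight = 0;

readStream.pipe(transformStream);

transformStream.on('data', (message) => {
  inFlight += 1;
  if (inFlight >= maxConcurrency) {
    // too many writes in progress, stop the transformer pushing more at us
    transformStream.pause();
  }
  slowAsyncWrite(message, () => {
    inFlight -= 1;
    // the completion of ANY write can free up capacity and resume the flow
    if (inFlight < maxConcurrency) {
      transformStream.resume();
    }
  });
});

transformStream.on('end', () => {
  console.log('all messages have been pushed to the writer');
});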

Execute: node 06_indexEventEmitters.js
Example results: ./results/06_indexEventEmitters.log

++ 1 Read
++ 2 Read
++++ 1 Transform
++++++ 1 Write
           1:0 Write resumed             <---- #001
++ 3 Read
...
++ 11 Read
++++ 10 Transform
++++++ 10 Write
           10:0 Write pausing            <---- #002
++ 12 Read
++++ 11 Transform
++ 13 Read
++++ 12 Transform
++ 14 Read
++ 15 Read
++ 16 Read                                <---- #003
------ 6 - 9:1 Write finished in 150, resumed
++++ 13 Transform
++++++ 11 Write
           10:1 Write pausing
------ 8 - 9:2 Write finished in 209, resumed
++++ 14 Transform
++ 17 Read
++ 18 Read
++++++ 12 Write
           10:2 Write pausing
------ 10 - 9:3 Write finished in 335, resumed
...
++++++ 17 Write - SLOW
           10:7 Write pausing
------ 11 - 9:8 Write finished in 624, resumed
++++ 20 Transform
++++++ 18 Write
           10:8 Write pausing
++++++++++ TRANSFORM Final - Thu Mar 15 2018 17:33:44
------ 1 - 9:9 Write finished in 840, resumed
++++++ 19 Write
           10:9 Write pausing
------ 3 - 9:10 Write finished in 842, resumed
++++++ 20 Write
           10:10 Write pausing
++++++++++ WRITE Final - Thu Mar 15 2018 17:33:44
------ 5 - 9:11 Write finished in 889, resumed  <---- #004
------ 15 - 8:12 Write finished in 420, resumed
...
------ 14 - 1:19 Write finished in 1405, resumed
------ 17 - 0:20 Write finished in 1758, resumed


#001
Every time we see 'write resumed' or 'write paused' there are two numbers preceding it. The first is the number currently in progress and the second is the number finished processing. If the number processing is less than 10 (the concurrency) then resume is called on the transform stream, making sure that more messages are pushed into the writer. You can see, looking through the logs, that the number 'in progress' (the first number) climbs to ten then oscillates between nine and ten until there are no more messages in the transform queue (#004), at which point it starts to drop to zero as it finishes processing all the messages.

#002 - #003
There are now ten messages being processed by the writer. Another 6 are pulled in to fill the 3 buffers (2 messages in each of the read buffer and the incoming and outgoing transform stream buffers).

#003
From here on, messages are pushed into the writer as it pauses and resumes the transform stream. The number of messages in process in the async write operation will remain at nine or ten for the remainder of the run until all messages have been pushed through the system.

Code

If you analyse the code and the differences between 05_indexAsync and 06_indexEventEmitters, you will see that 06 does not have a class as a write stream; rather it uses event emitters and listens to the events coming from the transform stream. For each message it will kick off an async processing block.

The effect of this model is that the completion of any write can cause more messages to flow into the write function, whereas in the previous code only the last message into the write stream could do that.

Comparing the relative speed of the different approaches

I've done some comparisons for the different approaches I have gone through above. Each test processes messages with a highWaterMark set to 10. I've varied the number of messages processed and the maximum write concurrency to give some comparison.



Run #   File name               Async   Time taken   # messages   Write concurrency
3.1     03_indexSyncSlow        N       69s          100          1
4.1     04_indexBatchRead       N       68s          100          1
5.1     05_indexAsync           N       67s          100          1
5.2     05_indexAsync           Y       10s          100          10
5.3     05_indexAsync           Y       95s          1000         10
5.4     05_indexAsync           Y       14s          1000         100
6.1     06_indexEventEmitters   N       68s          100          1
6.2     06_indexEventEmitters   Y       8s           100          10
6.3     06_indexEventEmitters   Y       72s          1000         10
6.4     06_indexEventEmitters   Y       9s           1000         100

Clearly the async versions were going to be quicker than the synchronous versions, and it's interesting to see that when write concurrency is set to 1 pretty much all of them take roughly the same 68 seconds. I'm attributing the deviation to the randomised wait time within the writers.

The results that interest me the most, though, are the comparisons between 5.3 and 6.3, asynchronously processing 1000 messages with a concurrency of 10, and then 5.4 and 6.4, processing 1000 messages with a concurrency of 100. Using event emitters (6.3 and 6.4) was 25% and then 35% quicker than using stream pipes (5.3 and 5.4). Again this is not too surprising given that the concurrency in the event emitter version is maintained at 100 for the length of the entire run, whereas in the piped version the number of messages being processed by the writer only gets topped up when the last message in completes.

If you run the same tests yourself, pay attention to the logs of 05_indexAsync, which show the number of messages currently being processed, and see how it pulses up and down as the last message finishes (especially when the concurrency is high; try 100 or more).

Conclusion

Like I said at the beginning of this article, I didn't really set out to prove anything with this exercise, more just to get a fuller understanding of how the mechanics of streams work in Node. I did this with a view to getting better performance out of the Node processes that we employ to process messages off queues, perform business logic on that data and load data into a database. During this investigation there have been lots of learnings, and these have proved that we were using streams correctly but inefficiently. We operate our server-side processing on a model similar to 04_indexBatchRead, batching the read from SQS but processing messages in the writer one at a time. I've tried to capture these learnings in code on the GitHub repo found at https://github.com/DamianStanger/streams101

I hope that I managed to impart some of my learnings to you in the process.

Appendix / references

My code and results - https://github.com/DamianStanger/streams101
Node stream API - https://nodejs.org/dist/latest-v9.x/docs/api/stream.html
Back pressure - https://nodejs.org/en/docs/guides/backpressuring-in-streams/
Lifecycle of a pipe - https://nodejs.org/en/docs/guides/backpressuring-in-streams/#lifecycle-of-pipe
Node event emitters - https://nodejs.org/dist/latest-v9.x/docs/api/events.html

Footnote

I would really appreciate any feedback on this article; positive comments and constructive criticism are equally welcome as I'm still learning, and I would hate it if there are conclusions drawn that are inaccurate or just plain wrong. Thanks in advance. Do try out the code examples; it will help cement the concepts described in this article.

Wednesday 28 February 2018

Converting CRLF to LF for all files in a git repository

At work we currently have people who do their dev on Linux laptops, Linux VMs, Windows and the WSL, which means that we need to be careful about the compatibility of files in git. We deploy to CentOS in all our production and pre-prod environments, so we always check in Linux line endings.

But recently when I was looking through some codez I found a bit of a mix of files with LF and CRLF line endings, and so wanted to make them all consistent with the LF Linux standard.

git config


I don't know exactly how this happened and didn't narrow it down to any one commit, but I just wanted it fixed. We should all have our git clients set to convert and check in Linux line endings, which you can check with the commands:

git config --list --global
git config --list


You are looking for the setting core.autocrlf. This can take three values: true, false and input. Depending on the OS that you are using you need to ensure that you use the correct setting.

On Windows it should be set to true. That is, check out Windows style (CRLF) but check in Linux style (LF).

On Linux it needs to be set to false or input, as you don't want files to contain Windows line endings during development, so you check out with LF. You can also leave it as the default, which is false.

I make heavy use of WSL (Windows Subsystem for Linux) as well as CentOS VMs running on VirtualBox. WSL behaves like Linux, so I have the default set, which is not to change the line endings going in or out. But you do have to be careful: if you change files or create files using a Windows editor (I use WebStorm and Sublime) then you could inadvertently check in Windows line endings, so it might be best to use input. Input will check out as-is from the repo, but on check-in will convert all line endings to LF, just in case a CRLF file was introduced.
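
For reference, setting that explicitly on Linux or WSL is just:

git config --global core.autocrlf input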

By the way, I love the WSL; I use it every day and prefer it to using a VM running Linux. It works great for Node dev.

Converting CRLF to LF


Anyway, back to the main point of this post. We have some files with Windows line endings mixed in with files with Linux line endings. How do we make them consistent? In particular, how do we make them all have Linux line endings?

The difference is \r\n (Windows) vs \n (Linux). The tool sed is very good at finding strings in a file and replacing them with something else.

sed is a stream editor for filtering and transforming text; we can give it a regex substitution and run that against a file to remove any carriage returns ('\r') from it.

sed -i 's/\r//g' myfilename.js

-i tells sed to do an in-place substitution; 's/\r//g' is a substitution expression that searches for carriage returns '\r' and replaces them with nothing '//', globally for that file.

But we have hundreds of files across tens of nested directories. So we need to find all the files we want to 'fix' using the find command.

find . -type f -not -path './.git*' -not -path './node_modules*'

This will recursively list all files from the current directory, excluding any files in the .git or node_modules folders. Do remember to exclude your .git folder, as you will corrupt it if you run the substitution against files in there. Also exclude any package or binary folders; this depends on the environment you are working in. I'm currently doing Node dev, so excluding node_modules is good enough for me.

All that remains is to put them together using the standard Unix pipe operator and the xargs command, which allows you to build and execute command lines. It will take the output of find, space-separate the file names and append them to the next command. We would use it thus:

find . -type f -not -path './.git*' -not -path './node_modules*' | xargs sed -i 's/\r//g'

If the folder contained 2 files xargs would build a command that looked like this:

sed -i 's/\r//g' ./file1.js ./file2.js

Voila!

All CRLF line endings are replaced with LF. You should be able to check this by using git diff to see the changes. You should see the removed carriage returns shown as ^M in the unified diff, like this:

diff --git a/file1.js b/file1.js
index 01ce825..f5f8e58 100644
--- a/file1.js
+++ b/file1.js
-old line with windows line endings^M
+old line with windows line endings


If you don't see the ^M but just two lines that look the same, then there are a couple of tricks you can try.
git diff -R reverses the output; apparently git does not always highlight removed whitespace, but it will highlight added whitespace.
git diff | cat -v pipes the raw patch output from git diff to cat. cat with -v echoes the input, including all non-display characters (like a carriage return), to the console.

Appendix

https://git-scm.com/docs/git-config
https://git-scm.com/docs/git-diff
https://manpages.debian.org/stretch/sed/sed.1.en.html
https://manpages.debian.org/stretch/findutils/xargs.1.en.html
https://manpages.debian.org/stretch/findutils/find.1.en.html

Saturday 27 January 2018

git shortcuts

I've just recently had to set up my git environment again, and one thing came to my attention: I've not got my aliases written down anywhere. This post is documentation so that next time I have to do this I will have them easily to hand.

$ git config --global --list

alias.cm=commit -m
alias.co=checkout
alias.d=diff
alias.ds=diff --staged
alias.l=log --oneline --decorate --graph --all
alias.s=status --short


This can be set up by either adding it to your git config ~/.gitconfig
[alias]
  cm = commit -m
  co = checkout
  d  = diff
  ds = diff --staged
  l  = log --oneline --decorate --graph --all
  s  = status --short


Or you can use the console to set the aliases:

git config --global alias.cm "commit -m"
git config --global alias.co checkout
git config --global alias.d  diff
git config --global alias.ds "diff --staged"
git config --global alias.l  "log --oneline --decorate --graph --all"
git config --global alias.s  "status --short"


Also, I use ZSH as my shell of choice, which has many built-in shortcuts for practically all git commands. But to be honest I find them a little cryptic, even for me. Be my guest to check it out: https://github.com/robbyrussell/oh-my-zsh/wiki/Cheatsheet

Saturday 30 September 2017

Damo's September 2017 Podcast Highlights

I subscribe to many podcasts, you can see the list as it was in 2015 here: Developer podcasts v2. I'm keeping a podcast blog here of the episodes that I find interesting or useful in some way.

Programming and Testing

[HansleMinutes] Maybe just use Vanilla Javascript with Chris Ferdinandi https://hanselminutes.com/598/maybe-just-use-vanilla-javascript-with-chris-ferdinandi
  • There's a new JavaScript library created every few seconds. If you pick up any noun there's probably a JavaScript library named after that noun. 
  • What if you just used Vanilla JavaScript? Chris helps Scott answer that question, and more in this episode.

[JavaScript Jabber] Web Apps on Linux with Jeremy Likness and Michael Crump https://dev.to/jsjabber/jsj-bonus-web-apps-on-linux-with-jeremy-likness-and-michael-crump
  • Web application as a service offering from Microsoft. I don't need to worry about the platform.
  • Web Apps has traditionally been on Windows. Web Apps on Linux is in preview.
  • Web Apps on Linux supports Node, PHP, Ruby, and .NET Core.

Agile

[Developer Tea] Growth Mindset https://developertea.simplecast.fm/58bc17f1
  • In this episode, we're talking about having a Growth Mindset.

[Elite Man Magazine] How To Use The 80/20 Rule To Work Less And Achieve Much More In Your Life http://elitemanmagazine.com/80-20-rule-perry-marshall/
  • In today's episode Perry talks about how to apply the 80/20 rule in your life to work less and achieve much more. 
  • In this fantastic interview we cover everything from The Butterfly Effect, to the 80/20 rule in real-life action, to finding out what your super powers are, and how to make yourself infinitely more productive. 
  • If you’re wondering what you can do right now to work less, get more done, and put the 80/20 rule into action, check this episode out now!

[2000 Books] 3 Key mindsets that will make you more productive https://2000books.libsyn.com/99productivityideas-getting-things-done-david-allen-3-key-mindsets-that-will-make-you-more-productive
  • 3 Key mindsets that will make you more productive

Architecture and Devops

[NDC 2017] Confusion In The Land Of The Serverless: - Sam Newman https://www.youtube.com/watch?v=CrS0HVQZiQI
  • Serverless computing is the hot new thing. Like any hyped technology, it promises a lot. However questions remain around concept and implementation, especially when you start to compare how we've built systems in the past, and what serverless offers us now. Is Serverless the future, or just the emperor's new clothes?
  • This talk will very briefly introduce serverless computing, but will then dive into some of the questions that aren't always asked in conjunction with this technology. Topics will include:
  • How does your attitude to security change?
  • Is it easier, or harder, to create reliable, resilient systems?
  • Do patterns like Circuit breakers and connection pools make sense any more?
  • Is vendor lock-in a problem?
  • Is serverless computing only for microservice architectures?
  • Which problems fit serverless computing?

[The New Stack] How Serverless Is Shaping the Future of Software Development https://thenewstack.io/serverless-shaping-future-software-development/
  • Serverless architectures are often positioned as the next big thing in cloud computing, but what exactly is serverless, who is utilizing these tools and services, and how is this ecosystem maturing? 
  • In this episode of The New Stack Makers podcast, we spoke to Mike Roberts, co-founder of Symphonia.io, about all things serverless

[Software Architecture Radio] The New Normal with Mike Nygard http://www.softwarearchitecturerad.io/episodes/2017/9/6/episode-4-the-new-normal-with-mike-nygard
  • Complex Systems
  • Continuous Partial Failure and Looking at Microservices
  • “Agile”: Why?
  • Antifragility
  • Evolutionary Design
  • Evolutionary Architecture
  • Redundancy and DRY (Don’t Repeat Yourself)
  • YAGNI (You Aren’t Gonna Need It)
  • What services should I actually have?
  • Contracts Between Services
  • Advice for Someone Getting Started as an Architect:

[Software Architecture Radio] Mark Richards on the Evolution of Software Architecture http://www.softwarearchitecturerad.io/episodes/2017/1/20/episode-3-mark-richards-on-the-evolution-of-software-architecture
  • After Mark provides us with some interesting aspects of his background (he started his career as an astronomer!), we start by discussing the horizontal and vertical aspects of the evolution of software architecture
  • Some of these drivers are technical - especially often hardware taking some time to catch up with the needs of newer ideas and software - but other times these changes are driven by changes in the business.

[DevOps Days] Lessons Learned From Detroit To Deming https://devopsdays.libsyn.com/podcast/devops-lessons-learned-from-detroit-to-deming-devopsdays-dc-2017
  • This session aims to enlighten DevOps teams, security and development professionals by sharing results from the 2017 State of the Software Supply Chain Report -- a blend of public and proprietary data with expert research and analysis. 
  • The presentation will also reveal findings from the 2017 DevSecOps Community survey where over 2,000 professionals shared their experiences blending DevOps and security practices together. 
  • Throughout the discussion, lessons are discussed that Deming employed decades ago to help us accelerate adoption of the right DevSecOps culture, practices, and measures today.


[O'Reilly Programming Podcast] Sam Newman on moving from monolith systems to microservices https://www.oreilly.com/ideas/sam-newman-on-moving-from-monolith-systems-to-microservices
  • For organizations considering migrating from monolith systems to microservices, Newman suggests moving gradually, by starting with one or two services at the beginning, getting them deployed, and assessing the outcome.
  • Newman identifies independent deployability as one of the key principles for doing microservices well. “If you create a system architecture with independent deployability, so many benefits flow from that,” he says.
  • He recommends a “consumers first” focus for microservices, with designs based on how software will be implemented by customers.
  • How microservices can enable cost-effective scaling
  • In discussing modularity, Newman says “If you want to look at a system that gets modules right, look at Erlang, which was built from the ground up to be a language and a runtime for building distributed systems.”


[Static Void Podcast] Real-World DevOps with Andy Schwam https://www.staticvoidpodcast.com/real-world-devops-with-andy-schwam
  • Discussions about the concepts of "DevOps" in the real world. 
  • What's myth and what works? 
  • What's hard and what's easy? 
  • Andy takes us behind the scenes and tells us what it takes to transform an existing error-prone manual deployment to a highly-reliable, repeatable, and automated process.

/stuff

[99% Invisible] The Age of the Algorithm http://99percentinvisible.prx.org/2017/09/05/274-the-age-of-the-algorithm/
  • Computer algorithms now shape our world in profound and mostly invisible ways:
  • They predict if we’ll be valuable customers and whether we’re likely to repay a loan. 
  • They filter what we see on social media, sort through resumes, and evaluate job performance. 
  • They inform prison sentences and monitor our health. Most of these algorithms have been created with good intentions. 
  • The goal is to replace subjective judgments with objective measurements. But it doesn’t always work out like that.


[Software Engineering Daily] Brave Browser with Jonathan Sampson https://softwareengineeringdaily.com/2017/09/20/brave-browser-with-jonathan-sampson/
  • Online advertising enables free content and services of the Internet. One of the free services that is powered by advertising is the browser. 60% of web browsing is done through Chrome, which is owned by Google, which is powered by advertising.
  • The application that most of us use to explore the web is made by a company that relies on ads, so it is unsurprising that the default of that browser is to allow close tracking of user behavior. When you hit a website, a variety of trackers are logging your data for the purpose of serving you better ads.
  • Brave is a web browser built with a modern view of advertising, privacy, and economics. Brave users can pay for content with their money OR by paying attention to ads. This system is formalized through the Basic Attention Token (BAT), a cryptocurrency that can be used to purchase user attention.


[TED] Tim ferris - Why you should define your fears instead of your goals https://www.ted.com/talks/tim_ferriss_why_you_should_define_your_fears_instead_of_your_goals
  • The hard choices -- what we most fear doing, asking, saying -- are very often exactly what we need to do. 
  • How can we overcome self-paralysis and take action? Tim Ferriss encourages us to fully envision and write down our fears in detail, in a simple but powerful exercise he calls "fear-setting." 
  • Learn more about how this practice can help you thrive in high-stress environments and separate what you can control from what you cannot.


[BBC - More or Less] The 10,000 Hours Rule http://www.bbc.co.uk/programmes/p01sqly1
  • If you practised anything for long enough, would you become a pro? Author Malcolm Gladwell popularised the idea that if you devote yourself to anything from chess to playing an instrument for 10,000 hours, you will become an expert.
  • But where did the idea come from, and is it true? More or Less tells the story of how a paper published in 1993 went on to spark a debate – is practice enough, or do you need innate talent as well?
  • David Epstein, author of The Sports Gene and Malcolm Gladwell explain their views.


[TED] How the US government spies on people who protest — including you https://www.ted.com/talks/jennifer_granick_how_the_us_government_spies_on_people_who_protest_including_you
  • What's stopping the American government from recording your phone calls, reading your emails and monitoring your location? Very little, says surveillance and cybersecurity counsel Jennifer Granick. 
  • The government collects all kinds of information about you easily, cheaply and without a warrant -- and if you've ever participated in a protest or attended a gun show, you're likely a person of interest.


Thursday 31 August 2017

Damo's August 2017 Podcast Highlights

I subscribe to many podcasts, you can see the list as it was in 2015 here: Developer podcasts v2. I'm keeping a podcast blog here of the episodes that I find interesting or useful in some way.

Programming and Testing

[Crosscutting Concerns] Jeremy Clark Convincing Your Boss on Unit Testing http://crosscuttingconcerns.com/Podcast-056-Jeremy-Clark-Convincing-Your-Boss-on-Unit-Testing

  • Regression tests - Change Code Without Fear
  • Code coverage (NCover is a tool that reports on code coverage for .NET code)
  • TDD and BDD and ATDD
  • WPF and XML, MVVM


[Crosscutting Concerns] J Wolfgang Goerlich on Encryption Frameworks http://crosscuttingconcerns.com/Podcast-054-J-Wolfgang-Goerlich-on-Encryption-Frameworks

Agile

[Scrum Master Toolbox] Vasco Duarte discusses #NoEstimates http://scrum-master-toolbox.org/2017/08/podcast/2017-first-6-months-top-episodes-5-vasco-duarte-discusses-noestimates/

  • What is #NoEstimates about for the author of the first #NoEstimates book? 
  • What can we learn from Vasco’s journey that led him to find #NoEstimates?


[Mastering Business Analysis] Lightning Cast: Requirements Quality http://masteringbusinessanalysis.com/lightning-cast-requirements-quality/

  • There’s a lot of talk about producing high-quality requirements but what does it really mean? Quality is a standard against which something is measured. It’s a degree of excellence or the ability to satisfy a need or expectation.
  • When it comes to requirements, people often talk about the characteristics of good requirements. The most common characteristics mentioned are: complete, concise, correct, clear, testable, traceable, and prioritized.


Architecture and Devops

[Software Engineering Daily] Serverless Continuous Delivery with Robin Weston https://softwareengineeringdaily.com/2017/08/07/serverless-continuous-delivery-with-robin-weston/

  • Serverless computing reduces the cost of using the cloud. Serverless also makes it easy to scale applications. 
  • The downside: building serverless apps requires some mindset shift. 
  • Serverless functions are deployed to transient units of computation that are spun up on demand. This is in contrast to the typical model of application delivery: deploying an application to a server or a container that stays running until you shut it down. A minimal sketch follows below.
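
A minimal sketch of that model, assuming an AWS Lambda-style Python handler (the event fields and greeting are invented for the example): the platform spins the function up on demand, invokes it with the event, and reclaims the compute when it goes idle.

    import json

    def handler(event, context):
        # Invoked on demand by the platform; there is no long-running server
        # process for us to provision, patch or shut down.
        name = event.get("name", "world")
        return {
            "statusCode": 200,
            "body": json.dumps({"message": f"Hello, {name}"}),
        }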


[Pipeline conf 2017] Serverless Architectures and Continuous Delivery https://vimeo.com/channels/pipelineconf/209686484

  • Serverless architectures have been touted as the next evolution of cloud-hosted software. Indeed, the promise of resiliency and scalability without the need for infrastructure management sounds too good to be true!
  • But how well do serverless architectures play with the patterns and practices of continuous delivery? Do they help or hinder us in our goal of delivering frequent, low-risk software changes to production? What are the trade-offs to weigh up when considering using a serverless architecture on your next project?


[O'Reilly podcast] Mike Roberts on serverless architectures https://www.oreilly.com/ideas/mike-roberts-on-serverless-architectures

  • Why Roberts calls serverless “the next evolution of cloud systems,” as individual process deployment and the resource allocation of servers are increasingly outsourced to vendors
  • How serverless architectures use backend-as-a-service (BaaS) products and functions-as-a-service (FaaS) platforms
  • The similarities and differences between a serverless architecture and microservices, and how microservices ideas can be applied to serverless
  • Roberts explains that serverless is “not an all-or-nothing approach,” and that often “the best architecture for a company is going to be a hybrid architecture between serverless and non-serverless technologies.”
  • Recent advances in serverless tooling, including progress in distributed system monitoring tools, such as Amazon’s X-Ray


[O'Reilly podcast] Neal Ford on evolutionary architecture https://www.oreilly.com/ideas/neal-ford-on-evolutionary-architecture

  • Software architecture’s increasing popularity over the last few years; Ford says that “companies such as Netflix and Amazon showed that if you do software architecture really well, you build a competitive advantage over everybody else.”
  • The non-functional requirements and soft skills needed to successfully implement software architecture.
  • How evolutionary architecture enables you to adapt to the future rather than predict it; Ford notes the pitfalls of “trying to do predictive planning against an incredibly dynamic ecosystem.”
  • Why guided change and incremental change are the two characteristics of an evolutionary architecture.
  • The difference between evolutionary and adaptive systems.


[Infoq] Security Considerations and the State of Microservices with Sam Newman https://www.infoq.com/podcasts/sam-newman-security-microservices-state

  • Wesley Reisz talks with Sam Newman about microservices. 
  • They explore the current state of the art with regards to the architectural style and corresponding tooling and deployment platforms. 
  • They then discuss how microservices increase the surface area of where sensitive information can be read or manipulated, but also have the potential to create systems that are more secure.


[Devops Radio] From Docker to DevOps with John Willis https://devopsradio.libsyn.com/episode-21-from-docker-to-devops-with-john-willis

  • In this episode of DevOps Radio, John Willis, former Director of Ecosystems at Docker, shares everything from his experience in the early days of DevOps to his predictions of what the future holds


/stuff

[TED] The era of blind faith in big data must end https://www.ted.com/talks/cathy_o_neil_the_era_of_blind_faith_in_big_data_must_end

  • Algorithms decide who gets a loan, who gets a job interview, who gets insurance and much more -- but they don't automatically make things fair. 
  • Mathematician and data scientist Cathy O'Neil coined a term for algorithms that are secret, important and harmful: "weapons of math destruction." 
  • Learn more about the hidden agendas behind the formulas.

Monday 31 July 2017

Damo's July 2017 Podcast Highlights

I subscribe to many podcasts; you can see the list as it was in 2015 here: Developer podcasts v2. I'm keeping a podcast blog here of the episodes that I find interesting or useful in some way.

Programming and Testing

[Functional Geekery] Robert C. Martin https://www.functionalgeekery.com/episode-1-robert-c-martin/

  • In this episode I talk with Robert C. Martin, better known as Uncle Bob, about:
  • Structure and Interpretation of Computer Programs
  • Introducing children to programming
  • TDD and the REPL
  • Compatibility of Functional Programming and Object Oriented Programming


[Cross Cutting Concerns] Jesse Riley sucks at unit testing http://crosscuttingconcerns.com/Podcast-031-Jesse-Riley-sucks-at-unit-testing

  • Jesse Riley and I discuss unit testing and how to do it better


[Channel 9] A Broad View of Machine Learning https://www.msn.com/en-us/movies/trailer/codechat-068-a-broad-view-of-machine-learning-codechat/vp-BBE9hLv

  • Rick Barraza (@rickbarraza) works in the AI space at Microsoft, and is particularly good at communicating the concepts of the seemingly (and actually) complex world of machine learning.
  • In this interview, Rick clarifies the terms machine learning (ML), deep neural networks (DNN), and artificial intelligence (AI), and attempts to cast a vision for this technology in the near and distant future. And an exciting future it is!

Agile

[Mastering Business Analysis] Lightning Cast: The Agile BA Mindset http://masteringbusinessanalysis.com/agile-business-analyst-mindset/

  • How can a Business Analyst be successful in an Agile environment? We can use a lot of the same tools, techniques, and approaches that we would use in a traditional environment. By adopting a different mindset, we apply those tools in a different way.
  • Using lean and Agile approaches, we’re able to iterate by delivering smaller chunks and get feedback along the way so that we can adapt to changing customer needs.


[Deliver It] #NoEstimates https://ryanripley.com/afh-073-noestimates-deliver-cast/

  • What is #NoEstimates
  • How #NoEstimates impacts the work of Product Owners
  • Why data (not guesses) help teams make better decisions and deliver value sooner; a rough forecasting sketch follows below
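
As a sketch of the "data, not guesses" idea, here is a small Monte Carlo forecast in Python that samples historical weekly throughput to estimate how long a backlog might take; the numbers and the percentile are illustrative, not something prescribed by the episode.

    import random

    # Hypothetical throughput history: stories finished in each of the last few weeks.
    weekly_throughput = [4, 6, 3, 5, 7, 4]

    def forecast_weeks(backlog_size, history, simulations=10_000):
        """Monte Carlo forecast: repeatedly sample past throughput until the backlog is done."""
        results = []
        for _ in range(simulations):
            remaining, weeks = backlog_size, 0
            while remaining > 0:
                remaining -= random.choice(history)
                weeks += 1
            results.append(weeks)
        results.sort()
        return results[int(0.85 * len(results))]  # 85th percentile: a cautious forecast

    print(forecast_weeks(40, weekly_throughput))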

/stuff

[Becoming Superhuman] David Heinemeier Hansson: An Interview With A Real-world Superlearner https://becomingasuperhuman.com/david-heinemeier-hansson-interview-real-life-superlearner/

  • Strip out of your brain that there are speed limits to learning… You can compress most learning trajectories into a much, much shorter amount of time…
  • In this episode, I wanted to deconstruct David’s thinking process and figure out how he learns so much so effectively. I wanted to understand how he has managed to be so successful in two entirely different worlds, and see what tips he had to offer to anyone looking to live a life as diverse as his.


[TED] How a handful of tech companies control billions of minds every day https://www.ted.com/talks/tristan_harris_the_manipulative_tricks_tech_companies_use_to_capture_your_attention

  • A handful of people working at a handful of tech companies steer the thoughts of billions of people every day, says design thinker Tristan Harris. 
  • From Facebook notifications to Snapstreaks to YouTube autoplays, they're all competing for one thing: your attention. 
  • Harris shares how these companies prey on our psychology for their own profit and calls for a design renaissance in which our tech instead encourages us to live out the timeline we want.


[Blinkist] GTD - David Allen Says Your Brain Is Not A Hard Drive https://www.blinkist.com/magazine/posts/simplify-productivity-david-allen

  • We talk to productivity guru David Allen about how to keep things simple while still taking advantage of every opportunity.
  • If you ever get the feeling that there simply aren’t enough hours in a day for all you want to get done, or that you’re prioritizing the wrong things, Allen’s method is an essential tool for achieving the goals that are most important to you.
  • By the end of the episode, you’ll understand exactly what next actions you should take to work more effectively.

[2000 Books] 120[Execution] Power of Habit - Change your negative habits in 3 weeks http://2000books.libsyn.com/120execution-power-of-habit-1-precise-exercise-to-change-your-negative-habits-in-3-weeks-flat

  • One PRECISE Exercise to change your negative habits in 3 weeks flat!

[Radiolab] Breaking News http://www.radiolab.org/story/breaking-news/

  • Simon Adler takes us down a technological rabbit hole of strangely contorted faces and words made out of thin air. And a wonderland full of computer scientists, journalists, and digital detectives forces us to rethink even the things we see with our very own eyes.
  • Oh, and by the way, we decided to put the dark secrets we learned into action, and unleash this on the internet.

[TEDx] The art of memory: Daniel Kilov https://www.youtube.com/watch?v=VQKt58kuEnk

  • Having struggled with organizational skills as a symptom of his poor memory all through high school, Daniel sought out methods to improve his memory. 
  • It was then that he discovered the "Art of Memory", a loosely associated group of mnemonic principles and techniques used to organize memory impressions, improve recall, and assist in the combination and 'invention' of ideas. 
  • These techniques are sometimes referred to as mnemotechnics, and are employed by memory athletes to perform remarkable feats of learning.

[TED] JOSHUA FOER - Feats of memory anyone can do https://www.ted.com/talks/joshua_foer_feats_of_memory_anyone_can_do

  • There are people who can quickly memorize lists of thousands of numbers, the order of all the cards in a deck (or ten!), and much more. 
  • Science writer Joshua Foer describes the technique -- called the memory palace -- and shows off its most remarkable feature: anyone can learn how to use it, including him

[Elite Man Magazine] How To Win Every Negotiation with Chris Voss http://elitemanmagazine.com/chris-voss-win-every-negotiation/

  • Chris Voss, former FBI Lead International Kidnapping Negotiator, joins our show in this special episode of the Elite Man Podcast! 
  • Chris talks about his experience successfully negotiating some of the most intense and pressure-cooker situations imaginable and the lessons he learned from this. 
  • He shares with us his best tips and strategies for negotiating in any type of environment including buying a car, asking your boss for a raise, or making a business deal. If you’re wondering how to win every negotiation that comes your way, check this episode out now!

[Elite Man Magazine] Persuasion, Influence, And Mind Control with Chase Hughes http://elitemanmagazine.com/persuasion-influence-mind-control-chase-hughes/

  • Chase Hughes, world-renowned behavioral scientist and the #1 body language expert in the country, joins our show in this special episode of the Elite Man Podcast! 
  • Chase talks about persuasion, influence, and even using real-life mind control to get others to do whatever you want! 
  • He shares with us his innovative techniques and tactics for manipulating others to get them to do what you want.

Friday 30 June 2017

Damo's June 2017 Podcast Highlights

I subscribe to many podcasts; you can see the list as it was in 2015 here: Developer podcasts v2. I'm keeping a podcast blog here of the episodes that I find interesting or useful in some way.

Architecture and Devops

[The Changelog] The Serverless Revolution for javascript programmers https://changelog.com/podcast/253

  • Pam Selle speaks at OSCON about the serverless revolution happening for JavaScript developers: https://conferences.oreilly.com/oscon/oscon-tx/public/schedule/detail/56876
  • https://www.youtube.com/watch?v=vYkdj1fXOHI
  • Gain a general understanding of serverless and possible architectures
  • Serverless computing—using platforms like AWS Lambda, Google Cloud Functions, or Azure Functions—takes your microservices architecture and brings it into a new age of operations. Why maintain a server when you can run your code on-demand? Combine this power with Node.js and JavaScript-powered applications, and you have an amazing combination in your hands.
  • Pam Selle offers an overview of serverless computing, including why it’s so revolutionary and where to get started, and explains how you can use it to power your apps at a fraction of the usual cost of compute using a JavaScript-dominant architecture.


[Run As Radio] DevOps in 2017 https://www.runasradio.com/Shows/Show/537

  • How is DevOps evolving? While at the DevOps Enterprise Summit in London, Richard sat down with Nicole Forsgren to talk about her latest data findings and analysis on DevOps.
  • The conversation starts with a discussion about making good reports, including who the data is collected from. Ideally you'd want a fully random data set, but as Nicole explains, that's not possible - you have to go with as large a set as possible. In the case of the 2017 report, that's 3200 survey responses.
  • https://puppet.com/resources/whitepaper/2017-state-of-devops-report


[arrested devops] When The Levee Breaks With Jeff Smith And Mark Imbriaco https://www.arresteddevops.com/disaster-communication/

  • Who owns your availability?

Programming and Testing

[Developing Up] The Art of the Code Review http://www.developingup.com/19

  • Great developers continually seek to improve the code they work on and write. In this episode we discuss how you can use code reviews to help yourself and your team become better developers.
  • Types of code reviews
    • Formal “Code Review”
    • Part of the QA process
    • Automated reviews
    • Pair programming
    • PR reviews
  • Benefits of code reviews
    • Project benefits
    • Team benefits
    • Reviewee benefits
    • Reviewer benefits
  • Guidelines
    • No bad attitudes
    • Set goals
    • Less is more
    • Annotate
    • Document


[Programming Throwdown] Code Reviews http://www.programmingthrowdown.com/2017/05/episode-66.html

  • Why?
    • Get another pair of eyes
    • Teach others about what you do
  • What Not to do
    • Become a road block to work
    • Let reviews linger
    • Let it become about only style
    • Have only some people do reviews
  • How?
    • Email
    • In-person
    • Web tools
    • Phabricator
    • Gerrit
    • Gitlab/github
  • Rules
    • All changes must be approved by someone
    • Readability
    • +1 vs +2 or similar
    • To push anyways, there's an emergency mode
    • Keep line count down


[Hansel Minutes] Inside WebAssembly with Mozilla Fellow David Bryant https://www.hanselminutes.com/581/inside-webassembly-with-mozilla

  • The last few decades of the web, and how it's all about to change with the advent of WebAssembly.
  • Is JavaScript the new "metal?"


[CodeChat] The Latest, Greatest Features of C# 7.0 https://channel9.msdn.com/Shows/codechat/067

  • C# version 7.0 is totally a thing and with it come a number of cool features like tuples and deconstruction. According to Mark, none of the enhancements are earth shattering or code breaking, but they will eventually change the way you author your project.


[JavaScript Jabber] NPM 5.0 with Rebecca Turner https://devchat.tv/js-jabber/jsj-266-npm-5-0-rebecca-turner

  • Rebecca Turner, tech lead for NPM, a popular JavaScript package manager with the world's largest software registry.
  • Learn about the newly released NPM 5 including a few of the updated features.


[Complete Developer Podcast] Laws of Programming http://completedeveloperpodcast.com/episode-96/

  • In any field, there is a lot of hard-won knowledge that the more experienced attempt to impart to those with less experience. Depending on the field, these things may be expressed as old sayings, or laws. They typically aren’t really hard and fast rules, but rather general tendencies that have been observed over time. Programming, like any other field, has these, and many of them are well worth learning.
  • Amongst others:
  • Pareto Principle - For many phenomena, 80% of consequences come from 20% of the causes.
  • Brooks’ Law - Adding manpower to a late software project makes it later.
  • Conway’s Law - Any piece of software reflects the organizational structure that produced it.
  • Moore’s Law - The power of computers doubles roughly every 24 months, OR the number of transistors on an integrated circuit doubles in about 18 months.
  • Knuth’s Optimization Principle - Premature optimization is the root of all evil.
  • Hofstadter’s Law - It always takes longer than you expect, even when you take into account Hofstadter’s Law.
  • Law of Demeter - This is also known as the principle of least knowledge: only talk to your immediate collaborators.
  • Hanlon’s Razor - Never ascribe to malice that which can adequately be explained by stupidity.
  • Dunning-Kruger Effect - Unskilled persons tend to mistakenly assess their own abilities as being much more competent than they actually are.
  • Postel’s Law - Be conservative in what you do, be liberal in what you accept from others.


[Cucumber Podcast] Fast Tests https://cucumber.io/blog/2017/06/29/fast-tests

  • Everyone knows fast tests are valuable, so why do so many companies abide slow ones?
  • What is 'fast'?
  • How to make them so?


[Cucumber Podcast] BDD in Banking https://cucumber.io/blog/2017/05/25/bdd-in-banking


Agile

[scrum master toolbox] Vasco Duarte on what #NoEstimates means for Agile http://scrum-master-toolbox.org/2017/06/podcast/vasco-duarte-on-what-noestimates-means-for-agile/

  • What is #NoEstimates about for the author of the first #NoEstimates book?
  • What can we learn from Vasco’s journey that led him to find #NoEstimates?


[Agile in 3 minutes] Do https://agilein3minut.es/36/

  • Special guest Lanette Creamer asks: When the word “Agile” implies action, why is there still so much talk?


[Agile in 3 minutes] Pace https://agilein3minut.es/14/

  • Amitai asks: How much do you demand of yourself and others?
  • SustainablePace
  • FortyHourWeek
  • OverTime


[Agile in 3 minutes] Influence https://agilein3minut.es/15/

  • Amitai asks: How often do people do as you suggest?
  • Power (social and political)
  • Social influence
  • Power Versus Authority


[developing up] Task Estimation Techniques http://www.developingup.com/18

  • Estimating is hard. In fact, estimating is sometimes considered one of the hardest aspects of development. 
  • While, for reasons beyond your control, you can never guarantee the accuracy of your estimates, you can control how well you deliver and defend the estimates you provide.


[cross cutting concerns] Arthur Doler on Retrospectives http://crosscuttingconcerns.com/Podcast-042-Arthur-Doler-on-Retrospectives

  • Retrospectives and how to make them better.

/stuff

[Eat Sleep Code] How Your Brain Works Against You http://developer.telerik.com/content-types/podcast/how-your-brain-works-against-you/

  • How do our brains interpret cause and effect?
  • The ways in which your brain wants to think of things as narratives
  • All the tricks it does to save itself from having to think
  • Arthur shares his perspective on cognitive bias and how it affects the software development process.


[Noah Kagan Presents] Arnold Schwarzenegger's Total Recall - Book Report http://okdork.com/arnold-schwarzenegger-total-recall-book-report/

  • Have a crystal clear vision and intention.
  • Remove all distractions.
  • Write down your goals.
  • Surround yourself with the best.
  • Reps Reps Reps Discipline.
  • Don't limit yourself.
  • You have to sell.
  • Attitude.
  • Make opportunities happen.
  • Stay hungry.
  • Listen to feedback.
  • Be naïve and follow your curiosity.


[Decrypted] This Man’s Murder Might Get Solved by Amazon’s Alexa https://www.bloomberg.com/news/audio/2017-06-12/this-man-s-murder-might-get-solved-by-amazon-s-alexa

  • As we surround ourselves with more and more of these internet-connected devices, Nico and Aki will discuss how our data should be used and why consumers should care. 
  • It's as scary as you think it might be.


Wednesday 31 May 2017

Damo's May 2017 Podcast Highlights

I subscribe to many podcasts; you can see the list as it was in 2015 here: Developer podcasts v2. I'm keeping a podcast blog here of the episodes that I find interesting or useful in some way.

Architecture and Devops

[GOTO 2017] The Many Meanings of Event-Driven Architecture https://martinfowler.com/videos.html#many-meanings-event
  • https://www.youtube.com/watch?v=STKCRSUsyP0
  • Event notification: components communicating via events
  • Event-based State Transfer: allowing components to access data without calling the source.
  • Event Sourcing: using an event log as the primary record for a system (a small sketch follows this list)
  • CQRS: having a separate component for updating a store from any readers of the store
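
A minimal Python sketch of the event sourcing idea, with the read side kept separate loosely in the spirit of CQRS; the account events and amounts are invented for the example, not taken from the talk.

    from dataclasses import dataclass

    # Hypothetical events for a toy account; the event log is the primary record,
    # and the current state (the read model) is derived by replaying it.
    @dataclass
    class Deposited:
        amount: int

    @dataclass
    class Withdrawn:
        amount: int

    def current_balance(events):
        """Rebuild the read side from the event log (a crude projection)."""
        balance = 0
        for event in events:
            if isinstance(event, Deposited):
                balance += event.amount
            elif isinstance(event, Withdrawn):
                balance -= event.amount
        return balance

    log = [Deposited(100), Withdrawn(30), Deposited(5)]
    print(current_balance(log))  # 75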

[The Cloudcast] The ServerlessCast - Event-Driven Design Thinking http://www.thecloudcast.net/2017/05/the-serverlesscast-6-events-vs-functions.html
  • How to run a company entirely on serverless, the on-going benefits of not maintaining servers, new application patterns with events, and the potentials of serverless in the future.
  • 3-tier applications and architectures vs. event-driven architectures?
  • What is a "bulky function"? How to evolve from procedural thinking to event or asynchronous thinking?
  • How to optimize the many functions that make up an application

[InfoQ] Daniel Bryant on Microservices and Domain Driven Design https://www.infoq.com/podcasts/daniel-bryant
  • Moving from monoliths to micro-services, covering bounded contexts, when to break up micro-services, event storming, practices like observability and tracing, and more.
  • Migrating a monolith to micro-services is best done by breaking off a valuable but not critical part first.
  • Designing a greenfield application as micro-services requires a strong understanding of the domain.
  • When a request enters the system, it needs to be tagged with a correlation id that flows down to all fan-out service requests (see the sketch after this list).
  • Observability and metrics are essential parts to include when moving micro-services to production.
  • A service mesh allows you to scale services and permit binary transports without losing observability.
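
A rough sketch of the correlation-id point above, assuming Python services that call each other over HTTP with the requests library; the header name and function are illustrative, not from the episode.

    import uuid
    import requests

    CORRELATION_HEADER = "X-Correlation-Id"  # illustrative header name

    def call_downstream(url, incoming_headers):
        # Reuse the caller's correlation id if there is one, otherwise mint a new one,
        # and pass it on so every fan-out request carries the same id.
        correlation_id = incoming_headers.get(CORRELATION_HEADER, str(uuid.uuid4()))
        print(f"[{correlation_id}] calling {url}")  # the same id appears in every log line
        return requests.get(url, headers={CORRELATION_HEADER: correlation_id})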

[DockerCon 2016] Making friendly micro services https://www.youtube.com/watch?v=zRg7pIS3TjM
  • Small is the new big, and for good reason. The benefits of microservices and service-oriented architecture have been extolled for a number of years, yet many forge ahead without thinking of the impact on the users of the services.
  • Consuming micro services can be enjoyable as long as the developer experience has been crafted as finely as the service itself. But just like with any other product, there isn’t a single kind of consumer. 
  • Together we will walk through some typical kinds of consumers, what their needs are, and how we can create a great developer experience using brains and tools like Docker.
  • Helpful docs are always up to date
  • Revision history - what changed and why
  • Troubleshooting built in, monitoring
  • Easy to deploy and scale
  • Easy to consume, how to consume and when
  • Must co-exist in a larger ecosystem; don’t be the biggest tree in the forest.

Programming and Testing

[funfunfunction - youtube] Functional programming in JavaScript https://www.youtube.com/playlist?list=PL0zVEGEvSaeEd9hlmCXrk5yUyqUag-n84
  • Some really good fun overviews of functional programming concepts in javascript

[InfoQ] Lisa Crispin and Justin Searls on Testing and Innovation in Front End Technology https://www.infoq.com/podcasts/crispin-searls
  • Pair testers to write production code with the programmers.
  • Developers have to be focused on right now, testers have freedom to look at more of the big picture
  • People know testing is good and there are a lot of tools for it, but some tools are ill-conceived.
  • We need a better language for talking about good QA and full stack testing.

[Simple programmer] The Future of Software Development (With Erik Dietrich) http://simpleprogrammer.libsyn.com/449-the-future-of-software-development-with-erik-dietrich-simple-programmer-podcast
  • What could the future of software development look like, and how should you prepare for it?

Agile

[Youtube] Agile Product Ownership in a Nutshell https://www.youtube.com/watch?v=502ILHjX9EE

[Deliver it] Back to Basics http://deliveritcast.com/ep50-back-to-basics
  • Episode 50 is an overview of the last 50 episodes, revisiting many of them and giving a taste of which past episodes you might like to catch up on.
  • A common theme for Product Owners is how do I get better? The best answer is to get better at the basics.  Inspired by another show that looks at its core purpose every so often, this episode looks at what the basics are since our initial podcast, what might have changed, and also serves as a reflection point for what we’ve covered in our first 50 shows. If you’re a PO who’s new to the show, this is a great place to start.

[Deliver it] DevOps for Product Owners http://deliveritcast.com/ep49-devops-for-product-owners
  • In order for Product Owners to help teams deliver value to our customers, the product has to actually be delivered. The teams must put in some effort to get the code deployed and out the door. To make that easy for everyone, DevOps is a set of skills and practices that allow the process to be automated and error-free, turning delivery into a routine event.
  • In this episode, Lee Eason joins to discuss what a PO needs to know about DevOps, why you should insist on it, and what you can do to help the teams achieve it.  

[Agile uprising] Agile Architecture with Martin Fowler and Rebecca Wirfs-Brock http://agileuprising.libsyn.com/agile-architecture-with-martin-fowler-and-rebecca-wirfs-brock
  • https://www.martinfowler.com/ieeeSoftware/whoNeedsArchitect.pdf
  • Martin and Rebecca provide a very clear definition of what architecture is to start the conversation which then leads into an honest conversation about how architecture is defined in the product’s unique context. They also provide great insight into the dynamics of what can be and cannot be considered architecture, and how the definition is fluid based on the engineering context.
  • We discuss the impact of unit tests on architecture, and to what degree tests and emergence define architecture, vs. up front design.
  • We also discuss the importance of domain models, and who should be involved in the definition of the domain model – specifically the requirement that the business folks be in the conversation.
  • As the interview draws to a close, we discuss the importance of documentation in agile architecture. The discussion ranges from the “the code is the documentation” stance to more comprehensive documentation stances.

/stuff

[Tim Ferriss] Accelerated Learning and Mentors – My Personal Story http://tim.blog/2017/05/17/meta-learning/
  • An episode on education and accelerated learning amongst other things
  • Want to learn something fast? Listen to this reverse interview.

[The school of greatness] Nelson Dellis is training your brain to do the impossible https://lewishowes.com/podcast/e-nelson-dellis/
  • Not only does Nelson compete in memory championships around the world, he also has created great courses, videos, and soon a book to teach others his techniques.
  • It’s important to train our brains as much as we train our bodies.

[TED] The future we are building — and boring https://www.ted.com/talks/elon_musk_the_future_we_re_building_and_boring
  • Elon Musk discusses his new project digging tunnels under LA, the latest from Tesla and SpaceX and his motivation for building a future on Mars in conversation with TED's Head Curator, Chris Anderson.

[Decrypted] Fake News in the French Elections http://pca.st/iErg
  • What is fake news and why does it matter?

[Code and cast] publishing content on Pluralsight and some javascript http://codeand.us/podcast/episode-16-pluralsight-and-javascript/
  • On Pluralsight and presenting
  • Javascript and Architecture questions and answers

Other interesting blog posts

“I have nothing to hide. Why should I care about my privacy?” https://medium.com/@FabioAEsteves/i-have-nothing-to-hide-why-should-i-care-about-my-privacy-f488281b8f1d

Privacy Protects Bothersome People https://martinfowler.com/articles/bothersome-privacy.html

Facebook users unwittingly revealing intimate secrets http://www.theguardian.com/technology/2013/mar/11/facebook-users-reveal-intimate-secrets
Facebook users are unwittingly revealing intimate secrets – including their sexual orientation, drug use and political beliefs – using only public "like" updates, according to a study of online privacy

Get your loved ones off Facebook http://www.salimvirani.com//facebook/

Why I Can’t/Won’t Point to Facebook Blog Posts http://scripting.com/2017/05/31.html#a110526
A reply to not posting on facebook https://daringfireball.net/2017/06/fuck_facebook