14:03:41 #startmeeting Cockpit weekly meeting 2016-12-05
14:03:41 Meeting started Mon Dec 5 14:03:41 2016 UTC. The chair is andreasn. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:41 Useful Commands: #action #agreed #halp #info #idea #link #topic.
14:03:41 The meeting name has been set to 'cockpit_weekly_meeting_2016-12-05'
14:03:44 .hello andreasn
14:03:45 andreasn: andreasn 'Andreas Nilsson'
14:04:10 .hello dperpeet
14:04:11 dperpeet: dperpeet 'None'
14:05:04 #topic Agenda
14:05:43 * Network checkpoints status
14:06:10 * NFS Server
14:07:13 maybe that's it. Ok, let's run with that
14:07:21 #topic Network checkpoints status
14:07:33 okay
14:07:45 so we had checkpoints for some time now
14:07:58 and people are running into the "edge cases"
14:08:08 what kind of edge cases?
14:08:09 which are of course some people's main case
14:08:31 such as making changes that take longer than we allow
14:08:45 when does that happen?
14:08:46 cockpit gives up after 15 seconds
14:08:59 the one real case I have seen is where DHCP takes 40 seconds
14:09:06 ah, I see
14:09:07 in retrospect, that does seem like a tight window for some cases
14:09:30 yes, we should look at tuning the timeouts
14:09:48 I would like to discuss a usability aspect of this
14:09:53 what I have been looking at first is to have a path through the UI that works for any change, no matter how slow
14:09:55 so at 15 seconds it times out and rolls back, just because the DHCP is slow?
14:09:57 but let me know when you have talked about what you wanted to say first
14:10:16 andreasn, correct
14:10:25 it's a tradeoff
14:10:51 but since the UI is pretty clear about what is going on during a checkpoint
14:11:01 ("Testing connection")
14:11:18 we can make that time longer without confusing people too much
14:11:39 is one minute a reasonable time?
14:11:41 and if people give up at that point and reload or just go away
14:11:46 that's harmless
14:11:58 since the rollback happens in any case
14:12:13 andreasn, I have no idea
14:12:26 well, that leads into my thought: can we trigger a rollback early?
14:12:26 people are talking about DHCP taking several minutes
14:12:47 i.e. more like an "undo"
14:13:01 dperpeet, no, we are disconnected at that point
14:13:19 how about we allow overriding the rollback
14:13:25 but only make that work on purpose
14:13:27 let me describe one more thing
14:13:30 ok
14:13:31 :)
14:13:52 but is the option between showing "testing connection" for everyone for 5 minutes, and having it fail for those with slow DHCP servers?
14:14:06 so, one idea is to let the rollback happen, and then the user gets an opportunity to make the change anyway
14:14:18 that was in the original design
14:14:39 andreasn, yes
14:14:41 mvollmer, that sounds good
14:14:48 combined with a reasonable timeout on the first attempt
14:14:54 I don't think anything >30 seconds makes sense there
14:15:16 I seriously consider that my connection has died after 10-15 seconds
14:15:24 I don't think I would wait for a minute
14:15:41 except if you know that your DHCP is broken and is slow
14:15:49 right
14:15:56 but then I could have it fail
14:16:01 double check my dhcp
14:16:10 and tell cockpit to go without a rollback
14:16:13 or specify a custom timeout
14:16:16 on that second try
14:16:26 the "try anyway" could have an input for a timeout
14:16:26 uhh
14:17:12 dperpeet, good idea, but that's approaching a spaceship cockpit, no?
14:17:21 hm
14:17:31 one more thing:
14:17:58 if a rollback is really slow, cockpit used to time out, and you wouldn't get a second try
14:18:00 setting the timeout time feels very fiddly, because I assume there is no good way to measure the speed of the DHCP-server
14:18:17 I believe that valid workflows should have precedence
14:18:26 so if you do the right thing, even if it's slow, it should work
14:18:47 on the other hand, you could say that preserving the connection is also pretty important
14:18:57 we could just increase the rollback timeout with every try
14:19:00 because otherwise you might not be able to "get back in"
14:19:04 so first try, 15 seconds
14:19:13 if the user presses "Do it anyway"
14:19:14 but that runs into cockpit disconnect, right?
14:19:20 we use a timeout of 90 seconds
14:19:39 could you have the system just increase the time between several tests by itself?
14:19:55 andreasn, I don't think that's a good idea
14:20:01 it starts by testing for 15 seconds. Realizes things don't work, starts a 30 sec test etc.
14:20:05 usually it's probably actually a wrong setting
14:20:10 you'd just be disconnected for ages
14:20:13 right
14:20:15 andreasn, it's not the time between testing the connection, but before rolling back the change
14:20:22 ah, I see
14:20:27 sorry for the confusion
14:20:40 mvollmer, I like increasing on the second try
14:20:47 and just give up if 90 seconds don't work
14:20:58 or keep increasing, but display that time
14:21:01 so the user knows what to expect
14:21:13 we could have three tries: one with 15 seconds
14:21:29 fails -> "This didn't work, would you like to try again and wait a bit longer?"
14:21:48 also fails -> "This didn't work, would you like to do it without any timeout"
14:22:23 "Test again"
14:22:49 so that would make it a 3rd button?
14:24:27 no, a second dialog
14:24:35 with a slightly different wording
14:24:57 the one we have says "This will disconnect you"
14:25:15 the new one would say "This looks like it might disconnect you"
14:25:28 not sure if this is worth it
14:25:37 it's worth a shot
14:25:55 the difference should be obvious
14:25:59 but how does the system know what dialog to trigger?
14:26:01 let's recap why we don't just increase the timeout to 5 minutes
14:26:27 andreasn, they would always come in order, first the weak one, then the hard one
14:26:41 but how do you trigger the hard one?
14:26:41 if something is obviously broken, we don't want to wait for a long time to have the system roll back
14:26:51 we want people to know that there is a good chance that the connection comes back
14:27:03 mvollmer, my comment was for the recap
14:27:22 dperpeet, can you repeat?
14:27:32 if something is obviously broken, we don't want to wait for a long time to have the system roll back
14:27:46 and why not? :-)
14:27:51 therefore we shouldn't have a very long timeout to begin with
14:28:06 because people would panic and start driving to the datacenter
14:28:13 why should I have to wait minutes if I hit a wrong button
14:28:25 as punishment?
14:28:38 ...
14:28:44 I'll let andreas answer that one
14:28:49 why we don't want to punish users
14:28:53 this should be rare, and if you really switch off the wrong interface, it doesn't matter much to wait a few minutes
14:29:00 I think it does
14:29:06 if you know what is going on
14:29:10 I expect a web ui to be responsive
14:29:15 so five minutes is a long time to wait just because you hit the wrong button
14:29:28 in every single case
14:29:38 what about 90 seconds?
14:30:04 the whole interaction would feel long, unresponsive, it would give the feeling that the server is annoying and is working against you
14:30:09 * mvollmer is devil's advocate
14:30:23 I don't think anything >30 seconds is good for the first try
14:30:39 so we would lose the "this is awesome" feeling
14:30:45 yes
14:30:56 okay, I am happy to hear that
14:31:05 i agree, of course
14:31:23 is there an issue open about this?
14:31:47 yes and no
14:32:03 people have trouble with the 15 second timeout
14:32:18 and they are asking for a 90 second timeout
14:32:39 that's a minute and a half, right?
14:32:43 hm
14:32:44 mvollmer, mod_proxy proxy workers have a default disconnect timeout of 30 seconds, fyi
14:33:19 mvollmer, and the apache server I think defaults to 60 seconds
14:33:19 I am offering this: https://github.com/cockpit-project/cockpit/pull/5472
14:33:35 NetworkManager times out DHCP after 45 seconds
14:34:20 so, this is confusing (or I make it confusing)
14:34:28 thanks for the feedback on the timeout tuning
14:34:44 let's take the rest off-line if there are more questions
14:34:46 okay?
14:34:48 agreed
14:34:51 sounds good
14:34:59 tricky questions, but good to discuss
14:35:00 one more thing
14:35:30 checkpoints have some bugs that make it look as if Cockpit simply misconfigures everything
14:35:40 so the rollback is not perfect
14:36:05 so I'll switch checkpoints off for complicated things like creating bonds
14:36:16 yes, I think that's good for the time being
14:36:45 although we need to consider how we make users aware which action is rollback protected and which ones aren't
14:36:59 or at least make sure we don't make it sound like everything will be rolled back
14:36:59 do we?
14:37:39 there is no indication of checkpoints/rollback in the UI until they actually hit
14:38:26 or do you think that people will get used to having their asses saved that they get careless?
14:38:27 it's ok right noiw
14:38:29 now
14:38:39 we just have to watch release note wording
14:38:48 alright, yes.
14:39:00 what we wrote so far works
14:39:07 I'm just saying to keep it in mind
14:39:09 it's pretty cool
14:39:15 yup
14:40:22 topic timeout?
14:40:37 hahaha
14:40:39 yes
14:40:51 #topic NFS Server
14:41:25 [cockpit] mvollmer opened pull request #5554: test: Fix race in check-storage-luks (master...storaged-fix-luks-password-race) https://git.io/v18Xd
14:41:39 so based on the work made by dperpeet, sgallagh and others in the Fedora Server Group, I started working on the Cockpit part of it
14:41:41 https://docs.google.com/document/d/1jLyKsECdHdlKltmHGgf_-iOKj-hj4Qjbh5Zgm7a-eMc/edit
14:41:56 https://github.com/cockpit-project/cockpit/wiki/Feature:-NFS-Server
14:42:08 mostly collecting prior art right now
14:43:09 if anyone knows of any good NFS UIs, feel free to add them to that page
14:43:24 going to distill the requirements into stories next
14:43:33 and I think that was it on that
14:43:53 haha, "good nfs uis"
14:44:13 :)
14:44:18 file sharing UIs is a better description maybe
14:44:31 there are some NAS ones that are pretty all right
14:44:40 right
14:45:08 just making a little stab towards the complexity of nfs :)
14:45:12 heh
14:45:26 I hope we can make it work without using the wizard pattern
14:45:35 haha
14:45:36 me too
14:45:44 oh yes, nfs specs, ugh :)
14:46:46 andreasn, do you think we can make this work in iterations?
14:46:56 probably
14:47:00 or do you want to have a pretty good overall picture early on
14:47:15 overall is good to have from a design perspective
14:47:26 but implementation can happen in steps
14:47:29 yeah, I agree - we shouldn't miss any big stuff
14:47:51 you can probably ping the server list pretty early with a first iteration
14:47:58 so we can run it by everyone
14:48:04 and see if we missed anything obvious
14:48:11 sounds good
14:48:51 all right, I think that's it for that
14:48:55 #topic Open floor
14:49:10 tomorrow is holiday in Finland
14:49:21 99th birthday of the country
14:50:34 nice
14:50:40 happy birthday, Finland
14:51:36 all right, I guess that's all
14:51:49 thanks everyone!
14:51:53 thanks!
14:52:52 #endmeeting
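
Note on the "Network checkpoints status" topic: the escalating rollback timeout floated in the discussion (first try with 15 seconds, a second try with 90 seconds, then a last try without any timeout) could be sketched roughly as below. This is only an illustration of the idea, not Cockpit code and not something the meeting agreed on. The names apply_with_escalation, try_change and ask_user are hypothetical, and the prompt texts are the ones quoted in the log.

```python
# Rollback timeouts in seconds for successive tries. 15 and 90 are the values
# mentioned in the discussion; None stands for "no rollback timeout".
# Treating them as one fixed schedule is an assumption made for this sketch.
ROLLBACK_TIMEOUTS = (15, 90, None)

# Prompt shown after the first and after the second failed try (from the log).
PROMPTS = (
    "This didn't work, would you like to try again and wait a bit longer?",
    "This didn't work, would you like to do it without any timeout",
)


def apply_with_escalation(try_change, ask_user):
    """Try a change with increasingly generous rollback timeouts.

    try_change(timeout) is a hypothetical callback that applies the change
    and returns True if the connection came back before the rollback fired.
    ask_user(prompt) is a hypothetical callback that returns True if the
    user wants to try again.
    """
    for attempt, timeout in enumerate(ROLLBACK_TIMEOUTS):
        if try_change(timeout):
            return True
        # The change was rolled back; offer the next, longer try if one exists.
        if attempt >= len(PROMPTS) or not ask_user(PROMPTS[attempt]):
            return False
    return False
```

With a slow DHCP server that needs about 40 seconds, the first try would roll back, and the second try with 90 seconds would go through if the user confirms the prompt.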